meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
https://sdk.meltano.com
Apache License 2.0
100 stars 70 forks source link

[Feature]: Handle GraphQL errors #1421

Open edgarrmondragon opened 1 year ago

edgarrmondragon commented 1 year ago

Feature scope

Taps (catalog, state, stream maps, etc.)

Description

As documented in https://www.apollographql.com/docs/react/data/error-handling, GraphQL errors are included as an array in the response:

{
  "errors": [
    {
      "message": "Cannot query field \"nonexistentField\" on type \"Query\".",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "extensions": {
        "code": "GRAPHQL_VALIDATION_FAILED",
        "exception": {
          "stacktrace": [
            "GraphQLError: Cannot query field \"nonexistentField\" on type \"Query\".",
            "...additional lines..."
          ]
        }
      }
    }
  ],
  "data": null
}

(getting JSON-RPC flashbacks 💀)

It'd be good to handle these as part of the GraphQL.valildate_response implementation.

See https://github.com/MeltanoLabs/tap-github/pull/181

laurentS commented 1 year ago

To replicate what I wrote in the issue linked above, graphql seems to often (because the spec isn't clear about it) return HTTP 200 status codes even if an error is found. So the checks in the RESTClient aren't enough to catch errors. As an example, I found the issue with tap-github where the API was responding with "errors": "Your token has insufficient scope for this request" but this was ignored by the tap, leading to a hard to track problem, with no error and no data.

The spec clarified that a key errors should be present at the root of the JSON response (next to data if the latter is present), and each item in the array of errors should at least contain a message key. The fact that both keys data and errors can be present in parallel tells me the sdk should probably proceed with the normal handling of data even if an error is detected, and maybe raise a fatal error if there is an error but no data.

What do you think?

Obligatory meme: image

edgarrmondragon commented 1 year ago

To replicate what I wrote in the issue linked above, graphql seems to often (because the spec isn't clear about it) return HTTP 200 status codes even if an error is found.

@laurentS Even some allegedly REST APIs do that 🤦 and it was part of the motivation for adding RESTStream.validate_response (even had to add the same meme to the docs 😅)

The spec clarified that a key errors should be present at the root of the JSON response (next to data if the latter is present), and each item in the array of errors should at least contain a message key. The fact that both keys data and errors can be present in parallel tells me the sdk should probably proceed with the normal handling of data even if an error is detected, and maybe raise a fatal error if there is an error but no data.

Yeah, that seems relatively easy to implement in a new GraphQLStream.validate_response. It should still probably call super().validate_response to catch HTTP status errors.

I added the https://github.com/meltano/sdk/labels/Accepting%20Pull%20Requests in case you or someone else is interested in contributing.

stale[bot] commented 1 year ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

edgarrmondragon commented 1 year ago

Still relevant

stale[bot] commented 4 months ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.