Discussion: Generic error codes in the spec

pranshuchittora commented 4 years ago

Abstract I am researching and planning to build a generic test suite to check the compliance of the various GraphQL libraries The project idea is given by the GraphQL Foundation for the Google Summer of Code 2020. There are various implementations of GraphQL and all of them are throwing error in various ways. As all errors are thrown in plain text(string). There is no way to test these implementations for such errors.

Say we have given an incorrect query (with circular fragment) to a GraphQL server and we are expecting an error. If the error comes with a specific error code then it would be great to test the libraries and provides and assuring the actual cause of the error, therefore, improving the DX as well.

For example imagine a well known error place that can hold a "spec rule number" that is indicative of what validation rule you have caused an error message to fail on

errors : [ { message : "any old message", extensions : [ __graphqlspec : { reason : "March2018-5.1.2.4" }}}}

Follow this discussion -> https://github.com/graphql-java/graphql-java/issues/1826#issuecomment-601550014

Motivation I am really impressed by the standardisation of the error codes in the following projects.

React error codes -> https://github.com/facebook/react/blob/master/scripts/error-codes/codes.json
Rust lang -> https://doc.rust-lang.org/error-index.html
TypeScript -> https://github.com/microsoft/TypeScript/blob/master/src/compiler/diagnosticMessages.json

What can be done It would be great if we can come up with such standard error codes in the GraphQL spec.

sungam3r commented 4 years ago

OK. My thoughts:

No breaking changes. Good.
Easy to implement.
It should be explicitly stated in spec (if we decide to supplement the spec) that attaching the error information about code is optional for the server.
I propose to use __graphqlspec as a conventional prefix for all such things and not as full property name. Full property name for error code may be __graphqlspecErrorCode

shobhitchittora commented 4 years ago

Well @sungam3r IMO the whole point of adding such error codes is to enforce / streamline error codes across implementations. Having it optional would defeat the whole purpose. What are your thoughts on this. What effort do you think would go in this route?

sungam3r commented 4 years ago

That is simple. It is impossible to do such error codes as required properties since

Breaking changes to existing implementations.
Extensions property by design has no structure defined.

sungam3r commented 4 years ago

See https://github.com/graphql/graphql-spec/blob/master/CONTRIBUTING.md#guiding-principles

Future changes should not change the meaning of existing schema or queries or in any other way cause an existing compliant GraphQL service to become non-compliant for prior versions of the spec.

markokr commented 4 years ago

How about a optional "catagory" key, so clients can make broad decision and servers are testable whether right check triggered?

Enumerating individual errors seems lot of work, for dubious benefit.

Main disadvantage would be that some implementations opt out from it if it's too bothersome.

sungam3r commented 4 years ago

Well, category or code are the same things. GraphQL-dotnet for example already has error codes in response https://github.com/graphql-dotnet/graphql-dotnet/blob/master/src/GraphQL.NewtonsoftJson/ExecutionResultJsonConverter.cs#L98

markokr commented 4 years ago

code is fine as a key. The question is can other implementations reproduce them.

Main goal is spec regression tests that can be run over several implementations.

It's fine to say that this testsuite requires support for code.

pranshuchittora commented 4 years ago

Well, category or code are the same things. GraphQL-dotnet for example already has error codes in response https://github.com/graphql-dotnet/graphql-dotnet/blob/master/src/GraphQL.NewtonsoftJson/ExecutionResultJsonConverter.cs#L98

That looks interesting, can you point me to the exact listing of these error codes.

michaelstaib commented 4 years ago

Code is already used in many implementations so I would go with that. We also use code in Hot Chocolate. The more important issue is to specify error codes for spec errors.

sungam3r commented 4 years ago

That looks interesting, can you point me to the exact listing of these error codes.

Errors codes are aligned with paragraph numbers from the spec. 5.4.2.1 for http://spec.graphql.org/June2018/#sec-Required-Arguments for example.

markokr commented 4 years ago

Hardwiring chapter numbers as errors seems wrong, both implementations side and also spec side: one-chapter-per-error may be annoying to maintain, especially if you are not allowed to refactor.

Named errors seems good enough, or if you really want numbers, then OID-like tree maintained in spec.

michaelstaib commented 4 years ago

I also would not bind them to the chapters. I personally also prefer named errors. But that might be a personal preference I would also be fine with something else.

dariuszkuc commented 4 years ago

Similar to "error code", we are using system events for all our exception handling but we expose it in error extension block (as per spec).

When following code first approach, schema generation process should not allow generation of invalid schema. Your runtime should already run valid schema that produces correct responses so I am unsure how often you would end up with something non-compliant with the spec. I believe you need this sort of validation when creating the schema but if that is the case then I am unsure why you would need this error information in GraphQL response as the server should not start with invalid schema.

sungam3r commented 4 years ago

@dariuszkuc The talk is about query validation errors as well.

sungam3r commented 4 years ago

@markokr We can use different properties together- code like UNIQUE_ARGUMENT, number like 1.2.3.4 , even helpUrl to spec like http://spec.graphql.org/June2018/#sec-Argument-Names

sungam3r commented 4 years ago

URLs convenient because paragraph numbers can vary from version to version of specification.

sungam3r commented 4 years ago

@pranshuchittora From my comment it follows that separate test suites are required for each version of the specification.

markokr commented 4 years ago

The testsuite itself would indeed be tied to specific spec version, so instead it makes sense for tests to contain spec section numbers and even links to spec.

I still cannot see how making implementation organize code around spec layout is useful - it may be natural to share some error handling code, so why force it track spec layout?

pranshuchittora commented 4 years ago

@sungam3r I think creating a separate section for the error spec can be one of the approaches. The section will contain a whole bunch of error code like GQ0001 (inspired from rust-lang), and these codes can internally reference their corresponding section i.e. 1.2.3.4.

This approach can be fruitful in the long run as it is decoupled with the entire structure of the spec.

michaelstaib commented 4 years ago

@pranshuchittora this sounds quite good.

michaelstaib commented 4 years ago

@markokr @sungam3r

We implement features once they hit draft. Meaning organizing a test suite around a specific spec version can be count-productive. I like the approaches that web browser suites go where you can see what is implemented and what not.

@pranshuchittora when I talked with @IvanGoncharov about building a replacement for GraphQL-CATS we first wanted to focus on parser tests. What is now the general idea.

We had CATS already implemented for Hot Chocolate but it was quite difficult back then. One of the issues were error codes. But there are other problems to solve. Do you already have a broader idea how to go about it?

pranshuchittora commented 4 years ago

@michaelstaib I am in the process of writing the proposal and I would love to get it reviewed by you, once it gets completed.

As of now, we can only test the implementations for basic input-output, hence not covering the edge case. The idea of standardising the error codes is one of the effort for testing the implementations more deeply. We can improve the proposal prioritizing the aspects of the spec which seem most valuable to be covered with centralized tests.

michaelstaib commented 4 years ago

@pranshuchittora excellent... love that this finally gets some traction.

pranshuchittora commented 4 years ago

Hello there, With several iterations in the design, and the help of @michaelstaib. I have created a rough architecture for the compliance test suite.

Design of the App Server and the test runner. (Docker Approach)

Events and lifecycle

For detailed info, you can read my GSoC proposal -> https://docs.google.com/document/d/1D0PD2nwU97aL-hTc_XLL8H5MYNqBzkIBJeDeh_UJ0yo/edit?usp=sharing

shobhitchittora commented 4 years ago

@michaelstaib , @sungam3r The above sounds quite good. Wanted to also check on the suggestions around Error spec for gql - https://github.com/graphql/graphql-spec/issues/698#issuecomment-602242983

@sungam3r I think creating a separate section for the error spec can be one of the approaches. The section will contain a whole bunch of error code like GQ0001 (inspired from rust-lang), and these codes can internally reference their corresponding section i.e. 1.2.3.4.

This approach can be fruitful in the long run as it is decoupled with the entire structure of the spec.

Let me know if we can start some discussion around this, and write an initial spec / RFC around this.

sungam3r commented 4 years ago

I'm doing other things now, I can help with wordings and review.

Fontinalis commented 4 years ago

@shobhitchittora I’m happy to hop in and start some discussion on this. Currently building a GraphQL server in Go and would love to add these generic error codes as early as possible, would make testing much easier later.

@pranshuchittora Awesome work, nice proposal.

michaelstaib commented 4 years ago

@shobhitchittora I would really like to get this going since it would at a lot of value. So, I am in.

shobhitchittora commented 4 years ago

Thanks for the positive vibe here. Let me take some time and start a RFC around this.

Thanks all.

shobhitchittora commented 4 years ago

@michaelstaib @Fontinalis @sungam3r @markokr @pranshuchittora @IvanGoncharov Hope you're all safe and well. Please check out #708. Would love all suggestions and feedback for improvements.

rivantsov commented 3 years ago

I suggest to narrow down this - provide standard differentiator between user errors vs server errors. As for server side errors like code failures - these should NEVER be returned to user in prod; they should go to server-side log and server should return generic "Server Error, we are working on this". The UI might chose to show it to user indicating that there's nothing user can do, just wait. Returning server error messages obviously is a clear security risk.

Client errors, like form validation errors, ex: 'value too long', 'the username already taken', 'the doc of this type must provide also XYZ information'. Typical case - user fills out the form, with multiple edit boxes or UI controls, submits. The client submits mutation request, and receives a list of errors. These should be clearly understood by UI/client that they should be shown to user; each error contains Path which identifies the control, so UI can place error message next to invalid value. Yes, in many cases UI can do validation before submitting the request, but not always - like some name uniqueness, or validation against some server data. Standardizing these validation errors with standard codes - maybe, like "ValueTooLong" code. But the list of what can be standardized is very limited, most are form/biz logic specific in the app.

To sum it up: Provide a way to distinguish Server Error vs BadRequest as in HTTP status codes, and that's it. This will allow standardize lots of client code, up to the point of ability to switch from one server implementation to another.
One more thing: there might be 3d category like malformed query, which is not server error, not user error, but client code error.

leoloso commented 2 years ago

I have recently added to my server the feature of returning error codes, and also a specifiedBy entry pointing to the GraphQL spec documentation about the error.

I have a few thoughts about how worthy these features are. On the plus side, I find it benefitial to support generic error codes in the spec, so that servers would have a clearer idea of all validations to perform, and developers can be given better feedback on where the problem happens.

On the not-so-good side, my server could not benefit from the generic test suite (if it's indeed created), since all the validations are already performed via unit tests, which don't require the full environment to be set-up (not even the DB), so it's way simpler.

I wrote an article about my experience (including a few other insights I got from my design and implementation). I hope it can be useful to revive this issue 😀

rivantsov commented 2 years ago

Response to @leoloso

We should state the main goal: make it easier for the client (code) to figure out a proper action on receiving the error. Is it fixable by user? - so we can show error message next to the control with missing value in UI? or is it a fatal server error, no use to retry? Currently detecting client action by error message is not possible: the message is a full sentence, and it is LOCALIZABLE (can be in language selected by user in multi-language apps)

Comments on the article on Medium:

By supporting generic error codes, we can possibly centralize the definitions for all error messages and codes in the application

Kind of bad example with React. In general, error messages are localizable - to possibly present in UI; only error codes are English-only, to be handled by code. React error codes (in json file) are clearly for developers (!) only; they would never be shown to end user.

On using the ref to GraphQL spec section as error code:

it is not descriptive, it requires lookup the spec (same goes for plain numbers)
section numbers can change (for the same validation rule) in future versions of spec - that would cause problems better go with plain codes as "ValueMayNotBeNull"

Classification of errors: There is no real difference between 1 and 2 (spec errors and client errors) - they both result in error message that needs to be shown to user Server error - bad example with external connection failure. We need one more classification distinction: -- Server error, permanent - a bug, process crush. Repeating the same request later will not help -- Server error, transient - server is OK, but external service disconnected; client can try the request later, maybe in a few seconds

Expected errors vs 'known in advance' - bad distinction; many server errors can be expected and mitigated, like failure to connect to external unreliable service, server might return a nice 'sorry-try again later' message

Note: error codes are useful even if exact codes are NOT standardized. What needs to be standardized is 'Error Category' (extra field):

bad reqwuest (client error, missing/invalid value);
server error, transient (retry later, no problem with request)
server error, permanent (server code bug most likely, server process crush, "Unexpected server error, admins notified, we are working on this"
(maybe) malformed request - like bad request, but not fixable by user, client code error, sending garbage to the server. Should result in generic message to user "sorry we failed. we logged error and working on it"

Putting GraphQL spec errors into a separate category from other client errors - disagree, there's nothing special about them. For an app server, they are all the same, bad request. An error like 'missing value for required input' is a violation against its internal logic, will be caught in biz logic. Server programmer can decide to handle all these errors (spec or not) in biz logic layer and disable GraphQL layer validation.

Retrieving all error codes via introspection - thumbs up.

Format of the error code, namespacing (?!) codes - wow, not sure about that. Just free-form string, preferably without spaces: "ValueMissing". Namespacing - the danger of error codes colliding is overrated. And we even don't have namespaces for type system!

Namespaces like "module1\1", to reflect originating module - terrible idea, client should NOT know or care about modular structure of the server side, and it should not be exposed.

Categories (Warning, Suggestion etc) - not sure it's useful. As an extra field in error in Extensions, in addition to error message, Hint: Code: "BadRequest" Message: "Account number is missing" Extensions/Hint: "You can find account number at the top of your last statement."

An example with '@export' and to return warning instead of message - really bad example. This case is for developer 'playing' with API in Graphiql. It's not worth discussing tuning up errors (error or warning) for human devs. We should focus on standard scenario - real UI app in browser (not Graphiql) communicating with the server, and helping client-side CODE to handle it. Warning - what is that? Error is clear - operation failed. with warning - did we fail or not?

If error codes were to be returned, the GraphQL spec errors should be mandatory (or they’d provide no value) why?!!!

Generic test suites - agree, not feasible and I think useless anyway.

SpecifiedBy field - I do not think it's that useful. Unneeded complexity and headache in implementation, and useless for end user.

To sum it up:

Add 2 fields: Code and Category (group? Kind?) Category: BadRequest, MalformedRequest, ServerError, ServerFailureTransient Code: short free-form string, explanatory about error

graphql / graphql-spec

Discussion: Generic error codes in the spec #698

Design of the App Server and the test runner. (Docker Approach)

Events and lifecycle