khellang / Middleware

Various ASP.NET Core middleware
MIT License
806 stars 110 forks source link

AddProblemDetailsConventions changes HTTP 400 to 422 #175

Closed kjkrum closed 2 years ago

kjkrum commented 2 years ago

Without AddProblemDetailsConventions, a request with an invalid model returns HTTP response code 400. With AddProblemDetailsConventions, the same request returns 422.

Possibly related to #125.

khellang commented 2 years ago

See https://github.com/khellang/Middleware/issues/149#issuecomment-1022156162.

You can set ValidationProblemStatusCode back to 400 of you want:

https://github.com/khellang/Middleware/blob/a81989f16c83c286d1c2fee2e956b27b3243663b/src/ProblemDetails/ProblemDetailsOptions.cs#L133-L136

kjkrum commented 2 years ago

@khellang Interesting discussion in that thread, particularly your last comment about it being impossible to distinguish syntactic and semantic errors in MVC. I wonder if the MVC team thinks about this the way I do. That is, a missing required property or something like that is really a syntax error, not a semantic error. Although it's not a syntax error in JSON, it is a syntax error in the DSL of the model. I think of semantic errors as errors that in principle cannot be detected during model binding without reference to the overall application state (e.g, the database).

khellang commented 2 years ago

That is, a missing required property or something like that is really a syntax error, not a semantic error. Although it's not a syntax error in JSON, it is a syntax error in the DSL of the model.

That's not really syntax, but semantics, though. The reason the ASP.NET team didn't want to use 422 is because it was originally a WebDAV status code and not part of a HTTP standard. This has since changed, but it's too late to change it in MVC now. See https://github.com/dotnet/aspnetcore/issues/6145 for a broader discussion.

kjkrum commented 2 years ago

Well, I'm just glad it can be changed. I will always consider 400 the more correct status code for model binding errors. They are syntax errors in a language one layer above the XML or JSON, but syntax errors nonetheless.

khellang commented 2 years ago

They are syntax errors in a language one layer above the XML or JSON, but syntax errors nonetheless.

I think you need to look up the definition of syntax πŸ˜…

kjkrum commented 2 years ago

Anything that can be validated by a grammar is syntax. The presence or absence of a property is semantic in JSON, but syntactic in the strongly typed DSL that is expressed in JSON. A missing property is structurally incorrect in the DSL.

This is also related to the debate about whether HTTP response codes should strictly refer to the HTTP protocol itself, or whether they can express application-layer information. For example, is it appropriate to return a 404 if a parameter refers to an entity that does not exist? Or should 404 be strictly reserved for invalid routes? OData is firmly in the first camp. The existence of 409 supports this. But the idea that failed model binding should be a 422 seems to be in the second camp. If you follow that line of thinking to its logical conclusion, then every error above the protocol layer becomes a 422, and the API loses a lot of expressiveness unless you tediously redefine errors like "not found" in the application layer.

Another way to look at it is that 400 means the payload is syntactically invalid in the language identified by the content type header. If the content type is application/json, then errors in JSON syntax should be 400 and errors in DSL syntax should be 422. But if the content type includes a media type, then 400 would be appropriate for syntax errors in JSON or the DSL of the media type.

I don't mean to come off as hostile or critical. This library is awesome and your responsiveness is refreshing. But think very carefully about this.

khellang commented 2 years ago

But we agree that a payload has both syntactic and semantic meaning beyond the protocol? Why shouldn't a status code exist that conveys that meaning? FWIW, I think it's nice to be able to differentiate between the two error states. And my (and other framework authors) understanding of the spec is that's what 422 is for πŸ˜…

kjkrum commented 2 years ago

Yes, the payload has both syntactic and semantic meaning. If I understand correctly, you're interested in differentiating the second and third rows of this table, but this is not currently achievable because of the way MVC handles a particular exception that you mentioned in the other thread you linked.

I'm interested in differentiating the third and fourth rows, as MVC currently does, because the first three rows are technical errors (i.e., bugs in the client) while the fourth row is a business logic error. That's the interesting distinction in terms of how and by whom the error is handled.

MVC default your ideal library default
protocol error 400 400 422
payload XML/JSON syntax error (model binding) 400 400 422
payload DSL syntax error (model binding) 400 422 422
payload semantic error (controller) 422* 422* 422*
khellang commented 2 years ago

MVC will never produce 422 though. That's the entire problem. MVC doesn't differentiate syntax errors from semantic errors. Everything is treated as validation errors, which is semantics. I have no clue what "DSL syntax error" means. To me, row 3 and 4 are the same; It's either a syntax error - "I can't even read/understand this" or it's a semantic error - "I can read this, but I don't know what to do with it or its instructions are invalid". The former is 400, the latter is 422, just like the spec says.

khellang commented 2 years ago

The original WebDAV spec for 422 even mentions this;

The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415 (Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.

kjkrum commented 2 years ago

The schema that the API accepts is a language with a syntax. Errors like mismatched brackets are syntax errors in the JSON. Errors such as missing required properties are semantic errors in JSON but syntactic errors in the DSL. You disagree that this is syntax and call it semantics, but that's actually beside the point. There is no benefit in distinguishing these errors from JSON syntax errors.

The critical thing is that any error in model binding, regardless of whether you call it syntactic or semantic, is a technical error in the client, whereas a 422 returned by a controller represents a user input error. This is the useful distinction to make. The way MVC behaves is ideal. Returning 422 for errors in model binding creates ambiguity between technical errors and user errors.

khellang commented 2 years ago

You need to read about 422 in either RFC4918 and/or RFC9110 if you think the distinction is about technical vs user error πŸ˜† And by the sound of it, I think you have misunderstood what model binding in MVC does. User input is handled by MVC validation, not model binding πŸ˜… Anyway, this isn't going anywhere. You can return whatever you want in your APIs 😁 Have a nice week! πŸ‘‹

kjkrum commented 2 years ago

I brought this up because I've read those RFCs. The word "syntax" in RFC4918 could refer to three things: the syntax of the HTTP request; the JSON or XML syntax of the payload part of the request; or the DSL syntax of the payload. My argument is that it refers to all three. If I understand your position correctly, you think it only refers to the first two because you don't acknowledge that the third is even a thing. Per the RFC, the appropriate response to a syntax error is 400, not 422. The distinction between technical and user error lends pragmatic weight to the interpretation that "syntax" in the RFC includes the DSL syntax.

khellang commented 2 years ago

because you don't acknowledge that the third is even a thing

Exactly. A missing property isn't a JSON syntax error, but a semantic one. The serializer will happily deserialize the payload and you (or MVC) have to validate its instructions. If that fails, it's a 422 and either the developer or (assuming there's nothing wrong with the code) the user on the client-side needs to fix it ☺️

kjkrum commented 2 years ago

There's a domain-specific language layered on top of the JSON. The syntax of that language is expressed in the semantic content of the JSON. Model binding and validation are responsible for validating that syntax. Model binding and validation are in absolutely no way concerned with the semantic content of that language. From the perspective of the controller and service layers, as well as the client, any error that occurs during model binding and validation is a syntax error. This is the sense in which "syntax" should be understood in RFC4918. Logically, this inevitably follows from recognizing that the schema is a language. Pragmatically, it creates an elegant separation of concerns that is neatly reflected in the response codes.

Normalizing returning 422 for errors in model binding and validation will harm the Internet.

khellang commented 2 years ago

Haha, you got me. I'll stop feeding the troll now πŸ˜†

kjkrum commented 2 years ago

Not trolling. I think this is important. And I think that anyone who recognizes that the schema is a language will find my viewpoint compelling.