quarkiverse / quarkus-resteasy-problem

Unified error responses for Quarkus REST APIs via Problem Details for HTTP APIs (RFC9457 & RFC7807)
https://docs.quarkiverse.io/quarkus-resteasy-problem/dev
Apache License 2.0
69 stars 12 forks source link

fix NullPointerException that would occur when WebApplicationException was thrown with an unassigned Http Status Code #443

Closed JacobOJ closed 1 week ago

JacobOJ commented 1 week ago

At my company, some endpoints return non-standard HTTP Status Codes.

These cause WebApplicationExceptionMapper to throw a NullPointerException because Response.StatusType status = Response.Status.fromStatusCode(statusCode); returns null and then status.getReasonPhrase()) throws the exception. This ultimate results in the error not being mapped to a HttpProblem.

This PR fixes that by setting the title of the problem to Unknown HTTP Status Code instead, if the above fails.

pazkooda commented 1 week ago

If you're using custom HTTP status codes why won't you write custom mappers for them 🤔 ? Those are designed for such cases.

I don't see a reason putting logic in generic mapper that handle non-generic scenarios. Or do I miss something?

JacobOJ commented 1 week ago

If you're using custom HTTP status codes why won't you write custom mappers for them 🤔 ? Those are designed for such cases.

I don't see a reason putting logic in generic mapper that handle non-generic scenarios. Or do I miss something?

That is certainly an option.

The reason for this PR is because the nullpointer that happened in WebApplicationExceptionMapper would bypass the entire problem response mapping. It is my understanding that all responses should be mapped to a problem, when using this library.

If the concensus is for handling such cases in custom mappers, just let me know, and I will close this PR.

pazkooda commented 1 week ago

The reason for this PR is because the nullpointer that happened in WebApplicationExceptionMapper would bypass the entire problem response mapping. It is my understanding that all responses should be mapped to a problem, when using this library.

@lwitkowski: This part is very early code and if I correctly remember we consciously made a decison to keep WebAppExc clean of custom HTTP status code scenarios as it is intended for standard Web Error Codes as in e.g. MDN . Correct me if I'm wrong.

If the concensus is for handling such cases in custom mappers, just let me know, and I will close this PR.

If it will be me I'd move everything custom to custom mappers.

lwitkowski commented 1 week ago

Thanks for contribution @JacobOJ !

@lwitkowski: This part is very early code and if I correctly remember we consciously made a decison to keep WebAppExc clean of custom HTTP status code scenarios as it is intended for standard Web Error Codes as in e.g. MDN . Correct me if I'm wrong.

We didn't want to introduce custom ENUM for custom codes, that's correct. And we're talking about generic WebApplicationException for which existing mapper works nice for standard codes, so custom mapper would have to duplicate its logic (or delegate the call to existing impl).

I personally don't think our extension should throw NPE in any case, it should always end up in a Problem response, with internal server error if we have no idea what else (more specific) could be done. If WebAppException accepts any integer code, maybe we can pass it 'as provided', without introducing custom enums, so I'd vote to merge this as proposed by @JacobOJ, or with slightly different default title:

"HTTP Status Code: " + status instead of "Unknown HTTP Status Code"

pazkooda commented 1 week ago

If we're to make such adjustment I'd like to understand from where this custom HTTP status code comes from. Cause the only scenarios I see are:

Plain HTTP 500 + HTTP Status Code: ${code} will be bit enigmatic in second scenario - personal opinion of me playing role of "consumer of API error" that came from non-standard error code. I'd expect at least some trace ([WARN]?) in logs about this unexpected behaviour.

lwitkowski commented 1 week ago
  • you make such thing on purpose in your code => custom problem should be introduced,
  • it is propagation of external system response delivered to RestClient. Then I feel developer should make decision what to do by introducing custom mapper.

Fully agree. I still think our extension should not throw NPEs

Plain HTTP 500 + HTTP Status Code: ${code} will be bit enigmatic in second scenario - personal opinion of me playing role of "consumer of API error" that came from non-standard error code.

Pardon, I wasn't very clear about status code in this specific case: I'd pass status code that is defined in the exception. My take on 500 was more general approach that IMO we should be defensive and not cause NPEs throw by the code in this extension. Simply 500 HttpProblem is better than 500 with ugly NPE stacktrace visible to the client ;)

I'd expect at least some trace ([WARN]?) in logs about this unexpected behaviour.

That's something to consider, although I would not WARN about it more than once to be frank.

pazkooda commented 1 week ago

We agree on NPE - should not be there.

And fact that we re-discuss different options is exact reason why at first we tried to keep it "stupid simple" 😉 .

I'd strongly vote for default behaviour of this extension NOT to propagate custom HTTP codes and go defensive way with HTTP500. It was made for standards in mind and defaults should point to standard imho.

Let me give example.

If this is due to RestClient and someone at source system changes eg. 429 to 492 this extension will propagate it as-is ➡️ you'd not discover this easily within you logs (even with [WARN]). Consumer of our response can also got confused and run to exactly same consideration we're discussing. Eariler the feedback it is usually better.

If we'd start throwing meaningful HTTP500 some monitoring tool can pick this from traces and possibly make alert.

Just a hypothetical scenario.

lwitkowski commented 1 week ago

We agree on NPE - should not be there.

And fact that we re-discuss different options is exact reason why at first we tried to keep it "stupid simple" 😉 .

IMO stupid simple, and in the spirit of the principle of least astonishment would be to set status to what is in the exception (this exception exists for exactly this purpose), without modifying it in pretty arbitrary way (look at my comment below about definition of standard http code)

I'd strongly vote for default behaviour of this extension NOT to propagate custom HTTP codes and go defensive way with HTTP500. It was made for standards in mind and defaults should point to standard imho.

The more I think about it, the less confident I am that there is any distinction between standard and custom http code at all. Is 418 custom or standard? Or 422? The latter is listed on mozilla.org, but is not implemented by JaxRS's Response.Status enum. HTTP specification doesn't restrict apis from using code 567, it only says it should be in a range of 100-599 - and we may consider enforcing this rule and translate anything outside of this range to 500.

Interestingly what I've just found out - Quarkus implementaiton of exception mapper for WebAppException has the same issue and will throw NPE for codes not implemented by Response.Status

IMO the most elastic for programmers, robust, yet without increasing complexity of this extension implementation for this mapper would be:

  1. check if exception.getResponse().getStatusInfo() is not null - that would allow programmers to implement their own StatusType and even provide nice reason phrases for their codes which would be just picked up by this extension.

  2. if getStatusInfo() is null, then we check getStatus() integer and try to get reason phrase from Response.Status (JaxRs's impl) - if it's not available - generic title + status code as is if it's in 100-599 range

  3. else - http 500 internal server error

Let me give example.

If this is due to RestClient and someone at source system changes eg. 429 to 492 this extension will propagate it as-is ➡️ you'd not discover this easily within you logs (even with [WARN]). Consumer of our response can also got confused and run to exactly same consideration we're discussing. Eariler the feedback it is usually better.

I get your point, but who are we to decide to ban 492 codes and translate them to 500s? Again, I don't think there's any standard that would justify this rigidness.

If we'd start throwing meaningful HTTP500 some monitoring tool can pick this from traces and possibly make alert.

Just a hypothetical scenario.

That's veeery hypothetical scenario, don't you think? And even if - why none of the JaxRS implementations (or any other servlet impl) behave like you suggest by 'banning' non standard codes and translating them to internal server error?

lwitkowski commented 1 week ago

We agree on NPE - should not be there.

And fact that we re-discuss different options is exact reason why at first we tried to keep it "stupid simple" 😉 .

For the record: #287

If I read our comments correctly, we didn't want to introduce ExtendedStatus enum which would bring certain complexity and long term cost for us: to maintain it, and extend for every possible code, with reason phrases for which there's no consensus in web community yet (look below).

At least I wasn't in favour of banning codes that are not represented in Response.Status ;)

edit: https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml - this seems to be an official status code registry - so if anyone wants' to play stick to standards joker card, this should be this one, not Response.Status enum from JaxRs lib ;)

pazkooda commented 1 week ago

I get your point, but who are we to decide to ban 492 codes and translate them to 500s?

We don't ban them. We inform users of this extension that there is some non-standard error code so that they make consious decision what to do with this - by means of writing a mapper.

What I'm stating is just that: I personally would not use custom error code in software I develop - it ususally confuses everyone using it. If my code would propagate non-standard status code due to "pass-through" behaviour of this extension - without explicitly letting me know that such thing is happening - I'd be unhappy. I prefer to control what responses I send to my clients, but also appreciate convinence of propagating RestClient WebAppExc (when error codes are "standard" ;) ). Even if I'll met custom code once in a while I don't mind writing mapper for it (never ever did that in my professional career cause no one used custom codes).

With such a long discussion I forsee we may need to settle with:

quarkus.resteasy.problem.http-codes.allow-non-standard=false/true

What do you think about ☝️ ? (have no attachement to property name, but preference on false being default - Quarkus "bans" non-standard codes with NPE 😜 )

lwitkowski commented 1 week ago

We don't ban them. We inform users of this extension that there is some non-standard error code so that they make consious decision what to do with this - by means of writing a mapper. What I'm stating is just that: I personally would not use custom error code in software I develop - it ususally confuses everyone using it. If my code would propagate non-standard status code due to "pass-through" behaviour of this extension - without explicitly letting me know that such thing is happening - I'd be unhappy.

The fact you would not use them doesn't mean others can't use them. And this extension should not get in the way by throwing NPEs or rewriting perfectly valid but Unassigned (as IANA calls them) codes with 500. This extension is not a sherriff or a guardian, and it's not a place to arbitrary control what responses I send to my clients - you as developer are responsible for controlling and monitoring this.

@pazkooda Can you define (and back it up with some widely accepted RFC) what 'standard' and 'non-standard' code means please? Otherwise this discussion leads to nowhere.

I've spent significant amount of time lately reading about this, I've provided multiple links, and pointed out differences between Response.Status enum and IANA list (which shows that Response.Status is not a good source of truth for anything) and you seemed to ignore it and keep referring to 'non-standard' codes.

I (incorrectly) used this term in the past as well, but I'm now convinced it's a wrong term to call Unassigned codes. Codes outside of 100-599 range could potentially be called like this, but this is outside of this discussion.

pazkooda commented 1 week ago

At first let me quote myself:

... (when error codes are "standard" ;) ).

Note quotation marks and wink eye. I'm not oracle in this topic 😝.

But actually now that you ask I can point to place which aggregates codes that are defined by (at least) some kind of standard with RFC9110 being "base one".

It is this Wikipedia page (note that each "standard" code has respective RFC/etc attached to it).

This is what I mean when use word non-standard ↔️ each Unassigned code that is not part of any standard.

Yet again - I state my preference.

... or rewriting perfectly valid ...

The fact it is valid does not neglect fact it is not defined by any standard and it confuses client code. Someone else will end up with "another NPE" if we bluntly propagate HTTP492 due to source system "mistake" and we haven't anticipated this (informed developer).

Basically source system can start sliently breaking my OpenAPI if I'm not careful enough with handling of RestClient pass-through WebAppExc. I prefer here to be clearly informed that I propagate "vaild but Unassigned status code". I don't want to play a role of guardian if this code is OK, I try to "guard" common-sense approach.

Note on Response.Status - we used it for convinence, not as a source of "standard" - I'd guess it covers more than 99.9% of real world use-cases. And for code simplicy I'd still keep it to be honest.

Edit:

... you as developer are responsible for controlling and monitoring this.

That is why I'd like to have this monitoring explicit. If I find such Unassigned status code that I still want to propagate I'd do a mapper for it (as the one being reponsible).

lwitkowski commented 1 week ago

At first let me quote myself:

... (when error codes are "standard" ;) ).

Note quotation marks and wink eye. I'm not oracle in this topic 😝.

But actually now that you ask I can point to place which aggregates codes that are defined by (at least) some kind of standard with RFC9110 being "base one".

It is this Wikipedia page (note that each "standard" code has respective RFC/etc attached to it).

None of these say using 479 is invalid, forbidden, discouraged etc. None suggest to map it to 500 status.

And IANA (refered by many RFCs) seems to be much better source of truth than wikipedia.

This is what I mean when use word non-standard ↔️ each Unassigned code that is not part of any standard.

Yet again - I state my preference.

... or rewriting perfectly valid ...

The fact it is valid does not neglect fact it is not defined by any standard and it confuses client code.

Unassigned are defined as 'valid' by HTTP standard itself.

Someone else will end up with "another NPE" if we bluntly propagate HTTP492 due to source system "mistake" and we haven't anticipated this (informed developer).

I don't think we can reasonably prevent NPEs outside of this extension, IMO we shouldn't even try, especially with hiding real cause from api clients. Which bug report is easier to debug:

What's more explicit and can point to e.g upstream system immediately without looking at logs or proxy code? I really see zero reason, zero added value for this mapping, yet I see some downsides.

Basically source system can start sliently breaking my OpenAPI if I'm not careful enough with handling of RestClient pass-through WebAppExc. I prefer here to be clearly informed that I propagate "vaild but Unassigned status code". I don't want to play a role of guardian if this code is OK, I try to "guard" common-sense approach.

In such setup, you - as developer - should have try-catch and catch WebAppExceptions around you RestClient and act accordingly. And setup monitoring for this part of application.

Note on Response.Status - we used it for convinence, not as a source of "standard" - I'd guess it covers more than 99.9% of real world use-cases. And for code simplicy I'd still keep it to be honest.

I'm also leaning towards still using it, but I'm strongly against mapping everything that is not on this arbitrary list to http 500.

... you as developer are responsible for controlling and monitoring this.

That is why I'd like to have this monitoring explicit. If I find such Unassigned status code that I still want to propagate I'd do a mapper for it (as the one being reponsible).

I think there's some confusion here - there's no such thing as a mapper for http code. There's a mapper for exception type. If you want to ask developers to create mapper for WebAppException just to allow valid codes to pass-through, you're forcing them to reinvent the wheel and include current logic from this extension into their version of the mapper along with some additional custom logic. I just don't see the point.

lwitkowski commented 1 week ago

@JacobOJ I've made 2 suggestions to simplify the change a bit, would you be ok with these?

@pazkooda please have a look at the changes after my suggestions are applied - it will get rid of NPE with one very small and limited change in the codebase. Asserted value of "Unknown code" comes from JaxRs impl (but not Response.Status enum), my change utilises this part of impl.

I suggest we park or extract the discussion about mapping unassigned codes to 500 internal server errors outside of this fix.

JacobOJ commented 1 week ago

@lwitkowski Yes, nice changes. I have updated the PR with your suggestions as well as updated the name of the test to say unassigned instead of invalid http code

lwitkowski commented 1 week ago

LGTM, thanks again for contribution @JacobOJ , I'll probably release new minor version later today.

JacobOJ commented 1 week ago

LFTM, thanks again for contribution @JacobOJ , I'll probably release new minor version later today.

Thank you, too. Looking forward to the new release.

lwitkowski commented 1 week ago

Thank you, too. Looking forward to the new release.

v3.15.0 has just landed in maven central: https://repo1.maven.org/maven2/io/quarkiverse/resteasy-problem/quarkus-resteasy-problem/3.15.0/