Closed cmgrote closed 4 years ago
On the call we questioned whether these response bodies could also apply to DELETE
s, it looks like they can (https://tools.ietf.org/html/rfc7231#section-4.3.5):
"If a DELETE method is successfully applied, the origin server SHOULD send a 202 (Accepted) status code if the action will likely succeed but has not yet been enacted, a 204 (No Content) status code if the action has been enacted and no further information is to be supplied, or a 200 (OK) status code if the action has been enacted and the response message includes a representation describing the status."
(Implying that there are response messages available in addition to the status code, even on deletes.)
(I've linked to the latest HTTP/1.1 spec above, but even going back to the very first HTTP/1.0 spec from 1992 there were response bodies possible against non-200 status codes: https://www.w3.org/Protocols/HTTP/HTRESP.html)
Happy to look into rolling this out; on the call someone mentioned that in the past we were sending non-200 responses -- any quick pointers on where / how to we were doing that? Just looking for where to start with the response status codes (ie. where they're set / returned by the rest API itself), and in particular in a non-Spring-specific manner to remain portable (?)
It appears that the way to handle these non-200 responses in Spring is via exceptions themselves... However, we of course don't want to make our exceptions (or exception-handling) Spring-specific, so I'm thinking of the following approach:
common-services
module specifically for Spring (common-services/spring-services
)@ControllerAdvice
-- which is how I can control the HTTP status returned for any errors)This just means the following changes to existing modules where spring resources are defined:
... and our existing exception handling and recording (outside of Spring) should remain entirely untouched.
To my mind there are 3 options:
So I think it comes down to a choice between 1 and 2 - meaning that in my opinion it would be OK to change and break compatibility if we want to; and the decision is about how we want the API to look.
In defence of option 1:
In defence of option 2:
The one thing that is critical with EITHER approach is that the content of the response body needs to accurately and adequately convey the outcome of the operation including the suggested user-action in a human-intelligible form.
My current preference would be to initially talk to people in industry sectors where there may be old gateways, to assess the risk of option 2, then if the risk is deemed low, adopt option 2.
Some useful references:
On use of status codes relative to success/failure of the operation: https://blog.restcase.com/rest-api-error-codes-101/ (Note that Facebook use 200 whatever the outcome; but the author's recommendation is counter to what Facebook do.)
Useful thread on use of HTTP status codes with regard to success/failure: https://stackoverflow.com/questions/942951/rest-api-error-return-good-practices (See in particular the answer from "Larry K" (located just below all the flowchart pictures)
On standardisation of error responses: https://dzone.com/articles/rest-api-error-handling-problem-details-response Quite interesting: RFC7807
On documentation of REST APIs: https://www.openapis.org Linux Foundation OAS
Hmmm... I thought we'd discussed this on the call and agreed that we should move ahead with option (2) unless there was a standard that dictated otherwise, but maybe I over-interpreted... As best I can tell the references linked are all opinion-based rather than a standard?
The only exception being the last one for OAS, which to me clearly advocates option 2 (http://spec.openapis.org/oas/v3.0.2#http-status-codes -- itself referring to the IETF / IANA standard):
"The HTTP Status Codes are used to indicate the status of the executed operation. The available status codes are defined by [RFC7231] and registered status codes are listed in the IANA Status Code Registry."
As mentioned earlier, response bodies have been allowed on non-200 responses since the early '90s, so we're talking ~30 years now... So I'm certainly keen that we address your point on industry sector input where there may be old gateways, but we'd presumably be talking about gateways that are > 30 years old (?)
Actually I'm not sure I understand this point about old gateways in general -- my understanding is that for them to not support response bodies they'd need to not support (predate?) the ~30-year old HTTP/1.0 spec, so I'm not sure I follow how they'd interpret / handle HTTP at all (status codes or response bodies) (?)
All options are valid by the spec, so we are just discussing style.
I was leaning towards option 2 but given the discussions above, I have changed my mind and think option 1 is better for us.
Option 1 retains backward compatibility for our APIs, ensures our errors will always get through and allows us to distinguish between requests that fail in the transport layer rather than in our server. This could be really useful in complex multi-cloud environments.
The more minor point is that it is the least amount of work and we are desparately overloaded.
I was just reading through the earlier comments and completely agree we need to document our REST API style - not just the use of the status codes - but also the way we format complex request/response bodies, the handling of exceptions, structure of the URL.
Our initial decision was that the REST APIs were private to Egeria and that technology is plugged in through the connectors - or by calling the client interfaces.
However, having the REST APIs as external APIs has advantages in supporting multi-programming environments - we see this advantage in the Python Notebooks - so it is reasonable to revisit this decision.
I can appreciate that option 1 has some advantages, but I also believe some potentially significant disadvantages.
In support of option 2:
requests
library (3rd most-downloaded library in PyPI with 13M downloads / week) provides a raise_for_status()
method that automatically raises errors for unsuccessful responses based on the response code (https://2.python-requests.org/en/master/user/quickstart/#response-status-codes). Furthermore, it automatically interprets the success of a response as a boolean based on the response code (between 200 and 400 as success, and anything else as a failure (https://realpython.com/python-requests/#status-codes)). That is, they can programmatically check this without needing to first understand our response body format (find our documentation, read it, understand it, etc) to parse it for a potential error. (Another example being curl
(https://curl.haxx.se/docs/manpage.html#-f), which is perhaps even more significant given potential challenges parsing JSON from command-line scripts to detect whether there is an error inside the body of the response or not vs. simply checking the status code for success or failure. While additional libraries can be installed to help with this (eg. jq
or even using python
or other higher-level languages themselves, this increases the footprint necessary for simple automation via areas like containers.)javax.ws.rs
, extends its built-in exceptions (that do generate such codes) and throws them from the API resources under certain conditions)I can appreciate that some codes (like 404) may seem particularly confusing in this server-internal vs transport-layer error debate between option 1 and option 2. However, I also see that even for a 404 response, given that our REST resource URLs for things like getEntity actually have the GUID in the URL itself, that it's debatable whether this is a valid URL or not in the same way that mis-spelling some other portion of the URL would be invalid (and validly result in a 404) -- from the perspective of a URL that does not exist, both seem equally relevant to me.
However, IMHO it would be better to take such debatable codes as an exception we explicitly handle rather than as a reason to entirely discard option 2: eg. perhaps under option 2 we instead opt not to use 404's at all, so for endpoints like getEntity we simply return a 200 status with a null or empty response object (rather than any error body). (For me it's debatable whether it's even an error that someone has requested an entity instance that we have not stored in our repository, for example, or whether we should simply report that the request succeeded but we have no such entity in our repository.)
Equally, I feel that other codes are quite clear-cut: a 500 Internal Server Error
to me is a clear indication that something unexpected has gone wrong in our processing, within the server (and not at the transport level). And per my book above (😉), would be a very clear communication that does not require me to understand the response body format to understand that the server did not succeed in processing my request.
(Edited to also point out how option 2 can assist with command-line automation via common tools like curl
which are often used in environments that won't have complex JSON parsing capabilities.)
My thoughts
One option (though it could be deemed procrastination?) is to post a proposal to make this change very clearly on our public list/chat & gather feedback over a release cycle. Perhaps linking in this with clarification over the status ( -> public? ) of the rest api
Good discussion and I agree with pretty much everything that has been said, including the point that this is about more than just implementation of a rigid specification - it depends on how one applies the specifications in terms of mapping to architectural concepts (e.g. layers) and both specs/standards and precedent/opinion are important.
On both fronts I am still of the opinion that our APIs would be better if we were to implement option 2.
Reading Nigel's comment above I realise I should clarify my earlier definition of "option 2" in which I said "not worry about breaking compatibility". What I meant is that we should not attempt to preserve backward compatibility - e.g. by API versioning or introducing a dual API - those would be the activities of option 3 (which I am not in favour of). If we were to adopt option 2 I think we need to be very public about the proposed change and provide [at least] a release of warning. My thoughts were that we might propose the change with 1.4 and implement it in 1.5 (subject to approval by the wider community) - together with comprehensive documentation.
Yes, I think we’d have to have this release-long request for review on option 2 as per the 1.1 release notes I would consider (at least some of) the APIs to be “released functionality”: https://github.com/odpi/egeria/blob/master/release-notes/release-notes-1-1.md
(Though without thorough documentation on error formats, status codes, etc there is potentially still room for interpretation on its level of “released”...)
But as far as I'm aware we can't really do anything with Egeria without first configuring a cohort, repository, etc? Short of someone magically knowing how to write up a config document by hand (doubtful?) all of that requires the use of the administration API -- and I don't believe there's any client yet for the administration (only the API). So I think there's a pretty strong argument that at least some of the APIs are public, released, and have been fully adopted by everyone that's using Egeria 😉
Based on today's discussion we agreed that given where we are with other implementations building on the backbone of OMRS, that we would retain the existing style (option 1) for OMRS.
For OMAS endpoints, we could/should consider the use of option 2.
Closing as I believe this is now decided for OMRS, and permitted for OMASes.
Per discussion on #2353:
Is this a limitation on our side, or a difference in a standard that defines how REST API status codes differ from the HTTP protocol itself? I can't see anything in the HTTP standard restricting message bodies (response objects?) to status code 200. The overview of status codes (https://tools.ietf.org/html/rfc7231#section-6) explains:
(Per the last sentence, a non-200 code of 471 has a response message that provides further explanation.) Regarding a 500 response, specitically (https://tools.ietf.org/html/rfc7231#section-6.6):
Again, is this something specific to REST APIs vs the HTTP protocol itself? From the standard (https://tools.ietf.org/html/rfc7231#section-6.3), it feels pretty strongly worded to me that 200 is defined as success:
It's confusing to me that we would specify different HTTP status codes (like 500, 404, etc) on the definition of errors, but not actually use them as the response status code -- which per the standard references above seems like it should be possible? -- and I suspect will be confusing to users as well (who upon seeing a 200 response status may well simply ignore any body of the response itself)?
Suggestion: