ietf-wg-httpapi / rfc7807bis

Revision of RFC7807: HTTP Problem Details

Should 'type' definitely be required to be a URI? #11

Closed pimterry closed 3 years ago

pimterry commented 3 years ago

In @dret's Twitter feedback thread, this was the only and much-repeated comment. I'll copy some of the posts from there in here:

https://twitter.com/vasilakisfil/status/1355083598777511937:

the type should not be required to be a URI reference, this is a killer for most ordinary developers ... Having worked closely with such developers I understand why. Not everyone is so API savvy. It isn't very intuitive for devs not working on hateoas and APIs, so they drop the whole spec completely making things worse. Making it a requirement, has the opposite effect.

https://twitter.com/pimterry/status/1355112881646399491 (me):

+1, I've seen confusion here, it's counter-intuitive. Seems odd to use the id as the implicit link to the docs too: what if I want two errors to be distinguishable but both reference the same docs? An arbitrary string with a suggestion that it be a URI would be better imo. Oh, and what if I want to change the URLs for my error docs? If 'type' links to them, as encouraged, I either have to change the type identifiers for all my errors or have to point them at the old URLs forever.

https://twitter.com/Riussi/status/1355091136868737024:

I concur. We've just started using RFC 7807 and defining those would be easier if it is not mandated as a URI

https://twitter.com/simonplend/status/1355087138497282048:

I've also found that folks find this really confusing. "What is this URL for?"

This confusion is also repeated frequently elsewhere, e.g. it's the top complaint in unrelated reddit discussion of the standard. Meanwhile https://jonathancrozier.com/blog/base-your-api-error-response-model-on-a-solid-standard-with-the-problem-details-rfc (on the first page of results for "problem details standard") says:

For example, let’s consider the type property. For most of the projects I am working on, it isn’t practical to have a webpage dedicated to each type of possible error. Given that the standard specifically states that the value is assumed to be "about:blank" if it is not present, I usually leave this member out.

Arguably type is the most useful field in the standard, so this is not a good result.

I think it'd be useful to discuss this, at least to document the reasoning here more clearly, and to consider alternative approaches and/or alternative ways to make type URIs easier for devs.

pimterry commented 3 years ago

I'll add a separate comment, from my personal point of view:

As far as I understand, the goal is to encourage namespacing of error types, is that right? Are there other benefits that the URI provides?
Imo, it's quite unusual as a web API response - it reminds me of a more rigorous XML-style API design with careful namespacing of all identifiers, rather than the less rigorous JSON formats that are now more common. It feels inconsistent with other HTTP standards too, e.g. link relation types do not use URIs as type identifiers.
Similarly, while API developers will be familiar with HTTP URLs, a great many (I believe) are not familiar with URIs more generally and the other formats available (e.g. Erik mentioned tag URIs - I'd like to think I'm reasonably interested in the standards and tools available, and I'd never even heard of these!).
Recommending that the unique id for an error type be a documentation URL is problematic, because documentation URLs may well change and unique ids must not (yes, old documentation URLs should redirect to new ones, but there's not always a 1-1 mapping of URLs, it's messy to keep using the incorrect URL/domains in this field, and we all know this isn't always true/possible anyway).
Having a separate optional documentation URL instead would be more useful imo. Right now, it's not guaranteed that a type URL is dereferenceable, so applications must effectively treat it as never dereferenceable. A separate field (or indeed link relation) would be defined as such, making it usable in many more cases.
Making it an opaque string instead would still allow for the use of URIs for namespacing if desired, and we could still recommend that users do that without necessarily requiring that.

asbjornu commented 3 years ago

Are there other benefits that the URI provides?

URIs can be dereferenceable. That's a pretty significant feature.

Recommending that the unique id for an error type be a documentation URL is problematic, because documentation URLs may well change and unique ids must not

Perhaps the RFC can mention the possibility to use HTTP's features to make URIs stable, such as using stable API URLs in type that perform HTTP redirects towards the (ever changing) documentation.

Having a separate optional documentation URL instead would be more useful imo.

I think that's a horrible suggestion. Big 👎🏼 .

Making it an opaque string instead would still allow for the use of URIs for namespacing if desired

Opaque strings are already allowed. I don't really understand the problem here. The last paragraph of RFC 7807 section 3.1 states the following:

Note that both "type" and "instance" accept relative URIs; this means that they must be resolved relative to the document's base URI, as per [RFC3986], Section 5.

This means "type": "i-hate-uris" is perfectly valid.

sazzer commented 3 years ago

The type field currently has to be a URI Reference. This means it can either be a URI or a relative-ref. This in turn means that you can use almost any bare string in there and still be following the letter of the RFC, even if not the intention.
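As a rough illustration of that distinction, here is a minimal sketch that sorts candidate type values into "URI" and the kinds of relative reference. It assumes Python's standard urllib.parse as a stand-in for an RFC 3986 parser (it is lenient, not a validator); the function name classify_type and the sample values are hypothetical.

```python
from urllib.parse import urlsplit


def classify_type(value: str) -> str:
    """Roughly classify a problem "type" value as a URI or a relative reference."""
    parts = urlsplit(value)
    if parts.scheme:
        # Anything with a scheme: https:, tag:, urn:, about:, ...
        return "uri"
    if parts.netloc:
        # Starts with "//": relative only in that it inherits the base scheme.
        return "network-path reference"
    if value.startswith("/"):
        return "absolute-path reference"
    # Bare strings such as classic error codes end up here.
    return "relative-path reference"


for sample in ("https://example.org/out-of-stock", "about:blank",
               "/out-of-credit", "i-hate-uris"):
    print(f"{sample!r}: {classify_type(sample)}")
```

Under these assumptions a bare error code is accepted as a relative-path reference, which is why it satisfies the letter of the RFC.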

I've seen examples of exactly that - with values for type that are actually just error codes because, strictly speaking, they are legal relative URIs that just don't happen to resolve to anything.

Personally, I've no major problem with that, except that it makes it much more likely to get collisions across APIs - with two different APIs using the same type "URI" to mean different things.

I can imagine there are some big benefits to be obtained if the URI is dereferenceable though. Things like getting human-readable documentation from it, or a JSON Schema that describes the shape of the problem details. Dereferencing an instance URI might take you directly to the log messages for what went wrong (for internal APIs only!)

But equally, saying that these must be dereferenceable is potentially quite a big hurdle for many smaller developers that would potentially either stop them using this or else just have them ignore the difficult bits.

pimterry commented 3 years ago

URIs can be dereferenceable. That's a pretty significant feature.

Currently, type is not dereferenceable though, explicitly. The RFC says "Consumers SHOULD NOT automatically dereference the type URI", and has no mechanism to know when it could be dereferenceable. In effect, it's manually dereferenceable only.

Imo it would be more useful if there were such a mechanism, or indeed a separate always-dereferenceable URL elsewhere.

I think that's a horrible suggestion. Big 👎🏼 .

Could you be more specific about why?

This means "type": "i-hate-uris" is perfectly valid.

I see - that's not communicated at all right now! If that's intended, I think it would be useful to communicate that better in the spec. Currently every single example in the spec is an absolute HTTP URL, all discussion of the standard elsewhere does the same, and I think the comments above clearly show that other options aren't widely recognized.

Illustrating that a classic error code string is allowed but a full URL is encouraged would resolve quite a few of the concerns there I think.

pimterry commented 3 years ago

Race condition posting there, sorry - to reply to @sazzer:

I've seen examples of exactly that - with values for type that are actually just error codes because, strictly speaking, they are legal relative URIs that just don't happen to resolve to anything.

Ok, I think this agrees with @asbjornu's point, and this is likely a communication issue rather than a change to the format itself then.

I can imagine there are some big benefits to be obtained if the URI is dereferenceable though.

saying that these must be dereferenceable is potentially quite a big hurdle for many smaller developers

Agree, I'd love to have a documented way to know when that's possible. Making it possible to include a dereferenceable URL explicitly but optionally would be a big help.

dret commented 3 years ago

This means "type": "i-hate-uris" is perfectly valid.
I see - that's not communicated at all right now! If that's intended, I think it would be useful to communicate that better in the spec. Currently every single example in the spec is an absolute HTTP URL, all discussion of the standard elsewhere does the same, and I think the comments above clearly show that other options aren't widely recognized.

that wouldn't be great as a pattern to recommend. the actual error URI in this case is the relative URI resolved against the URI of the context (in this case, the resource that returned the error). so while this is technically legal, it's pretty bad in terms of utility when it comes to implementations properly handling the type values as URIs.

sazzer commented 3 years ago

Race condition posting there, sorry - to reply to @sazzer:

Yeah, we both posted the same thing at the same time :) And yes, I think we are both saying the exact same thing.

Edit - looking at timestamps it wasn't the exact same time. Just my window hadn't updated at the time!

Currently, type is not dereferenceable though, explicitly. The RFC says "Consumers SHOULD NOT automatically dereference the type URI", and has no mechanism to know when it could be dereferenceable. In effect, it's manually dereferenceable only.

Maybe I'm wrong, but I've always read that with the emphasis on "automatically". As in - dereference only when needed and not always. And I've read it like that because some scenarios that would cause errors to be returned would also mean that the URI that is the problem type would also fail to resolve, and potentially cause more problems for the service (DoS attack or similar.) Equally, especially for type, it could be that the intention is for humans to dereference it and not computers. For example, I've started using type values similar to https://httpstatuses.com/404 for cases where the status code alone tells you everything, but I still want to include a Problem response.

There's also a problem that not all clients are capable of dereferencing all URIs. For example, if it's an "http" scheme then you need to be able to make HTTP requests to the target server. Not all clients can do that, especially if they are on a closed network.

simonplend commented 3 years ago

This means "type": "i-hate-uris" is perfectly valid.

I see - that's not communicated at all right now! If that's intended, I think it would be useful to communicate that better in the spec. Currently every single example in the spec is an absolute HTTP URL, all discussion of the standard elsewhere does the same, and I think the comments above clearly show that other options aren't widely recognized.

that wouldn't be great as a pattern to recommend. the actual error URI in this case is the relative URI resolved against the URI of the context (in this case, the resource that returned the error). so while this is technically legal, it's pretty bad in terms of utility when it comes to implementations properly handling the type values as URIs.

@dret Would it be reasonable then to also include examples of type URIs which are not URLs? e.g. tag URIs as you suggested on Twitter.

dret commented 3 years ago

On 2021-02-02 14:07, Simon Plenderleith wrote:

@dret Would it be reasonable then to also include examples of type URIs which are not URLs? e.g. tag URIs (https://tools.ietf.org/html/rfc4151) as you suggested on Twitter.

i think what we have clearly established is that if 'type' remains to be defined as a URI, better guidance would be a good idea.

asbjornu commented 3 years ago

Currently, type is not dereferenceable though, explicitly. The RFC says "Consumers SHOULD NOT automatically dereference the type URI", and has no mechanism to know when it could be dereferenceable. In effect, it's manually dereferenceable only.

SHOULD NOT automatically just means there should not be automatic dereferencing. Non-automatic dereferencing such as making the URI a user-clickable link, or a developer copying and pasting it into a browser, is perfectly valid, though. That should perhaps be made more explicit.

I think that's a horrible suggestion. Big 👎🏼 .

Could you be more specific about why?

type works perfectly well for documentation. Decoupling the problem type from the problem type's documentation when the former can redirect to the latter just adds confusion, possible inconsistencies, and errors, imho.

This means "type": "i-hate-uris" is perfectly valid.

I see - that's not communicated at all right now!

But it is. As I wrote above, the last paragraph of RFC 7807 section 3.1 states the following:

Note that both "type" and "instance" accept relative URIs; this means that they must be resolved relative to the document's base URI, as per [RFC3986], Section 5.

…

If that's intended, I think it would be useful to communicate that better in the spec.

I agree that can be emphasized closer to the definition of type and not just as a seemingly unrelated paragraph. A way to accomplish this would be to promote each member in section 3.1 to its own section so the spec can use more than one paragraph to describe each member.

Currently every single example in the spec is an absolute HTTP URL.

Examples are non-normative. To understand the spec, you have to read the normative text. An example containing type with a relative URI would be a good addition, though.

all discussion of the standard elsewhere does the same, and I think the comments above clearly show that other options aren't widely recognized.

Agreed.

Would it be reasonable then to also include examples of type URIs which are not URLs? e.g. tag URIs as you suggested on Twitter.

I think tag URIs sound like a bad fit for type as they are more of a human-readable alternative to UUIDs.

pimterry commented 3 years ago

the actual error URI in this case is the relative URI resolved against the URI of the context (in this case, the resource that returned the error).

Hmm, yes, so "type": "my-custom-error" is actually quite bad. Doing so is most likely a mistake: if a developer returns "type": "my-custom-error" as an error type from two different resources, clients must treat those as distinct types, relative to different base URLs. That's quite surprising! This is probably not what the dev intended, and code that does treat them as equivalent automatically is not following the spec.

It looks like this is a real problem: even the most popular problem details implementation makes this mistake. It matches problem detail classes automatically by comparing the type value, and never uses a base URL or relative resolution for type values anywhere. I suspect that it's not alone in this, and that using bare strings as global error types will work in many places, incorrectly.
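To make the mismatch concrete, here is a small sketch contrasting a raw string comparison with a comparison of resolved URIs. It assumes the base URI is simply the URL of the resource that returned the problem, and uses Python's urllib.parse.urljoin for resolution; the api.example.com URLs and function names are invented for illustration.

```python
from urllib.parse import urljoin


def resolved_type(resource_url: str, type_value: str) -> str:
    """Resolve a "type" reference against the URL of the resource that returned it."""
    return urljoin(resource_url, type_value)


def same_type_naive(type_a: str, type_b: str) -> bool:
    # Raw string comparison, as described above.
    return type_a == type_b


def same_type_resolved(url_a: str, type_a: str, url_b: str, type_b: str) -> bool:
    # Comparison after relative resolution against each resource's URL.
    return resolved_type(url_a, type_a) == resolved_type(url_b, type_b)


orders = "https://api.example.com/orders/42"  # hypothetical resources
users = "https://api.example.com/users/7"

print(same_type_naive("my-custom-error", "my-custom-error"))
# True
print(same_type_resolved(orders, "my-custom-error", users, "my-custom-error"))
# False
print(resolved_type(orders, "my-custom-error"))
# https://api.example.com/orders/my-custom-error
```

An absolute type such as https://example.org/out-of-stock is unaffected, since resolution leaves it unchanged.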

Would it be reasonable then to also include examples of type URIs which are not URLs? e.g. tag URIs

Imo, tag URIs are a good improvement, but they're still not great. The most convenient equivalent tag we'd be suggesting is probably tag:example.com,2021:my-error. It's obvious that it isn't dereferenceable, which is good, but it's still an unusual error code format.

There's clearly friction with URI error codes right now, and widespread patterns of not using URIs for error identifiers in every existing API I can find (e.g. Stripe, AWS, Azure). Instead, they all use opaque string identifiers decoupled from documentation URLs.

From my perspective and I think from the feedback above, it would be useful to have a standard that supported existing common error API patterns. This doesn't preclude namespacing where that's useful, it just doesn't enforce it.

I've always read that with the emphasis on "automatically". As in - dereference only when needed and not always.

SHOULD NOT automatically just means there should not be automatic dereferencing.

Ok, I see how this sentence is referring to fully automated dereferencing concerns, thanks! That's useful.

The spec still supports what I'm saying though, right? Even if the type is an HTTP URL, there is no requirement or guarantee that it leads to a real resource. The spec says that it's encouraged to do so, and no more, and @sazzer's comment that "saying that these must be dereferenceable is potentially quite a big hurdle" seems to agree. Is that not correct?

Assuming that's right then tooling can't use the URI as a helpful documentation link for devs, or anything similar. That's not valid because type is not specified to be a URL of a resource at all. It's not designed for human or machine consumption, it's only guaranteed to be usable as a type id.

I think being able to use these URIs would be great. There's good use cases, from generic HTTP clients that helpfully link to error documentation in exceptions to my own HTTP debugger, where I would love to pull in error docs alongside error responses.

To build these tools we need a URL that's guaranteed to go to docs, if present. If the documentation URL were instead in a link header with an HTML media type, similar to other standards (e.g. the documentation link for the deprecation header) then that would be possible. Using a link header would also better fit with HATEOAS, and would make error docs immediately visible to existing tooling that isn't aware of problem details.
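A minimal sketch of how a tool might pick such a documentation link out of a Link header. Everything specific here is an assumption: the thread does not name a link relation, so the registered "help" relation is used as a placeholder, the example.com URLs are invented, and the parser is deliberately simplified rather than a full RFC 8288 implementation.

```python
import re
from typing import Optional

# Simplified matchers: they do not handle every RFC 8288 corner case
# (for example, commas or semicolons inside quoted parameter values).
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*"?([^";,]*)"?')


def find_doc_link(link_header: str, rel: str = "help") -> Optional[str]:
    """Return the target of the first link carrying the given relation, if any."""
    for target, params in _LINK_RE.findall(link_header):
        attrs = {name.lower(): value for name, value in _PARAM_RE.findall(params)}
        # A rel parameter may hold several space-separated relation types.
        if rel in attrs.get("rel", "").split():
            return target
    return None


header = ('<https://example.com/docs/errors/out-of-credit>; rel="help"; '
          'type="text/html", </orders>; rel="collection"')
print(find_doc_link(header))
# https://example.com/docs/errors/out-of-credit
```

With this shape, the presence of the link is the signal that documentation exists, so a client never has to probe the type URI to find out.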

Decoupling the problem type from the problem type's documentation when the former can redirect to the latter just adds confusion, possible inconsistencies, and errors, imho.

Imho, as a developer I would find this spec more useful if they were indeed decoupled, and the type wasn't necessarily a URI at all (meaning no duplication & possible inconsistency).

Doing so would resolve all the feedback above, would make this standard easier to get started with, it'd support more cases like error types that share the same documentation URLs and support changing doc URLs, it'd avoid semantics like relative URIs that are not being implemented correctly, it would support more dereferencing use cases, and it would better match existing patterns in other specifications and in existing real-world APIs.

I do see how this is subjective and debatable, but there are many benefits, and imho requiring type URIs like this limits some valuable use cases.

Going to leave this here for now, since I need to get some other work done! This is really interesting though, hope these points are helpful.

asbjornu commented 3 years ago

@pimterry:

It looks like this is a real problem: even the most popular problem details implementation makes this mistake.

The examples look good to me:

Problem.builder()
    .withType(URI.create("https://example.org/out-of-stock"))
    .withTitle("Out of Stock")
    .withStatus(BAD_REQUEST)
    .withDetail("Item B00027Y5QG is no longer available")
    .build();

Will produce this:

{
  "type": "https://example.org/out-of-stock",
  "title": "Out of Stock",
  "status": 400,
  "detail": "Item B00027Y5QG is no longer available"
}

…

There's clearly friction with URI error codes right now, and widespread patterns of not using URIs for error identifiers in every existing API I can find (e.g. Stripe, AWS, Azure). Instead, they all use opaque string identifiers decoupled from documentation URLs.

But afaict, none of those APIs implement RFC 7807 and thus aren't very relevant in this discussion.

From my perspective and I think from the feedback above, it would be useful to have a standard that supported existing common error API patterns.

What's considered "common" is relative and in the eye of the beholder. Using Uniform Resource Identifiers as identifiers shouldn't be the esoteric, impenetrable mystery it's being described as, imho.

The spec still supports what I'm saying though, right? Even if the type is an HTTP URL, there is no requirement or guarantee that it leads to a real resource.

Correct.

Assuming that's right then tooling can't use the URI as a helpful documentation link for devs, or anything similar.

Why not?

That's not valid because type is not specified to be a URL of a resource at all. It's not designed for human or machine consumption, it's only guaranteed to be usable as a type id.

I disagree. The specification states this design goal very clearly (emphasis mine):

"type" (string) - A URI reference [RFC3986] that identifies the problem type. This specification encourages that, when dereferenced, it provide human-readable documentation for the problem type (e.g., using HTML [W3C.REC-html5-20141028]).

…

I think being able to use these URIs would be great. There's good use cases, from generic HTTP clients that helpfully link to error documentation in exceptions to my own HTTP debugger, where I would love to pull in error docs alongside error responses.

What do you mean by "pull in"?

To build these tools we need a URL that's guaranteed to go to docs, if present.

Why not change it to "guaranteed to go to docs, if dereferenceable"?

If the documentation URL were instead in a link header with an HTML media type, similar to other standards (e.g. the documentation link for the deprecation header) then that would be possible. Using a link header would also better fit with HATEOAS, and would make error docs immediately visible to existing tooling that isn't aware of problem details.

There's no one stopping anyone from doing exactly that.

Imho, as a developer I would find this spec more useful if they were indeed decoupled, and the type wasn't necessarily a URI at all (meaning no duplication & possible inconsistency).

You avoid the duplication of resolved type URIs if you use absolute URIs as your problem type.

Doing so would resolve all the feedback above, would make this standard easier to get started with, it'd support more cases like error types that share the same documentation URLs and support changing doc URLs, it'd avoid semantics like relative URIs that are not being implemented correctly, it would support more dereferencing use cases, and it would better match existing patterns in other specifications and in existing real-world APIs.

It would also be a breaking change with existing software that actually implements the specification according to how it is written.

sazzer commented 3 years ago

It looks like this is a real problem: even the most popular problem details implementation makes this mistake.

The examples look good to me:
Problem.builder()
    .withType(URI.create("https://example.org/out-of-stock"))
    .withTitle("Out of Stock")
    .withStatus(BAD_REQUEST)
    .withDetail("Item B00027Y5QG is no longer available")
    .build();
Will produce this:
{
  "type": "https://example.org/out-of-stock",
  "title": "Out of Stock",
  "status": 400,
  "detail": "Item B00027Y5QG is no longer available"
}

I think the problem is more with relative URIs. If I've been following the discussion correctly then this:

GET /some/bad/uri HTTP/1.1
Host: www.example.com

-----
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json

{
    "type": "some-bad-request"
}

Should be parsed as if the type field was actually http://www.example.com/some/bad/some-bad-request, because it's relative to the URL that actually requested the resource in the first place. And (almost) nobody actually does that - they all treat it as if the type field was just some-bad-request instead.
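The exchange above can be reproduced in a few lines. This is a sketch under stated assumptions: the request URL is taken as the base URI, Python's urllib.parse.urljoin performs the RFC 3986 resolution, and effective_type is a hypothetical helper name. An absent type falls back to "about:blank", as RFC 7807 specifies.

```python
import json
from urllib.parse import urljoin


def effective_type(request_url: str, body: str) -> str:
    """Return the problem type as seen after relative resolution."""
    problem = json.loads(body)
    # RFC 7807: a missing "type" member is treated as "about:blank".
    return urljoin(request_url, problem.get("type", "about:blank"))


request_url = "http://www.example.com/some/bad/uri"
print(effective_type(request_url, '{"type": "some-bad-request"}'))
# http://www.example.com/some/bad/some-bad-request
print(effective_type(request_url, "{}"))
# about:blank
```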

Using Uniform Resource Identifiers as identifiers shouldn't be the esoteric, impenetrable mystery it's being described as, imho.

They're not, really. XML has been doing it for 20+ years with namespace URIs. There are probably examples that go back further too.

Why not change it to "guaranteed to go to docs, if dereferenceable"?

As soon as you say "guaranteed" then you're putting a burden on developers. Especially when they plan to support dereferencing them at some future time, but not just yet. They will either ignore it, and thus not follow the spec correctly, or else it'll force them to use a different pattern for their URIs.

pimterry commented 3 years ago

I think the problem is more with relative URIs.

Exactly 👍. Nowhere do any libraries I've seen even look at the resource URL, so they can never handle relative types correctly.

That's not valid because type is not specified to be a URL of a resource at all. It's not designed for human or machine consumption, it's only guaranteed to be usable as a type id.

I disagree. The specification states this design goal very clearly

The specification there states the goal of linking to a useful document and encourages that, but explicitly does not guarantee or mandate it. I think we agree it shouldn't, and that not all URIs will be links to usable documentation. AFAICT, it's totally valid to intentionally use an HTTP type URI that leads to a 404.

Meanwhile, no developer wants to be given a link to documentation and then hit a 404. If tools want to present type URIs to developers as helpful links to human-readable documentation, they first need to know whether it is actually a link to a resource.

Why not change it to "guaranteed to go to docs, if dereferenceable"?

The problem with "guaranteed to go to docs, if dereferenceable" as opposed to "if present" is that the former requires automatically dereferencing every type to know if it's dereferenceable. The spec specifically discourages that, for the good reasons you've pointed out above, so we shouldn't design for that. An explicit link doesn't have this problem at all.

Using Uniform Resource Identifiers as identifiers shouldn't be the esoteric, impenetrable mystery it's being described as, imho.

Sorry, I don't mean to portray URIs as completely impenetrable, that's not fair. It seems true that the huge majority of existing APIs have existing error formats that don't use URIs for error codes though, and that many developers aren't very familiar with the nuances of URIs, e.g. URNs, tag URIs, etc.

URIs are also clearly more complicated than plain strings, and all else being equal it would be good to make implementing this standard (and all standards!) as simple as possible.

none of those APIs implement RFC 7807 and thus aren't very relevant in this discussion.

I really think existing widely used APIs are relevant to this standard (and all API standards!).

Those APIs don't use this RFC today, but the standard only becomes useful if it's widely used by real clients and APIs. Popular APIs like Stripe/AWS/et al eventually supporting this is part of that, and adoption is easier for them if it fits their existing patterns (for example, they wouldn't need to change their existing error code documentation, or change logic that checks those codes).

Adoption is similarly easier for clients, and API responses are easier to understand immediately, if the format fits their expectations, which are largely based on the common patterns they see in existing APIs.

If we want the standard to be widely adopted, we have to care about existing API patterns (and I really want this standard to be widely adopted!)

Ok, so as discussed, using URIs for type does have some downsides, but we do have two concrete benefits that we've mentioned so far:

It strongly encourages namespacing.
It potentially links the type to a resource.

Are there others I've missed?

I think my position is that the first benefit is good, but IMHO outweighed by the downsides (and would still be an option for those who want it without requiring URIs) whilst the 2nd benefit isn't fully realized in the current draft (you can't automatically know whether it's a usable link or just an id) and would be better realized with link headers (more explicit, more standard, less coupled).

It would also be a breaking change with existing software that actually implements the specification according to how it is written.

Yep, even if we agree that this would be beneficial, there is definitely a question of compatibility with existing implementations. I think there's some good routes through here with minimal impact, and this is still a draft anyway. I don't think it makes much sense to do detailed analysis & planning for that though until there's some consensus on the kind of changes we'd like to make.

asbjornu commented 3 years ago

@pimterry:

Meanwhile, no developer wants to be given a link to documentation and then hit a 404. If tools want to present type URIs to developers as helpful links to human-readable documentation, they first need to know whether it is actually a link to a resource.

There's absolutely no guarantee that can be given by anyone, anywhere, at any time, that any URI will resolve to anything at all. All URIs can lead to 404, no matter what their context and semantics are and no matter how many different avenues to provide links you give. Given enough time, most URIs end up pointing to a domain squatted by a shark. Is this of concern to RFC 7807bis? I believe it isn't.

The problem with "guaranteed to go to docs, if dereferenceable" as opposed to "if present" is that the former requires automatically dereferencing every type to know if it's dereferenceable.

Scratch "guarantee" altogether. No guarantees can be given. Let's work from that premise and see where we go.

Sorry, I don't mean to portray URIs as completely impenetrable, that's not fair. It seems true that the huge majority of existing APIs have existing error formats that don't use URIs for error codes though, and that many developers aren't very familiar with the nuances of URIs, e.g. URNs, tag URIs, etc.

The perception of impenetrable mystique built around URIs affects both how RFC 7807's type is perceived as well as how hypermedia is approached (or rather, isn't) in the realm of APIs. Unlike you, though, I think that's ample reason to not give in to the perception, but instead, evangelize for more use of URIs both as identifiers as well as the target of hypermedia controls in an API.

The fact that most HTTP APIs aren't RESTful by any stretch of the imagination doesn't make it right for a standardization organization such as IETF to cave in and succumb to the million flies doctrine. I'd say the right thing to do is the direct opposite and continue to normalize the use of URIs as well as hypermedia so APIs can become more, not less, RESTful over time.

URIs are also clearly more complicated than plain strings, and all else being equal it would be good to make implementing this standard (and all standards!) as simple as possible.

I think that's just a matter of familiarity. Use URIs enough and they become the opaque identifiers they are meant to be. A simple solution to this may perhaps be to override the base URI. Since Content-Base isn't an option and afaik no base link relation exists, we need another way to establish a base URI, perhaps by registering the base link relation.

Another alternative is to add a leading slash so the reference becomes an absolute path: "type": "/out-of-credit". Now you have a type that will be considered equal no matter how you compare it against a type given in a response from the same authority.
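A quick sketch of the leading-slash behaviour, again assuming the resource URL is the base URI and using Python's urllib.parse.urljoin; the hosts and paths are invented for illustration.

```python
from urllib.parse import urljoin

TYPE = "/out-of-credit"

# Two resources on the same authority, one on a different authority.
a = urljoin("https://api.example.com/orders/42", TYPE)
b = urljoin("https://api.example.com/users/7/cards", TYPE)
c = urljoin("https://other.example.net/orders/42", TYPE)

print(a)       # https://api.example.com/out-of-credit
print(a == b)  # True
print(a == c)  # False
```

The type is stable within one authority but still differs between authorities, so it does not by itself give a type that is shared across APIs on different hosts.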

I really think existing widely used APIs are relevant to this standard (and all API standards!).

As a reminder of how sad the status quo is, perhaps.

Those APIs don't use this RFC today, but the standard only becomes useful if it's widely used by real clients and APIs.

True.

Ok, so as discussed, using URIs for type does have some downsides, but we do have two concrete benefits that we've mentioned so far:

It strongly encourages namespacing.

It potentially links the type to a resource.

Are there others I've missed?

As discussed in #7, there's an existing desire to reuse type URIs across different APIs, something which is impossible with opaque strings since you have no idea whether invalid-draft coming from three different APIs means:

It is currently impossible to let a current of air into the enclosed space specified in the request.
The document specified does not exist in draft form and may be published or deleted.
Military enrolment is currently not possible.

Semantical ambiguities like these aren't insignificant and URIs solve them elegantly and efficiently. schema.org is a testament to how successful the use of URIs to identify things in an unambiguous, unique, and reusable way, really is.

I think there's some good routes through here with minimal impact

Please elaborate.

and this is still a draft anyway.

RFC 7807bis is a draft, RFC 7807 isn't.

sazzer commented 3 years ago

Meanwhile, no developer wants to be given a link to documentation and then hit a 404. If tools want to present type URIs to developers as helpful links to human-readable documentation, they first need to know whether it is actually a link to a resource.

There's absolutely no guarantee that can be given by anyone, anywhere, at any time, that any URI will resolve to anything at all. All URIs can lead to 404, no matter what their context and semantics are and no matter how many different avenues to provide links you give. Given enough time, most URIs end up pointing to a domain squatted by a shark. Is this of concern to RFC 7807bis? I believe it isn't.

There's also the fact that not all URIs are resolvable for all clients. It's entirely possible that the URI resolves correctly for the server developer, but some client is behind some firewall that doesn't allow access to it.

Or it might be temporarily down. Or a myriad of other reasons why a reasonable URI can't be resolved right now.

Another alternative is to add a slash to make the URI absolute: "type": "/out-of-credit". Now you have a type that will be considered equal no matter how you compare it against a type given in a response from the same authority.

That's still relative. If I understand things correctly, the path segment is now absolute, but the scheme/host/port/etc. isn't specified, so it becomes relative to the original request. If you got that type from different APIs on different hosts, then the resolved absolute URI is still different.
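To make the resolution behaviour concrete, here is a minimal sketch, assuming the base URI of a problem document is the URL of the request that produced it. The host names and paths are hypothetical and chosen only for illustration; `urljoin` from Python's standard library performs the RFC 3986 reference resolution.

```python
from urllib.parse import urljoin


def resolve_type(request_url: str, problem_type: str) -> str:
    """Resolve a problem `type` against the URL the response came from."""
    return urljoin(request_url, problem_type)


# Hypothetical request URLs, used only for illustration.
same_api_1 = "https://api.example.com/accounts/42"
same_api_2 = "https://api.example.com/orders/7"
other_api = "https://billing.example.org/v1/pay"

# An absolute path is stable within one scheme and authority...
abs_1 = resolve_type(same_api_1, "/out-of-credit")
abs_2 = resolve_type(same_api_2, "/out-of-credit")
# ...but resolves to a different URI on another host.
abs_3 = resolve_type(other_api, "/out-of-credit")

# A fully relative reference differs even between endpoints of one API.
rel_1 = resolve_type(same_api_1, "out-of-credit")
rel_2 = resolve_type(same_api_2, "out-of-credit")
```

A client that compares the raw strings instead of the resolved URIs would treat all five values as the same type, which is the mismatch between specified and implemented behaviour that this thread keeps returning to.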

As discussed in #7, there's an existing desire to reuse type URIs across different APIs, something which is impossible with opaque strings since you have no idea whether invalid-draft coming from three different APIs means:

It's perfectly possible with opaque strings. It's not possible with relative URIs iff the resulting value is first expanded to be an absolute URI - and thus might have a different value based on the scheme/hostname/port of the API being called. If the value was always an absolute URI or a relative URI that didn't get expanded then this problem goes away. Though at that point you've just got opaque strings that might or might not bear some resemblance to a URI.

asbjornu commented 3 years ago

Another alternative is to add a slash to make the URI absolute: "type": "/out-of-credit". Now you have a type that will be considered equal no matter how you compare it against a type given in a response from the same authority.

That's still relative. If I understand things correctly, the path segment is now absolute, but the scheme/host/port/etc. isn't specified, so it becomes relative to the original request. If you got that type from different APIs on different hosts, then the resolved absolute URI is still different.

Yes, that's why I wrote "from the same authority".

As discussed in #7, there's an existing desire to reuse type URIs across different APIs, something which is impossible with opaque strings since you have no idea whether invalid-draft coming from three different APIs means:

It's perfectly possible with opaque strings.

Not unless these strings are registered and clearly defined in a global registry.

It's not possible with relative URIs iff the resulting value is first expanded to be an absolute URI - and thus might have a different value based on the scheme/hostname/port of the API being called.

I agree that's just as useless as opaque, non-URI strings.

pimterry commented 3 years ago

There's absolutely no guarantee that can be given by anyone, anywhere, at any time, that any URI will resolve to anything at all.

It's entirely possible that the URI resolves correctly for the server developer, but some client is behind some firewall that doesn't allow access to it.

Ok, granted, you can't get a perfect guarantee. There is both a semantic and practical difference though between "here is a link that I intend you to dereference to find more information" and "here is a URI that is not intended to be dereferenced, it's just a type id".

We currently allow both, intentionally, with no distinguishable difference until you dereference it. When looking for usable links to more information, any tooling is interested only in the former, where the URL is very likely to be useful (granted, not guaranteed). Meanwhile if you ever accidentally use the latter it's extremely likely to provide a bad user experience.

It's very easy to differentiate these cases, and it would be useful, so we should.

If you got that type from different APIs on different hosts then the resolved absolute URI is still different.

Yes, that's why I wrote "from the same authority".

Unfortunately that's the same problem: client implementations do not take the authority into account either, they never use the base URL. I agree that they should, but right now it looks like most implementations of this specification handle both relative & absolute paths incorrectly, and only correctly use fully specified absolute URIs.

Semantical ambiguities like these aren't insignificant and URIs solve them elegantly and efficiently.

I agree that URIs are excellent! I don't object to the use of URIs, I object to requiring that all API errors use URIs in all cases, as we do right now. There's lots of feedback (see above) that others feel similarly, and would like to use this spec without URIs.

As an interesting supporting point: the WHATWG URL specification specifically removed all references to URIs because "URI and IRI are just confusing". URIs are not well understood by many developers, and that will limit adoption of this spec.

Right now, as in the blog post above, some users avoid using type entirely because it's confusing and they don't want to create pages for every error type. This creates more semantic ambiguity than the status quo!

I think there's some good routes through here with minimal impact [on backward compat]

Please elaborate.

Ok, one option would be:

- Make type a freeform string where every response from the same API with that code represents the same type of error. Unspecified still defaults to about:blank. Just one constraint: if type is an absolute URI, it must either be a URI already reserved for that purpose or a URI under the control of the API using it (e.g. HTTP/tag on your domain).
  - This has no compatibility impact for existing APIs using absolute type URIs. Types in their existing responses are still valid and retain extremely similar rules for error type matching (technically there could be differences, e.g. # fragments in URL shouldn't be considered in URL comparisons, but I would be astonished if that is in use and working correctly for error handling in any API today).
  - No compatibility impact for clients receiving existing responses that are absolute URIs or absolute paths, for similar reasons.
  - Semantics change slightly for clients & APIs using path-relative URI types. Today these should be assumed to be distinct types per endpoint, and they would instead be treated as API-global.
    - Helpfully, it appears that most implementations match this proposed behaviour today already, rather than the specified behaviour.
    - In addition, impact of incompatibility is low: clients would incorrectly treat error types as distinct unrecognized errors, and would fail in slightly less granular ways.
  - Clients who parse type as a URI have potential compatibility issues parsing non-URI responses. This risk is low because:
    - This doesn't affect or invalidate any existing client & API exchange that works today. It only affects clients newly making requests to APIs that start using some new URI-unparseable error type.
    - URI syntax is very permissive, so most plain strings that would be used are also valid relative URIs, e.g. invalid-draft works just fine (though, as mentioned, its semantics change slightly as it's currently a relative URI). It's very difficult to come up with examples of plausible error codes that would fail to parse as URIs. We could tighten the allowed characters if this is a concern.
- If a type is an absolute URI defined in a URI registry of common types or otherwise recognized by the client, the error SHOULD be treated as an instance of the globally registered error types (i.e. #7).
  - Fully compatible, and we retain the ability to share types across different services.
  - By putting about:blank into this URI registry, we retain the same semantics for that.
- APIs are encouraged to use absolute URIs as types, for namespacing and type-sharing benefits, but this becomes optional.
- APIs that have documentation relevant to the error response should include a Link header.
  - I'd suggest we mint a new relation problem-doc. This would pair very nicely with the existing service-doc relation. Service-doc provides general documentation for the resource, problem-doc provides documentation for the type of error returned.
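As a rough sketch of the last point, the following shows what a response carrying a documentation link might look like. Note that `problem-doc` is only the relation name proposed in this comment, not a registered link relation, and the function name, URLs and field values are hypothetical.

```python
import json


def problem_response(problem_type, title, status, doc_url=None):
    """Build (status, headers, body) for a problem details response.

    `doc_url` is optional: APIs without per-error documentation simply
    omit the Link header, and `type` stays a pure identifier.
    """
    headers = {"Content-Type": "application/problem+json"}
    if doc_url is not None:
        # "problem-doc" is the relation proposed in this thread (assumption).
        headers["Link"] = f'<{doc_url}>; rel="problem-doc"'
    body = json.dumps({"type": problem_type, "title": title, "status": status})
    return status, headers, body


# Hypothetical example values.
status, headers, body = problem_response(
    "out-of-credit",
    "You do not have enough credit.",
    403,
    doc_url="https://example.com/docs/errors/out-of-credit",
)
```

With this split, tooling that wants a human-readable page looks only at the Link header and never has to guess whether `type` is dereferenceable.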

That resolves all the points above, with minimal backward compatibility impact. It still supports all the benefits that absolute URIs give us for those who want that, but also supports other cases too.

Would that work for you? Are there specific compatibility concerns in here that would need further mitigation?

sazzer commented 3 years ago

Ok, granted, you can't get a perfect guarantee. There is both a semantic and practical difference though between "here is a link that I intend you to dereference to find more information" and "here is a URI that is not intended to be dereferenced, it's just a type id".

We currently allow both, intentionally, with no distinguishable difference until you dereference it. When looking for usable links to more information, any tooling is interested only in the former, where the URL is very likely to be useful (granted, not guaranteed). Meanwhile if you ever accidentally use the latter it's extremely likely to provide a bad user experience.

I've been working with XML for many years, where XML namespaces are URIs that almost never dereference to anything. In fact, there are normally specific mechanisms in XML tooling to know where the XSD for a given namespace is simply because the namespace URI doesn't dereference.

And it works fine. People use it, and all is good. So tooling only being interested in dereferenceable links isn't strictly accurate.

Ok, one option would be: [....]

In terms of your proposal. My one concern with it is that I suspect many developers will opt for the choice of just using error codes instead of URIs, because they find it easier. That then means that you lose namespacing, you lose the ability for different APIs to share the same error types, and probably other things.

There's also the minor concern that strictly speaking you can't distinguish a URI from an arbitrary string that happens to have the same format. Somebody might write a string that just happens to contain :// in the middle of it. Is that a URI? Or is it a coincidence? That's very much an edge case though and almost certainly can be ignored :)

The impact on existing clients is also a concern, if it means that suddenly they can't understand errors that they could before. That's only an issue for clients that are correctly handling relative URIs though, and I don't know how many of those there are - if any.

pimterry commented 3 years ago

So tooling only being interested in dereferenceable links isn't strictly accurate.

True. I'm really talking about a specific subset of tooling, I should be more specific. I'm thinking about tooling that might format an error response for human consumption, like debuggers, manual clients like Postman & curl, loggers, and other error reporting tools.

I think those are tools that could get a lot of value from this spec, and in many of these cases it's useful to know if the URL is documentation intended for human consumption.

There's also the minor concern that strictly speaking you can't distinguish a URI from an arbitrary string that happens to have the same format. Somebody might write a string that just happens to contain :// in the middle of it. Is that a URI? Or is it a coincidence? That's very much an edge case though and almost certainly can be ignored :)

It's slightly more general unfortunately, since some absolute URIs only contain : e.g. tag:example.com,2021:my-error.

If that's a concern, I think saying "error types must not contain : unless they're an absolute URI" would be a reasonable constraint, and would support most error codes you see in the real world. Explicitly specifying that would help parsers to quickly differentiate the two cases too.
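A minimal sketch of how a parser might apply that constraint, assuming the rule is exactly "a colon is only allowed when the value starts with a URI scheme". The function name and the three category labels are hypothetical; the scheme pattern follows the RFC 3986 scheme syntax.

```python
import re

# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def classify_type(value: str) -> str:
    """Classify a `type` value under the proposed colon rule."""
    if _SCHEME.match(value):
        return "absolute-uri"   # e.g. https:, tag:, urn:, about:
    if ":" in value:
        return "invalid"        # colon present but no leading scheme
    return "plain-code"         # simple code or path, scoped to the API


examples = {
    "tag:example.com,2021:my-error": classify_type("tag:example.com,2021:my-error"),
    "https://example.com/probs/out-of-credit": classify_type("https://example.com/probs/out-of-credit"),
    "about:blank": classify_type("about:blank"),
    "out-of-credit": classify_type("out-of-credit"),
    "/problems/out-of-credit": classify_type("/problems/out-of-credit"),
    "bad value: oops": classify_type("bad value: oops"),
}
```

The check is purely syntactic, so a code such as `E1234:timeout` would still be read as an absolute URI with scheme `E1234`; the constraint works only if API authors keep colons out of plain codes.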

My one concern with it is that I suspect many developers will opt for the choice of just using error codes instead of URIs, because they find it easier.

That is quite possible. It appears that there's also a group of people who currently avoid the spec altogether though, or use it suboptimally, because of the URI requirement. Standard formatting of error messages seems very valuable even without URIs.

Personally I'm OK with this. Error types intended to be reused elsewhere can use URIs, and error types not intended for global reuse don't need URIs. The challenging case is when you want to reuse a URI defined by somebody else who didn't explicitly make it reusable. I'm OK with not supporting that, but I see how others might find this useful.

The impact on existing clients is also a concern, if it means that suddenly they can't understand errors that they could before. That's only an issue for clients that are correctly handling relative URIs though, and I don't know how many of those there are - if any.

Yep. I doubt it's used much though, and I suspect this change would actually increase the number of implementations that matched the spec when using relative URIs. I'd be really interested to know if anybody can find an example of a relative type URI in production anywhere though.

asbjornu commented 3 years ago

There is both a semantic and practical difference though between "here is a link that I intend you to dereference to find more information" and "here is a URI that is not intended to be dereferenced, it's just a type id".

But RFC 7807 does not state "here is a URI that is not intended to be dereferenced, it's just a type id". It states that type should not be automatically dereferenced. Dereferencing type based on a user's action, perhaps by clicking "view documentation" in a UI, is perfectly fine and should probably be encouraged for documentation purposes in the specification. rfc7807bis should seek to clarify that.

If the link doesn't resolve, the UI would have to recover from that failure somehow, regardless of the semantics and guarantees given for the link that was dereferenced. "Sorry, no documentation for example.com/out-of-credit is available" is a fine error message for type as the specification describes it today, and equally for a new documentation URI, which therefore wouldn't provide anything valuable that type doesn't already provide.

We currently allow both, intentionally, with no distinguishable difference until you dereference it. When looking for usable links to more information, any tooling is interested only in the former, where the URL is very likely to be useful (granted, not guaranteed). Meanwhile if you ever accidentally use the latter it's extremely likely to provide a bad user experience.

Why would the user experience be bad? Please elaborate.

Unfortunately that's the same problem: client implementations do not take the authority into account either, they never use the base URL.

Then they don't have the problem described, since they will just compare /out-of-credit with /out-of-credit across all responses, regardless of the context (base) URI.

I agree that they should, but right now it looks like most implementations of this specification handle both relative & absolute paths incorrectly, and only correctly use fully specified absolute URIs.

For clients that don't treat type as a URI, /out-of-credit in response A is just an opaque string, indiscernible from /out-of-credit in response B, even though A and B may be from entirely different authorities altogether (although within the same API, I assume).

As an interesting supporting point: the WHATWG URL specification specifically removed all references to URIs because "URI and IRI are just confusing".

I think you meant "URL" and not "URI" here, but regardless I agree with WHATWG in that both "URI" and "IRI" are confusing and don't add much to the more prevalent "URL". I'm not sure IETF is ready to replace its usage of "URI" with "URL" yet, though. That would probably require rfc3986bis of some sort.

URIs are not well understood by many developers, and that will limit adoption of this spec.

I agree URIs are not well understood. But I think that is a problem and believe the solution is not to cave in and stop using URIs, but to use even more URIs.

URI ALL THE THINGS

Right now, as in the blog post above, some users avoid using type entirely because it's confusing

Then let's invest in making it less confusing in rfc7807bis! 😃

and they don't want to create pages for every error type.

They don't have to. Let's be more explicit about that. More examples could perhaps help.

Ok, one option would be: […] Would that work for you?

Sorry, no. I'll just +1 @sazzer's criticism here. I don't think this is going to work. In hindsight, it may have been better if non-URI values were interpreted not as relative URIs, but as implicit URNs or something similar. But that train sailed with a boat on water under the bridge more than 5 years ago.

Are there specific compatibility concerns in here that would need further mitigation?

Yes, your suggestion is going to break clients with the following expectations:

type URIs that point to external documentation.
Non-absolute type URIs are relative to the context.

So tooling only being interested in dereferenceable links isn't strictly accurate.

True. I'm really talking about a specific subset of tooling, I should be more specific. I'm thinking about tooling that might format an error response for human consumption, like debuggers, manual clients like Postman & curl, loggers, and other error reporting tools.

Isn't it possible for these tools to only present documentation upon a user's explicit action?

I think those are tools that could get a lot of value from this spec, and in many of these cases it's useful to know if the URL is documentation intended for human consumption.

If type is dereferenceable, the content is intended for human consumption.

It appears that there's also a group of people who currently avoid the spec altogether though, or use it suboptimally, because of the URI requirement. Standard formatting of error messages seems very valuable even without URIs.

Perhaps we could survey these people on the reason they're not using RFC 7807 instead of just speculating?

Yep. I doubt it's used much though, and I suspect this change would actually increase the number of implementations that matched the spec when using relative URIs. I'd be really interested to know if anybody can find an example of a relative type URI in production anywhere though.

I believe your suspicion may be correct. I would love to see the result of a survey of existing implementations before concluding, but I'm currently +0 to make the compromise of making non-URI type implicit URNs or some other similar solution that doesn't convert them into context-relative HTTP(S) URIs.

asbjornu commented 3 years ago

To expand on the "making non-URI type implicit URNs", this sort of implicit expansion has precedent in RFC 4287 section 4.2.7.2's definition of rel:

The value of "rel" MUST be a string that is non-empty and matches either the "isegment-nz-nc" or the "IRI" production in [RFC3987]. Note that use of a relative reference other than a simple name is not allowed. If a name is given, implementations MUST consider the link relation type equivalent to the same name registered within the IANA Registry of Link Relations (Section 7), and thus to the IRI that would be obtained by appending the value of the rel attribute to the string "http://www.iana.org/assignments/relation/".

In RFC 5988 section 4.1, rel values are either required to be registered in IANA's Link Relation Registry, or in the case of Extension Relation Types, full URIs. RFC 7807's deviation from these two mechanisms in making type relative to the context is perhaps the source of the greatest issue here.

I've come to agree that "type": "out-of-funds" should be comparable across problem documents regardless of their request context.
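The implicit expansion that RFC 4287 describes for rel can be sketched as follows. This covers only the Atom rule quoted above; the function name is hypothetical, and the scheme check is a simplification of the full "IRI" production. An analogous expansion for type would need a base string that this thread does not define.

```python
import re

# Base string given in the RFC 4287 text quoted above.
IANA_REL_BASE = "http://www.iana.org/assignments/relation/"

# Simplified test for "already an absolute IRI": starts with a scheme.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def expand_rel(rel: str) -> str:
    """Return the IRI a rel value is equivalent to under the Atom rule."""
    if not rel:
        raise ValueError("rel must be non-empty")
    if _SCHEME.match(rel):
        return rel                  # extension relation: already an IRI
    return IANA_REL_BASE + rel      # simple name: registered relation


registered = expand_rel("alternate")
extension = expand_rel("http://example.com/rels/out-of-funds")
```

Because the expansion never consults the request URL, two documents that both say `alternate` always compare equal, which is the context-independent behaviour being asked for here.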

mnot commented 3 years ago

Hey @asbjornu - that's interesting, but it's backwards-incompatible, so we'd need to use a new media type. Is that worth it (considering the resulting confusion, etc.)?

asbjornu commented 3 years ago

Sorry for being imprecise. What I meant was that I agree with (part of) the problem description. I'm +0 on actually doing anything about it. If enough weight is put behind a new media type to break with the current implementations of type, then I would not oppose it.

sazzer commented 3 years ago

The obvious concern with a new media type is that implementers will either need to migrate completely over to it, or else end up supporting two different ways of representing problems. That feels at odds with what - to me - is one of the major benefits of RFC-7807 in the first place, that there is one standard way to do this and you'd need a good reason to do something different.

dret commented 3 years ago

On 2021-02-10 10:52, Graham Cox wrote:

The obvious concern with a new media type is that implementers will either need to migrate completely over to it, or else end up supporting two different ways of representing problems. That feels at odds with what - to me - is one of the major benefits of RFC-7807 in the first place, that there is one standard way to do this and you'd need a good reason to do something different.

the discussion around using URIs as identifiers has been had what feels like a million times. everybody knows the implications. it's a design trade-off.

given that there is no clear "winner" i think not breaking the existing media type should make it a big preference to stick with what we have.

i am more than willing to make proposals how to improve the language so that people reading the spec get a bit more context and assistance. but as we all know, specs are not read as much as one might wish, so we'll still see broken implementations out there. that's just life.

pimterry commented 3 years ago

Whilst it would still be technically incompatible, I think the practical outcome from my proposed changes would be that:

The incompatibility only affects fully relative URIs (e.g. out-of-funds). Both absolute URIs and absolute paths would retain effectively identical semantics to the current spec.
That incompatible case is not used or discussed in the wild anywhere that I've ever seen (counter examples very welcome!)
In generic implementations, the incompatible case appears to be incorrectly implemented everywhere, with every implementation I've seen already matching the proposed rather than specified behaviour for this case. It seems likely that this breaking change would dramatically increase the number of implementations correctly implementing the specification today.

Does that affect the calculus on breaking changes here?

(I do agree that @asbjornu's proposal to make relative error types global by default has much larger compatibility implications, unfortunately, and that minting a new media type has repercussions that make this unlikely to be worthwhile in either case)

sazzer commented 3 years ago

Both absolute URIs and absolute paths would retain effectively identical semantics to the current spec.

Not quite. Absolute URIs retain identical semantics. Absolute paths only retain identical semantics if they are from the same scheme and authority. If they come from different authorities then the resolved URI from those absolute paths is different, and thus the value to use under the current spec is different.

However, as you say, nobody does that. At least, nobody that I've ever seen. I don't know how we'd find out if introducing a breaking change is actually going to break things in any real code, but it seems to me that this proposed change is the cleanest route and that the risk of breaking things is mitigated by the fact that (almost) everybody does it wrong now anyway.

The other option would be to introduce not a new media type, but a new parameter to the existing media type. So it becomes application/problem+json; v=2. Anyone who doesn't specify a value for v would then be assumed to be using the most recent version of the spec, which is what we're discussing here. If they explicitly want to retain the old behaviour then they would specify v=1. (That would also open the door for further future changes that introduce a v=3 as well.)

I will say, I'm not hugely thrilled by that idea. I prefer trying to keep things simple and not need to do any versioning like that. But it is an option.
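For illustration only, a client could read such a parameter roughly like this. The `v` parameter is hypothetical (it exists only in this comment, not in RFC 7807), as are the function name and the `LATEST_VERSION` constant; the default for a missing parameter follows the rule described above.

```python
from email.message import Message

LATEST_VERSION = 2  # hypothetical: the newest version under this idea


def problem_version(content_type: str) -> int:
    """Return the problem-details version implied by a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "application/problem+json":
        raise ValueError("not a problem+json media type")
    v = msg.get_param("v")
    # No explicit v: assume the most recent version, as described above.
    return LATEST_VERSION if v is None else int(v)


explicit_old = problem_version("application/problem+json; v=1")
explicit_new = problem_version("application/problem+json; v=2")
implicit = problem_version("application/problem+json")
```

Every client would need a branch like this for as long as both versions are in circulation, which is the ongoing cost of versioning through a parameter.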

pimterry commented 3 years ago

Absolute paths only retain identical semantics if they are from the same scheme and authority. If they come from different authorities then the resolved URI from those absolute paths is different, and thus the value to use under the current spec is different.

In the proposed version, both relative & absolute paths would be considered as scoped to the API (i.e. the scheme & authority). So out-of-funds should be treated as the same type of error everywhere in the API (unlike today) and as distinct types if returned from different APIs (same as today).

/problem/out-of-funds doesn't change though: it should be treated as the same type everywhere in the same API (the same as today) and as distinct types if returned by different APIs (the same as today).

Does that not match today's semantics? Maybe I'm missing something, do you have an example where the behaviour changes?
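The proposed matching rule can be sketched as a comparison key, assuming "the API" means the scheme and authority of the request URL. The function name, the "global" marker and the example hosts are hypothetical.

```python
import re
from urllib.parse import urlsplit

_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def type_identity(request_url: str, problem_type: str) -> tuple:
    """Key under which two problem types count as the same (proposed rule)."""
    if _SCHEME.match(problem_type):
        # Absolute URIs are global: identical wherever they appear.
        return ("global", problem_type)
    api = urlsplit(request_url)
    # Anything else is scoped to the API, not to the individual endpoint.
    return (api.scheme, api.netloc, problem_type)


# Hypothetical endpoints of two different APIs.
a1 = "https://api.example.com/accounts/42"
a2 = "https://api.example.com/orders/7"
b1 = "https://billing.example.org/v1/pay"

same_api = type_identity(a1, "out-of-funds") == type_identity(a2, "out-of-funds")
cross_api = type_identity(a1, "out-of-funds") == type_identity(b1, "out-of-funds")
abs_path = type_identity(a1, "/problem/out-of-funds") == type_identity(a2, "/problem/out-of-funds")
```

Under the current specification only the `abs_path` comparison would come out the same way; the fully relative `out-of-funds` would resolve to a different URI for each endpoint.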

mnot commented 3 years ago

So it seems like we have some agreement that we don't want to introduce a new mime type, and as a result we shouldn't change the nature of type in a backwards-incompatible fashion.

Proposal: we should resolve this issue by focusing on making the spec communicate more clearly about how to use URIs successfully in type, possibly adding new fields, etc. in separate issues. Once we do that, we should close this issue, but label it revisit-on-breaking-change so that if for some reason we find a compelling enough reason to mint a new mime type, we remember to reconsider.

pimterry commented 3 years ago

Personally I'd still prefer the technically-incompatible-but-probably-more-compatible proposal with no mime type change at all, but I can understand if that's not an acceptable risk.

If we're talking about guaranteed-safe changes we can make, the things I'd personally find useful are:

Making a clear & simple recommendation in the spec for users who have a simple error code and who don't want to link the error type to a documentation page. E.g. they should use tags, or URNs, or absolute paths, or something else.
Discouraging fully relative paths, since they're very counter-intuitive, most/all implementations handle them wrong, and we may want to change their semantics in a future breaking change.
Providing some way for clients to know if a type URL is designed to be dereferenced as a link to documentation, or if it's simply acting as an identifier.

I'll file new issues for those, and we can continue discussion on them independently there. Happy for this issue to be tagged and left for a future breaking change in the meantime if that's all we can do for now.

mnot commented 3 years ago

See PR #20. Thoughts?

mnot commented 3 years ago

Am merging #20, which will close this. If anyone has further input, please open a new issue or comment.