`Problem:` HTTP header field and localization

benbucksch commented 1 year ago

The spec currently states: The title and detail values MUST NOT be serialized in the Problem field if they contain characters that are not allowed by String; see {{Section 3.3.3 of STRUCTURED-FIELDS}}. Practically, this has the effect of limiting them to ASCII strings.
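To make the quoted restriction concrete, here is a minimal sketch of the check a serializer would have to perform. It assumes the String rule from Section 3.3.3 of RFC 8941 (only characters 0x20 through 0x7E, with `"` and `\` backslash-escaped); the function names are hypothetical and not taken from any SF library.

```python
def fits_sf_string(text: str) -> bool:
    """Return True if every character is in the range an SF String allows (0x20-0x7E)."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


def serialize_sf_string(text: str) -> str:
    """Serialize text as an SF String, or raise if it cannot be represented."""
    if not fits_sf_string(text):
        raise ValueError("value contains characters not allowed in an SF String")
    # Backslash and double quote are the only characters that need escaping.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


if __name__ == "__main__":
    print(serialize_sf_string("Out of credit"))
    print(fits_sf_string("落ち着きを高めて"))
```

Under this rule, any title or detail containing a character outside that range has to be left out of the field.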

I understand that the limitation to ASCII is imposed by prior existing specs. However, mandating error strings to be in English (which is what ASCII means in practice) is not very helpful for the purposes of this specification. Importance: Error strings need to be helpful for the user and allow them to correct the problem. If they are in a language that the user does not understand, they are unlikely to be helpful. This in turn leaves the end user frustrated. Messages that are potentially important for the end user MUST be translated into the user's language.

You could specify an escaping mechanism which allows for escaping of Unicode characters to be expressed with ASCII, e.g. a JSON string (RFC 4627 Point 2.5, Paragr. 2, "\u005C"). Recommendation: The Problem: HTTP header field is a JSON string. Non-ASCII Unicode characters MUST be escaped, because HTTP Headers require ASCII. (This would probably need a lot more tweaking.)
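As a rough illustration of the escaping suggested here (a proposal from this comment, not something the draft defines), Python's standard `json` module can already produce an ASCII-only JSON string with `\uXXXX` escapes. The helper names are hypothetical.

```python
import json


def to_ascii_json_string(text: str) -> str:
    """Encode text as a JSON string whose non-ASCII characters are \\uXXXX-escaped."""
    return json.dumps(text, ensure_ascii=True)


def from_ascii_json_string(value: str) -> str:
    """Decode a JSON string back into text."""
    decoded = json.loads(value)
    if not isinstance(decoded, str):
        raise ValueError("field value is not a JSON string")
    return decoded


if __name__ == "__main__":
    wire = to_ascii_json_string("Guthaben zu niedrig für diese Anfrage")
    print(wire)
    print(from_ascii_json_string(wire))
```

The result is pure ASCII, but a recipient still has to know that this particular field carries a JSON string rather than an SF String.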

If it is not reasonably possible to convey in the HTTP header field, I would recommend to remove this transmission method entirely.

reschke commented 1 year ago

Repeating what I said in email somewhere else: this invents yet another way to express non-ASCII in structured fields; I believe HTTPAPI and HTTP working groups should come up with a common solution that does not require out-of-band information about the format.

The alternative would be to remove the "title" parameter altogether; a common argument against support of non-ASCII characters in structured fields is that header fields are not a good place for human-readable information.

darrelmiller commented 1 year ago

It does feel weird for RFC 8941 to say

When it is necessary for a field value to convey non-ASCII content, a Byte Sequence (Section 3.3.5) can be specified, along with a character encoding (preferably UTF-8 [STD63]).

and then have RFC7807bis provide different guidance. Is it common for SF implementations to provide functions to encode strings with non-ASCII characters into Byte Sequences? If so, wouldn't it be more natural to let the SF implementation do the encoding?
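For reference, a minimal sketch of what the RFC 8941 guidance amounts to: the text is encoded as UTF-8, base64-encoded, and wrapped in colons to form a Byte Sequence. The helper names are hypothetical, and a real SF library may expose this differently.

```python
import base64


def to_sf_byte_sequence(text: str) -> str:
    """Serialize text as an SF Byte Sequence holding its UTF-8 bytes."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f":{encoded}:"


def from_sf_byte_sequence(value: str) -> str:
    """Parse an SF Byte Sequence and interpret its bytes as UTF-8 text."""
    if len(value) < 2 or not (value.startswith(":") and value.endswith(":")):
        raise ValueError("not an SF Byte Sequence")
    raw = base64.b64decode(value[1:-1], validate=True)
    # The UTF-8 assumption is out-of-band knowledge: the SF type itself
    # only says "bytes", not "text in this encoding".
    return raw.decode("utf-8")


if __name__ == "__main__":
    wire = to_sf_byte_sequence("落ち着きを高めて")
    print(wire)
    print(from_sf_byte_sequence(wire))
```

The decode step shows the catch: nothing in the Byte Sequence tells a generic parser that the bytes are text.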

reschke commented 1 year ago

Indeed. I think we (that is the HTTP WG) should define a common approach that does not require out-of-band information (from the perspective of an SF parser/serializer).

dret commented 1 year ago

hello.

On 2023-01-22 17:48, Julian Reschke wrote:

Indeed. I think we should define a common approach that does not require out-of-band information (from the perspective of an SF parser/serializer).

it would be fantastic if we could all agree on SF not being the next maslow hammer. it's a good idea, but it also is something that apart from the usual suspects (insert your favorite CDN name here) won't be such a natural thing to support for the foreseeable future.

i am a bit worried because i have seen SF very enthusiastically being sold as "this is how things are now, better adjust to it". it's a good idea, but we also have to get used to the idea that there's tooling out there that for the next few years (at the very least) won't be able to cope with it. reality is annoyingly complex if we step away from the views of large players.

so as much as i like the idea of a foundation that HTTP header fields could/should build on going forward, it seems to me that the view of how much backwards compatibility we need is a bit skewed.

in the HTTP API working group we have pretty much been blocked when the clear consensus (as counted by people saying "looks good to me") was overridden by people playing the spec card (SF is the new thing, nothing else can be done). it's a delicate balance to get right, but it doesn't feel like we're getting it right so far.

cheers,

dret.


reschke commented 1 year ago

FTR, I disagree with that view of things. The lack of standardized parsing for field values has been a major pain in the past. So something like SF is definitely what we need to use and improve.

dret commented 1 year ago

On 2023-01-23 11:43, Julian Reschke wrote:

FTR, I disagree with that view of things. The lack of standardized parsing for field values has been a major pain in the past. So something like SF is definitely what we need to use /and/ improve.

yes, this needs to be used and improved. but we also have to embrace the fact that for many years going forward, we will have software out there that doesn't support SF, that's just the reality of APIs.

the API space is structurally very different from the browser space. instead of having very few CDN players being intermediaries, there are many different API management products in place in many different places, and many of them cannot and will not be easily updated to deal with SF.

so yes, let's move things towards SF. but let's also be mindful that we shouldn't assume that everybody will support SF very soon. that's just not going to happen, no matter how nice it would be if it did.


benbucksch commented 1 year ago

in the HTTP API working group we have pretty much been blocked when the clear consensus (as counted by people saying "looks good to me") was overridden by people playing the spec card (SF is the new thing, nothing else can be done).

I concur with Erik / dret here. I am surprised to hear that SF was chosen against the consensus. L10n is just one of the problems.

To get back to the point here: Restricting field values (not field names) to ASCII was bad in 1980. In 2023, it's just missing reality. Localized error messages are mandatory since 1992 or so. So, either SF can support Unicode, or it's not fit.

mnot commented 1 year ago

CC @tfpauly as HTTP WG chair who's in charge here.

The HTTP Working Group has discussed that extensively and come to a conclusion; it's not helpful to restart the discussion in a separate Working Group and not even link to that.

See: https://github.com/httpwg/http-extensions/issues/2343

Regarding the suggestion -- removing title would require removing detail as well. That would reduce some of the utility of the field; not sure how much.

Alternatively, we could remove the field altogether; it was added somewhat speculatively, and if it's causing this much trouble, perhaps it was premature.

reschke commented 1 year ago

I'm in favor of removing the field for now if it helps us get this revision out the door.

asbjornu commented 1 year ago

I agree. Let's remove the header field and revisit once we have ironed out all of the issues that have been highlighted during this discussion.

sdatspun2 commented 1 year ago

I understand that the limitation to ASCII is imposed by prior existing specs. However, mandating error strings to be in English (which is what ASCII means in practice) is not very helpful for the purposes of this specification. Importance: Error strings need to be helpful for the user and allow them to correct the problem. If they are in a language that the user does not understand, they are unlikely to be helpful. This in turn leaves the end user frustrated. Messages that are potentially important for the end user MUST be translated into the user's language.

Should the Problem Details recognize (not ignore) Accept-Language?

benbucksch commented 1 year ago

Should the Problem Details recognize (not ignore) Accept-Language?

Yes, the current (adapted) draft says this explicitly: "the language used for human-readable strings (such as those in title and description) can be negotiated using the Accept-Language request header field"
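A simplified sketch of how a server might honor that negotiation when choosing the title. The parsing is deliberately loose (it is not a full implementation of the `Accept-Language` grammar or of language-range matching), and the `TITLES` table and function names are hypothetical.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language tags ordered by descending q-value (simplified parsing)."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        tag = pieces[0].lower()
        if not tag:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by quality first, then by original position for stable ties.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


# Hypothetical translations of one problem type's title.
TITLES = {"en": "Out of credit", "de": "Guthaben aufgebraucht"}


def pick_title(header: str, titles: dict[str, str], default: str = "en") -> tuple[str, str]:
    """Pick the best available (language, title) pair for an Accept-Language value."""
    for tag in parse_accept_language(header):
        if tag == "*":
            break
        primary = tag.split("-")[0]
        if tag in titles:
            return tag, titles[tag]
        if primary in titles:
            return primary, titles[primary]
    return default, titles[default]


if __name__ == "__main__":
    print(pick_title("de-CH, en;q=0.8", TITLES))
```

The chosen string can then go into the JSON body, where UTF-8 is unproblematic.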

sdatspun2 commented 1 year ago

Cool, I missed all the fun discussion but glad to know it is incorporated. I am switching to whole repo notifications for this repo and 2 more so I don't miss such.

mnot commented 1 year ago

OK, let's remove the field -- we can always reintroduce it separately.

awwright commented 1 year ago

OK, let's remove the field

If this is the only thing holding publication back then I suppose so... but at the same time, I'm disappointed it would come to this; the header is the part I'm most excited about.

I think adding an encoding step to sf-string negates most of the advantages of Structured Fields. I don't see anyone implementing that step correctly. I still haven't implemented it.

But there is another format that has already solved this problem... In issue #56 I suggested that we re-use the syntax of the Link header, e.g.:

Problem: <http://example.com/api/enhance-your-calm>;
       title*=UTF-8'ja'%E8%90%BD%E3%81%A1%E7%9D%80%E3%81%8D%E3%82%92%E9%AB%98%E3%82%81%E3%81%A6
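The `title*` value above uses the RFC 8187 extended-parameter form (`charset'language'percent-encoded-value`). A minimal sketch of encoding and decoding it with Python's standard library follows; the function names are hypothetical and only the UTF-8 charset is handled.

```python
from urllib.parse import quote, unquote

# Punctuation that RFC 8187 attr-char allows unencoded, besides letters and digits.
ATTR_CHAR_SAFE = "!#$&+-.^_`|~"


def encode_ext_value(text: str, language: str = "") -> str:
    """Encode text as an RFC 8187 ext-value using UTF-8."""
    encoded = quote(text, safe=ATTR_CHAR_SAFE, encoding="utf-8")
    return f"UTF-8'{language}'{encoded}"


def decode_ext_value(value: str) -> tuple[str, str]:
    """Decode an RFC 8187 ext-value into (language, text)."""
    charset, language, encoded = value.split("'", 2)
    if charset.upper() != "UTF-8":
        raise ValueError(f"unsupported charset: {charset}")
    return language, unquote(encoded, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    wire = (
        "UTF-8'ja'%E8%90%BD%E3%81%A1%E7%9D%80%E3%81%8D"
        "%E3%82%92%E9%AB%98%E3%82%81%E3%81%A6"
    )
    print(decode_ext_value(wire))
```

Decoding the example yields the language tag `ja` and the Japanese title text.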

I understand the other advantage of Structured Fields is supposed to be that the types are self-descriptive (like in JSON, where you don't need a schema to distinguish a boolean from a number, the encodings of both are completely disjoint)... But if we're going to add processing steps for some strings but not others, that frustrates this goal: Some "strings" have to be decoded, and some don't, and you have to know the "schema" of the field to know the correct behavior. What would the benefit of structured fields be, then?

And we have to apply validation to the end result anyways, so in the end I don't see any advantage over defining an ABNF.

FTR, I disagree with that view of things. The lack of standardized parsing for field values has been a major pain in the past. So something like SF is definitely what we need to use and improve.

But the parsing of field values is standardized, though. There is exactly one correct way to parse a string according to an ABNF. Not only that, but an ABNF is provable... it's possible to look at an implementation, and generate an ABNF (or sometimes regular expression) of all of the strings that the implementation accepts that it should reject, and vice-versa. (In compliant implementations this result will be empty.) It is some complicated math, but it is straightforward.

If implementations aren't implementing the ABNF correctly, I'm not sure why I should trust Structured Fields to be any better. I think most of the advantages in practice can be attributed to the test suites and stricter error handling/interoperability requirements.

reschke commented 1 year ago

I think most of the advantages in practice can be attributed to the test suites and stricter error handling/interoperability requirements.

Yes. So how do you achieve that when your proposal is to re-use an existing Link header field parser?

awwright commented 1 year ago

So how do you achieve that when your proposal is to re-use an existing Link header field parser?

You'd achieve it by publishing test cases for the ABNF.

I'm not sure what additional interoperability requirements would look like exactly, except an option to report a syntax error.

If I drew up a comprehensive list of tests, would that be persuasive at all?

reschke commented 1 year ago

What would be the benefit over SF?

mnot commented 1 year ago

Reusing that syntax is a non-starter; doing so would further fragment the landscape.

awwright commented 1 year ago

@reschke I listed the benefits above... In particular, the Link syntax natively supports internationalized parameters, and I gave one example of what that would look like.

There's also benefits in general... SF will bring more cases of problems resembling "impedance mismatches" and headers that require context-based parsing, which a pure ABNF solution would avoid from the beginning. Headers defined with an ABNF can be compressed more compactly than SF.

Some of my expertise is in parsing JSON, in languages for defining grammars like ABNF, and the overlap of the two (i.e. JSON Schema). If "SF Schema" is a phrase that makes you shudder, then I would like to share some research you'd probably be interested in.

@mnot Could you briefly explain what you mean? I suggested the Link syntax precisely to avoid fragmentation (it's a well-established syntax).

reschke commented 1 year ago

@reschke I listed the benefits https://github.com/ietf-wg-httpapi/rfc7807bis/issues/67#issuecomment-1403213007... In particular, the Link syntax natively supports internationalized parameters, and I gave one example of what that would look like.

I'm very aware of that, look at the author list for RFC 8187. The same syntax could be used with SF, for what it's worth (if the string has a key). There's a reason why key names in SF support "*" (https://httpwg.org/specs/rfc8941.html#param).

There's also benefits in general... SF will bring more cases of problems resembling "impedance mismatches" and headers that require context-based parsing, which a pure ABNF solution would avoid from the beginning. Headers defined with an ABNF can be compressed more compactly than SF.

I have really no idea what you're talking about here.

@mnot Could you briefly explain what you mean? I suggested the Link syntax precisely to avoid fragmentation (it's a well-established syntax).

It's well established, but has poor parsers.

awwright commented 1 year ago

I have really no idea what you're talking about here.

I'm suggesting that there's many drawbacks to SF, but I'm not sure what you're looking for, or what you would find persuasive, and I'd have to go into some depth to explain.

As two examples, I'm saying we will run into endless problems just like this, where there's not a clean way to represent some aspects of HTTP semantics, actually making things more complicated instead of less. And since one of the selling points of SF is the idea of a binary packing, it turns out SF cannot be compressed better than an ABNF that does the same functions, and you can show this mathematically.

I'm not going to be able to make a compelling point in one comment... but if I wrote up a comprehensive email to the list detailing my findings, would you review it?

It's well established, but has poor parsers.

This is why I was asking about test suites... If I showed that the parsers could be improved to the same level as SF, would that be persuasive?

reschke commented 1 year ago

Yes, please do that on the HTTP WG's mailing list. IMHO we should focus on fixing issues in SF, not inventing something new once again.

FWIW, letting people "parse" field values based on substring matches or broken regexps is the problem we need to solve. Every time we define a new syntax, we make things worse.

ietf-wg-httpapi / rfc7807bis

`Problem:` HTTP header field and localization #67