Closed viliam-durina closed 3 months ago
Looking at the RFC more deeply, looks that I'm missing a lot. For example, missing >
or missing final quote "
MUST be ignored. There's nothing about URL-decoding the href
, at least I can't find it. Seems that the particular service I'm working with now got this incorrectly since they are double-escaping their URL, contrary to the standard. There also can be multiple links within a single header, separated by ,
, if I understand it correctly.
But many of the points above still apply. The RFC contains precise algorithm for parsing, I would expect a high quality library to strictly follow it, including the edge cases.
Would you mind listing the cases that really do break and are valid link headers according to the specification?
@odrotbohm I edited the original post of this issue. The list isn't complete, to implement correctly, one should follow the specification very closely. There's exact algorithm for the parsing an Appendix B, from which handling of all (most?) edge cases follows (e.g. the ignoring of the missing final quote, if I understand it correctly).
Would you mind turning them into test cases and submitting a PR? I'll see what I can do then. If you fancy taking a stab at a better parsing implementation, feel free to submit that, too.
This is fixed by merging and polishing your PR. Also back-ported to 2.3.x and 2.2.x.
How you renamed the issue made my day: so it wasn't a bug, but a new, advanced feature?? I renamed it back, unfortunatelly, I can't rename the commit...
We needed to parse a Link. My junior colleague wanted to do a quick-and-dirty parser. I told to him that there's an RFC, rather look for a lib because you won't make it correctly. We found this lib, but realized that it had exactly the same issues I worried a junior would make. And now you renamed it to "support for advanced expressions". Is a semicolon in a String literal advanced expression? Just tell me, would you really say so?
We've decided to use your library to parse
Link
headers, hoping that you'll correctly implement the specification intricacies, because doing parsing correctly is tricky. However, it's implemented completely naively, working only in the basic cases. It uses regex for parsing, even though it can't be used for non-context-free grammars.Here are a bunch of examples:
This is the ABNF for the
Link
header:And here for the content of the
rel
param:See https://httpwg.org/specs/rfc8288.html
So there can be multiple values in the
rel
param, even URIs and quoted URIs.Some of the errors can't be fixed without breaking b-w compatibility, e.g. the decoding of href, unescaping of params and multiple rel values. Also, I'm not sure why custom parameters aren't supported - I didn't read the whole spec, but I don't think it prohibits custom params (we don't use them, just wondering).