Take encoding into account when parsing link headers & early hints

whatwg / html

Take encoding into account when parsing link headers & early hints #9715

Open noamr opened 1 year ago

noamr commented 1 year ago

See https://github.com/whatwg/html/pull/9709#discussion_r1320653782

Usually we use the document's encoding when parsing URLs, but the document (and therefore its encoding) doesn't exist yet when early hints and Link headers are processed, so we need to use something else, probably the charset parameter of the document's Content-Type header. /cc @bashi
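To illustrate why the choice of encoding matters here: in URL parsing the encoding affects how non-ASCII characters in the query string are percent-encoded. The following is a minimal Python sketch, not the URL Standard's algorithm; `percent_encode_query` is a hypothetical helper, and `urllib.parse.quote` only approximates the spec's query percent-encode set.

```python
from urllib.parse import quote


def percent_encode_query(query: str, encoding: str) -> str:
    """Percent-encode a query string after converting it to bytes in `encoding`.

    Hypothetical helper for illustration only: "=" and "&" are kept as-is,
    which is a rough approximation of what a URL parser does with a query.
    """
    return quote(query, safe="=&", encoding=encoding)


# The same query text yields different bytes on the wire
# depending on which encoding the parser is told to use.
utf8_query = percent_encode_query("q=café", "utf-8")
legacy_query = percent_encode_query("q=café", "windows-1252")

print(utf8_query)    # q=caf%C3%A9
print(legacy_query)  # q=caf%E9
```

A server that expects one form and receives the other may treat the two as different resources, which is why the encoding used for Link headers and early hints needs to be specified.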

domenic commented 1 year ago

I think using UTF-8 would be better, ignoring the Content-Type. Especially because Content-Type might not arrive by early hints time, right?

noamr commented 1 year ago

> I think using UTF-8 would be better, ignoring the Content-Type. Especially because Content-Type might not arrive by early hints time, right?

Right. I think it's a matter of calling steps 3-6 of https://html.spec.whatwg.org/#parse-a-url instead of running the whole algorithm.

bashi commented 1 year ago

+1 to using UTF-8. Early hints were introduced recently, so I guess it's not very harmful to assume that servers that speak early hints use UTF-8.

domenic commented 1 year ago

It would be good to write tests to see what browsers do for non-early Link headers. Do they use Content-Type, or do they always use UTF-8?

I hope that at least some browsers always use UTF-8, and so we can have the simple rule "if it's a Link header, we use UTF-8; if it's <link>, we use the document's encoding".
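The simple rule proposed above could be sketched as follows. This is only an illustration of the proposal, not spec text or any browser's behavior; the function name `url_encoding_for` and the source labels `"link-header"`, `"early-hint"`, and `"link-element"` are hypothetical.

```python
def url_encoding_for(source: str, document_encoding: str = "utf-8") -> str:
    """Pick the encoding to use when parsing a link's URL.

    `source` is a hypothetical label for where the link came from:
    "link-header", "early-hint", or "link-element".
    """
    if source in ("link-header", "early-hint"):
        # Proposed rule: header-based links always use UTF-8,
        # regardless of the response's Content-Type charset.
        return "utf-8"
    if source == "link-element":
        # <link> elements keep using the document's encoding.
        return document_encoding
    raise ValueError(f"unknown link source: {source!r}")


print(url_encoding_for("early-hint", "windows-1252"))    # utf-8
print(url_encoding_for("link-element", "windows-1252"))  # windows-1252
```

Under this rule, early hints need no special case: they are treated like any other Link header, so the missing Content-Type at early hints time is not a problem.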