the-moisrex / webpp

C++ web framework | web development can be done with C++ as well.
https://t.me/webpp
MIT License

URI Code Point handling #512

Open the-moisrex opened 5 months ago

the-moisrex commented 5 months ago

The WHATWG URL Standard is written with Unicode code points in mind: a URI code point is not just an ASCII character, it can also be a non-ASCII character encoded as UTF-8.

This means the way we "find" characters and check/encode/... them only works for ASCII characters.

For example, Opaque Path requires checking the code point too, which means the \t in \t foo:bar would trigger an invalid-character warning; it won't be encoded, it would be ignored.
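To make the byte-vs-code-point problem concrete, here is a rough sketch of decoding one code point from UTF-8 input before running any check on it. The names `decoded` and `decode_utf8` are made up for illustration and are not existing webpp APIs. The sketch assumes the input is UTF-8 in a `std::string_view`, and it deliberately skips overlong-encoding and range validation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type: the decoded code point and how many bytes it used.
// length == 0 signals a malformed or truncated sequence.
struct decoded {
    char32_t    code_point;
    std::size_t length;
};

// Decode the first code point of `in` (assumed to be UTF-8).
// Sketch only: overlong forms and values above U+10FFFF are not rejected.
constexpr decoded decode_utf8(std::string_view in) noexcept {
    if (in.empty()) {
        return {0, 0};
    }
    auto const b0 = static_cast<unsigned char>(in[0]);
    if (b0 < 0x80) {
        return {b0, 1}; // plain ASCII: the only case a byte-wise scan handles
    }

    std::size_t len = 0;
    char32_t    cp  = 0;
    if ((b0 & 0xE0) == 0xC0) {
        len = 2;
        cp  = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
        len = 3;
        cp  = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
        len = 4;
        cp  = b0 & 0x07;
    } else {
        return {0, 0}; // stray continuation byte or invalid lead byte
    }
    if (in.size() < len) {
        return {0, 0}; // truncated sequence
    }

    // Each continuation byte contributes its low 6 bits.
    for (std::size_t i = 1; i < len; ++i) {
        auto const b = static_cast<unsigned char>(in[i]);
        if ((b & 0xC0) != 0x80) {
            return {0, 0};
        }
        cp = (cp << 6) | (b & 0x3F);
    }
    return {cp, len};
}
```

With something like this, the parser could advance by `length` bytes and run its checks on `code_point` instead of on individual bytes.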


URI Code Point Definition

The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
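The definition above could be turned into a predicate over decoded code points, roughly like this. All function names here are hypothetical and not part of webpp. The noncharacter check assumes the usual definition (U+FDD0 to U+FDEF, plus the last two code points of every plane).

```cpp
// Hypothetical helpers for illustration; not webpp's actual API.

constexpr bool is_ascii_alnum(char32_t c) noexcept {
    return (c >= U'0' && c <= U'9') || (c >= U'A' && c <= U'Z') ||
           (c >= U'a' && c <= U'z');
}

constexpr bool is_surrogate(char32_t c) noexcept {
    return c >= 0xD800 && c <= 0xDFFF;
}

// Assumed definition: U+FDD0..U+FDEF, and U+xxFFFE / U+xxFFFF in every plane.
constexpr bool is_noncharacter(char32_t c) noexcept {
    return (c >= 0xFDD0 && c <= 0xFDEF) ||
           ((c & 0xFFFE) == 0xFFFE && c <= 0x10FFFF);
}

// Follows the quoted list: ASCII alphanumerics, the listed punctuation,
// and U+00A0..U+10FFFD minus surrogates and noncharacters.
constexpr bool is_url_code_point(char32_t c) noexcept {
    if (is_ascii_alnum(c)) {
        return true;
    }
    switch (c) {
        case U'!': case U'$': case U'&': case U'\'': case U'(':
        case U')': case U'*': case U'+': case U',':  case U'-':
        case U'.': case U'/': case U':': case U';':  case U'=':
        case U'?': case U'@': case U'_': case U'~':
            return true;
        default:
            break;
    }
    return c >= 0xA0 && c <= 0x10FFFD && !is_surrogate(c) && !is_noncharacter(c);
}
```

Note that `\t`, space, and `%` are all rejected by this predicate, so the caller still has to decide separately whether to warn, percent-encode, or skip.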