In the note about selectors and states, section 5:
The referenced RFC3986 defines the following grammar for a fragment identifier:
Some of the note’s examples leave characters unencoded that are not actually valid in a fragment identifier.
Square brackets ([ and ]) are found in e.g. example 18:
Angle brackets (< and >), which delimit URIs and so cannot be used inside them, are found in e.g. example 17:
May this be worth a thorough review?
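To make such a review easier to automate, here is a minimal Python sketch that flags characters the RFC 3986 fragment grammar does not allow. The character classes are transcribed from RFC 3986 (fragment = *( pchar / "/" / "?" )). The function name and the sample fragment in the usage note are hypothetical and are not copied from the note's examples.

```python
import string

# Character classes from RFC 3986 (sections 2.2, 2.3, 3.3 and 3.5).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
# fragment = *( pchar / "/" / "?" )
# pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
FRAGMENT_LITERALS = UNRESERVED | SUB_DELIMS | set(":@/?")
HEX_DIGITS = set(string.hexdigits)


def invalid_fragment_chars(fragment: str) -> list[str]:
    """Return the characters of `fragment` that RFC 3986 does not allow there."""
    invalid = []
    i = 0
    while i < len(fragment):
        ch = fragment[i]
        if ch == "%":
            # A percent sign is only valid as the start of a pct-encoded triplet.
            triplet = fragment[i + 1:i + 3]
            if len(triplet) == 2 and all(c in HEX_DIGITS for c in triplet):
                i += 3
                continue
            invalid.append(ch)
        elif ch not in FRAGMENT_LITERALS:
            invalid.append(ch)
        i += 1
    return invalid
```

For a made-up fragment such as selector(type=XPathSelector,value=/html/body/p[2]), this reports only the two square brackets. Parentheses pass the check because they are sub-delims and therefore valid in a fragment as far as RFC 3986 is concerned.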
Parentheses
Moreover (feel free to factor out into a separate issue): Regarding the list of characters that must be percent-encoded, I wonder if it has been considered to include ( and ) here. Their presence may not strictly make the output ambiguous, but does make it more complex to parse.
For example, I might quote "(but not crazy)" in the text "this is an artificial (but not crazy) example" with a RangeSelector: (line breaks inserted for readability)
Note the total of three closing parentheses after "crazy". I think one cannot decide how many of these are part of the cited string (one), and how many are part of the selector(…) syntax (two), without the parser either backtracking or keeping track of its recursion depth.
The proof of concept converter tool is based on PEG.js, which does not support backtracking, so if I am not mistaken it cannot parse this. In fact, that tool does not allow parentheses in the values at all: see the last line of the source, where it defines validchar as any of a-zA-Z0-9<>/[]:%+@.-!$&;*_ (is this list based on a particular spec?).
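To illustrate the depth-tracking point, here is a minimal Python sketch. It is not taken from the converter tool. The helper names and the sample fragments are hypothetical, and it ignores everything about the syntax except parentheses. It shows that counting depth only recovers the right boundary when the quoted text has balanced parentheses, while percent-encoding ( and ) in values removes the problem.

```python
from urllib.parse import quote


def matching_paren(text: str, open_index: int) -> int:
    """Return the index of the ")" that closes the "(" at open_index, or -1."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def encode_value(value: str) -> str:
    """Percent-encode a value so that no literal parentheses remain in it."""
    return quote(value, safe="")


# Hypothetical fragments; index 8 is the "(" right after "selector".
balanced = "selector(type=TextQuoteSelector,exact=(but not crazy))"
unbalanced = "selector(type=TextQuoteSelector,exact=crazy))"
encoded = "selector(type=TextQuoteSelector,exact=" + encode_value("crazy)") + ")"

# Balanced quoted text: depth counting finds the final ")".
print(matching_paren(balanced, 8) == len(balanced) - 1)
# Quoted text "crazy)": the selector appears to end one character too early.
print(matching_paren(unbalanced, 8) == len(unbalanced) - 2)
# With the value percent-encoded, the only ")" left is the structural one.
print(matching_paren(encoded, 8) == len(encoded) - 1)
```

All three comparisons print True. The second case is the troublesome one: a quoted string containing an unmatched parenthesis cannot be told apart from the selector syntax by counting alone.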