Closed eemeli closed 4 months ago
I wish you'd added this as a separate alternative.
I don't like that the isolates are part of the `name` rule. I worked hard to keep the isolates outside the rules for important constructs (like `name`).
You removed unquoted literals from being amenable to bidi isolation, but they should still be isolatable, no?
> I don't like that the isolates are part of the `name` rule. I worked hard to keep the isolates outside the rules for important constructs (like `name`).

Including the isolates in `name` doesn't change its parsed meaning, much as the `|` delimiters aren't part of the parsed meaning of a quoted literal. It's the same situation as with isolated expressions, markup and patterns.
> You removed unquoted literals from being amenable to bidi isolation, but they should still be isolatable, no?

They are, covered by the change to `name`:

    unquoted = name / number-literal

`number-literal` doesn't need isolation, because we've limited its valid values, so isolating `name` is enough.
The problem with allowing isolates into `name` is that it makes name comparison harder. Shouldn't the following two names be equal?
`\u2066name\u2069`
`name`
> `number-literal` doesn't need isolation, because we've limited its valid values, so isolating `name` is enough.

Actually, numbers are complicated in bidi because digits are weakly directional. The minus sign can swing around onto the "wrong" side visually.
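To make the "weakly directional" point concrete, here is a small sketch that looks up Unicode bidi classes with Python's standard `unicodedata` module. It only inspects character properties and does not run the bidi algorithm, so it shows why a minus sign has no fixed side rather than demonstrating the reordering itself. The helper name `bidi_classes` is made up for illustration.

```python
import unicodedata

def bidi_classes(text: str) -> list:
    """Return the Unicode Bidi_Class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Hyphen-minus is ES (European Separator) and ASCII digits are EN
# (European Number). Both are weak types, so their visual placement
# depends on the surrounding strong characters.
print(bidi_classes("-42"))     # ['ES', 'EN', 'EN']

# For comparison, Hebrew alef is strongly right-to-left.
print(bidi_classes("\u05d0"))  # ['R']
```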
The other reason I had unquoted and quoted together is that it simplifies what tools have to do. A tool can blindly isolate any literal separate from the decision to quote it and can blindly remove isolates from literals without looking at the contents.
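A minimal sketch of what "blindly" isolating and de-isolating could look like for a tool, assuming it works on the source text of a single literal. The function names `isolate` and `strip_isolate` are hypothetical and not part of any proposal in this thread. Neither function looks at whether the literal is quoted or what it contains.

```python
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
OPEN_ISOLATES = (LRI, RLI, FSI)

def isolate(literal_source: str, opener: str = LRI) -> str:
    """Wrap a literal's source text in an isolate pair, whatever it contains."""
    return f"{opener}{literal_source}{PDI}"

def strip_isolate(literal_source: str) -> str:
    """Remove one enclosing isolate pair, if present, leaving the rest untouched."""
    if (
        len(literal_source) >= 2
        and literal_source[0] in OPEN_ISOLATES
        and literal_source[-1] == PDI
    ):
        return literal_source[1:-1]
    return literal_source

# The same two calls work for quoted and unquoted literals alike.
assert strip_isolate(isolate("|quoted text|")) == "|quoted text|"
assert strip_isolate(isolate("-42", FSI)) == "-42"
```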
> The problem with allowing isolates into `name` is that it makes name comparison harder. Shouldn't the following two names be equal?
> `\u2066name\u2069`
> `name`

As proposed, both of those strings would match the `name` rule, but as `\u2066` and `\u2069` are not valid `name-char` characters, they would be parsed according to the `open-isolate` and `close-isolate` rules, with `name-body` matching the four-character "name" string in both cases.
So the parsed value of the name would be "name" for both of the above, and they would be considered equal.
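The parsing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual grammar: `name-body` is simplified to a small ASCII subset (the real `name-start` and `name-char` rules admit many more characters), and the opening and closing isolates are treated as independently optional, which this thread does not settle. `parse_name` is a hypothetical helper.

```python
import re

# open-isolate: LRI, RLI or FSI (U+2066..U+2068); close-isolate: PDI (U+2069).
# ASSUMPTION: name-body is reduced to an ASCII subset for illustration only.
NAME = re.compile(
    r"[\u2066-\u2068]?"                    # optional open-isolate
    r"(?P<body>[A-Za-z_][A-Za-z0-9_.-]*)"  # simplified name-body
    r"\u2069?"                             # optional close-isolate
)

def parse_name(source: str) -> str:
    """Return the parsed value of a name, which excludes any isolates."""
    match = NAME.fullmatch(source)
    if match is None:
        raise ValueError(f"not a name: {source!r}")
    return match.group("body")

# Both spellings from the question above parse to the same value.
assert parse_name("\u2066name\u2069") == "name"
assert parse_name("name") == "name"
assert parse_name("\u2066name\u2069") == parse_name("name")
```

Comparison then happens on the parsed values, so callers never see the isolates.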
> > `number-literal` doesn't need isolation, because we've limited its valid values, so isolating `name` is enough.
>
> Actually, numbers are complicated in bidi because digits are weakly directional. The minus sign can swing around onto the "wrong" side visually.

But `number-literal` only shows up in "code", which is always LTR, yes?
> The other reason I had unquoted and quoted together is that it simplifies what tools have to do. A tool can blindly isolate any literal separate from the decision to quote it and can blindly remove isolates from literals without looking at the contents.

The proposed change doesn't change the number of constructs for which this can be done; it replaces "unquoted literals" with "names". Doing so removes the need to separately pick out the LRM/RLM/ALM from the productions that include `name`.
As requested, refactored as an alternative to the proposed solution. Also addressed the concerns identified in #787 and #788, and added an example showing how `name` isolation avoids a spillover that the current proposal cannot.
I have also validated this solution by implementing it in my parser.
- Drop the `bidi` rule, and allow `name` to be LR/RL/FS-isolated.
- Allow an LRI immediately after a non-content newline.
- Relax expression & markup isolation to not require pairing on a syntactic level, as the LRI can also be terminated by a newline.