Open lemmy opened 2 months ago
It's an interesting problem but might actually be tractable and I do like how it looks. The main worry is that it would be a horrifying parse ambiguity disaster with the |
infix operator. I can whip up a prototype in the tree-sitter grammar which will statically identify any such ambiguities since it's a LR(1) generator.
Tree-sitter confirms there is a LR(1) parse ambiguity given the token sequence | (
. This means more than one token of lookahead is required in order to tell what parse rule to apply. Possible interpretations of | (
are the |
"divides" infix op in nonfix form, like |(2,4)
, and set cardinality of some expression in parentheses, like |(S)|
.
This is tractable, but unfortunately for the tree-sitter grammar declaring a simple precedence conflict to trigger GLR parsing in this case bloats the generated src/parser.c
file substantially, from an already-hefty 36 MB to 47 MB. I'll see whether there's a way to avoid this.
Would be the same issue because ||
is also an infix operator.
Sorry, I deleted the comment you just referenced right when I found ||
was an infix operator, indeed.
Actually the more I look at this the more nightmarish it seems. Consider the token sequence | S |
. Is this a complete set cardinality expression? Or is this the beginning of something like | S | P |
, set cardinality around the infix op expression S | P
? This is the real reason the parser size blew up so much I think.
This whole exercise is really showing the value of the tree-sitter grammar in exploring language design, though. It finds all these weird cases statically.
One possible solution to all of this is a bit clunky, but the triple-vert ||| S |||
operator could work. Kinda ugly.
Should we add FiniteSetsExt!card(S)
as a synonym for FiniteSets!Cardinality(S)
and call it a day? My main motivation for introducing |A|
was to have a compact notation for use in papers, where Cardinality(S)
tends to be too verbose (hello overfull box). I believe card(S)
is widely accepted.
If the concern is for papers, it would certainly be possible to alter tla2tex to output |expr|
given Cardinality(expr)
. Of course humans have the same issues with parse ambiguity as parsers themselves, but perhaps paper authors should simply beware of using the |
infix operator inside of cardinality expressions.
If we have syntax in papers that doesn't exist in TLA+, we might as well use pseudo code in the first place.
Perhaps true, although it is certainly possible to derive a bijective mapping between TLA+ syntax and generated LaTeX in this case; according to this stackoverflow answer \lvert
and \rvert
are vertical bars that function as delimiters for things like cardinality, while \vert
/|
/\divides
are used as an operator. Given how nice |S|
looks compared to C a r d i n a l i t y (S) it might be worth it to make an exception here. If we really wanted to make it official we could add \lvert S \rvert
to the language although it looks terrible. Or \| S \|
even though in LaTeX \|
is just ||
.
No objections if people want to do that in their papers.
I and others occasionally wish to write the more concise
|S|
instead ofCardinality(S)
. Perhaps, this is a feature that can be piggybacked on the Unicode-related changes, although|S|
will be a breaking change for all existing parsers.