tlaplus / rfcs

RFCs for changes to the TLA+ specification language
MIT License
13 stars 0 forks source link

Support `|S|` as alternative to `Cardinality(S)` #9

Open lemmy opened 2 months ago

lemmy commented 2 months ago

I and others occasionally wish to write the more concise |S| instead of Cardinality(S). Perhaps, this is a feature that can be piggybacked on the Unicode-related changes, although |S| will be a breaking change for all existing parsers.

ahelwer commented 1 month ago

It's an interesting problem but might actually be tractable and I do like how it looks. The main worry is that it would be a horrifying parse ambiguity disaster with the | infix operator. I can whip up a prototype in the tree-sitter grammar which will statically identify any such ambiguities since it's a LR(1) generator.

ahelwer commented 1 month ago

Tree-sitter confirms there is a LR(1) parse ambiguity given the token sequence | (. This means more than one token of lookahead is required in order to tell what parse rule to apply. Possible interpretations of | ( are the | "divides" infix op in nonfix form, like |(2,4), and set cardinality of some expression in parentheses, like |(S)|.

This is tractable, but unfortunately for the tree-sitter grammar declaring a simple precedence conflict to trigger GLR parsing in this case bloats the generated src/parser.c file substantially, from an already-hefty 36 MB to 47 MB. I'll see whether there's a way to avoid this.

ahelwer commented 1 month ago

Would be the same issue because || is also an infix operator.

jonesmartins commented 1 month ago

Sorry, I deleted the comment you just referenced right when I found || was an infix operator, indeed.

ahelwer commented 1 month ago

Actually the more I look at this the more nightmarish it seems. Consider the token sequence | S |. Is this a complete set cardinality expression? Or is this the beginning of something like | S | P |, set cardinality around the infix op expression S | P? This is the real reason the parser size blew up so much I think.

This whole exercise is really showing the value of the tree-sitter grammar in exploring language design, though. It finds all these weird cases statically.

ahelwer commented 1 month ago

One possible solution to all of this is a bit clunky, but the triple-vert ||| S ||| operator could work. Kinda ugly.

lemmy commented 1 month ago

Should we add FiniteSetsExt!card(S) as a synonym for FiniteSets!Cardinality(S) and call it a day? My main motivation for introducing |A| was to have a compact notation for use in papers, where Cardinality(S) tends to be too verbose (hello overfull box). I believe card(S) is widely accepted.

ahelwer commented 1 month ago

If the concern is for papers, it would certainly be possible to alter tla2tex to output |expr| given Cardinality(expr). Of course humans have the same issues with parse ambiguity as parsers themselves, but perhaps paper authors should simply beware of using the | infix operator inside of cardinality expressions.

lemmy commented 1 month ago

If we have syntax in papers that doesn't exist in TLA+, we might as well use pseudo code in the first place.

ahelwer commented 1 month ago

Perhaps true, although it is certainly possible to derive a bijective mapping between TLA+ syntax and generated LaTeX in this case; according to this stackoverflow answer \lvert and \rvert are vertical bars that function as delimiters for things like cardinality, while \vert/|/\divides are used as an operator. Given how nice |S| looks compared to C a r d i n a l i t y (S) it might be worth it to make an exception here. If we really wanted to make it official we could add \lvert S \rvert to the language although it looks terrible. Or \| S \| even though in LaTeX \| is just ||.

lemmy commented 1 month ago

No objections if people want to do that in their papers.