Proposal: combine set filter and set map language constructs

ahelwer commented 1 month ago

I often find myself wanting to both filter and map a set. The way to do this in TLA+ is currently:

op == { x \in { f(x) : x \in S} : p(x) }

or

op == { f(x) : x \in { x \in S : p(x) } }

I think it would be nicer to make this a single operation:

{ f(x) : x \in S : p(x) }

One possible semantic issue is that set mapping supports multiple quantifier bounds:

op == { f(x, y) : x \in S, y \in P }

while set filtering only supports a single quantifier:

op == { x \in S : p(x) }

because after all, what would { x \in S, y \in P : p(x, y) } even mean?

Fortunately having a map operation ensures that these bounds will coalesce into a single stream of elements. However, it does make things more difficult when trying to define the semantics of this combined map/filter operation, since you can't easily decompose it into a set map then a set filter. What does this mean, for example?

{ f(x, y) : x \in S, y \in P : p(x, y) }

It cannot be easily written in terms of the existing map and filter constructs. I believe the translation would have to be something like this:

op == {
  f(x, y) : <<x, y>> \in {
    <<x, y>> \in
      {<<x, y>> : x \in S, y \in P}
    : p(x, y)
  }
}

So it would be a map that wraps the multiple quantifier bounds in a tuple, nested inside the filter that recovers their names with a tuple destructuring, nested inside the map that recovers their names using tuple destructuring.

muenchnerkindl commented 1 month ago

I agree that the combined syntax would be convenient, and it has been used since Dijkstra.

For constructs involving multiple bound variables, an arguably more readable translation (even after your edit) is

LET Filter == { <<x, y>> \in S \X P : p(x, y) }
IN   { f(x, y) : <<x, y>> \in Filter }

Calvin-L commented 1 month ago

I don't love the second colon in { f(x) : x \in S : p(x) }.

The /\ operator makes more sense to me, since x \in S and p(x) are both tests that constrain what elements appear in the set:

{ f(x) : x \in S /\ p(x) }

Thoughts?

lemmy commented 1 month ago

We've received numerous complaints from our target audience about the difficulty and confusion surrounding TLA+ syntax. I suggest we explore ways to assess whether these language extensions are beneficial or detrimental to addressing this issue. To that end, extensions could be required to come with a desugaring tool.

ahelwer commented 1 month ago

@lemmy to my mind this is a simplification of syntax. Users are well-acquainted with map/filter from many other languages, like Python or SQL. The current way of filtering then mapping is very verbose, confusing to read, and confusing to write. I would go as far as to say map and filter should be treated as specialized forms of this operation instead of this operation desugaring to map and filter.

@Calvin-L don't really like the use of /\ to be honest, at least in my mind I read : as "such that" so the dual : makes sense. Using /\ also greatly complicates parsing; S can be an arbitrarily-long TLA+ expression which could contain /\ within it. Trying to use existing infix operator tokens as keywords is fraught with difficulty, as seen in #9. Ideally I would have liked { f(x) : x \in S | p(x) } which fits with mathematical notation I've seen in the real world, but | is also an existing infix operator so would run into the same issue.

lemmy commented 1 month ago

@lemmy to my mind this is a simplification of syntax. Users are well-acquainted with map/filter from many other languages, like Python or SQL. The current way of filtering then mapping is very verbose, confusing to read, and confusing to write.

Might very well be, but let's be data-driven: What's the background of the average target user? What do existing and potential users find confusing? How important is writability compared to readability? There is no need to rush here, and these RFCs should seek input from a broad group of users, not just four or five experts. Related: https://github.com/tlaplus/rfcs/issues/1

Personally, I'm not against this proposal.

ahelwer commented 1 month ago

@hwayne provides the interesting suggestion that as part of these changes, set filter with multiple quantifier bounds could also be added to the language; { x \in S, y \in P : p(x, y) } would be defined to mean { <<x, y>> \in S \X P : p(x, y) }. There is precedent for this in the function definition syntax f[x \in S, y \in P] == ... meaning f[<<x, y>> \in S \X P] == ... as accessed by DOMAIN f. This would simplify the translation of the combined map/filter.

Calvin-L commented 1 month ago

don't really like the use of /\

What about comma? That would align with current TLA+ syntax, where bounds are comma-separated.

Users are well-acquainted with map/filter from many other languages, like Python or SQL.

That's a good point. Should we be surveying how other languages handle set comprehensions? For instance, Haskell allows a comma-separated list of clauses where you can mix binders and filters:

let catsInGoodHomes = [c | h <- homes, isGood h, c <- cats h]

You can even put let-bindings in the clauses; let x = y is the same as the clause x <- [y].

Python has similar syntax:

catsInGoodHomes = [c for h in homes if isGood(h) for c in cats(h)]

These are some examples of how that might be written in TLA+:

catsInGoodHomes == {c : h \in homes, c \in cats(h) : isGood(h)} (original proposal---but see note below)
catsInGoodHomes == {c : h \in homes : isGood(h) : c \in cats(h)} (can we reorder clauses?)
catsInGoodHomes == {c : h \in homes /\ isGood(h) /\ c \in cats(h)} (/\ proposal)
catsInGoodHomes == {c : h \in homes, isGood(h), c \in cats(h)} (, proposal)

On a related note, even though TLA+ does allow multiple bounds in set-filter expressions, it does not allow the bound a variable to depend on another. The TLA+ version of my example is not legal, even without a filter clause:

catsInGoodHomes == {c : h \in homes, c \in cats(h)}
\*                                         ^^^^^^^

I often want that feature.

ahelwer commented 1 month ago

The issue with commas is then it becomes difficult to tell whether you are entering another bound quantifier or entering the predicate. Given that { x \in S : p(x) } is already the syntax when filtering, don't you think it makes sense to keep it?

hwayne commented 1 month ago

The problem I always have with teaching TLA+ is that students can never remember which is map and which is filter. I worry that will be even harder if mapfilter used two colons. Off the wall idea, but what about {f(x): x \in Set WHERE p(x)}?

Also more generally, how much space is there for modifying tla syntax? Is this the only change being considered or are we thinking of adding more syntactic sugar in general?

On Fri, May 17, 2024, 6:39 PM Andrew Helwer @.***> wrote:

The issue with commas is then it becomes difficult to tell whether you are entering another bound quantifier or entering the predicate. Given that { x \in S : p(x) } is already the syntax when filtering, don't you think it makes sense to keep it?

— Reply to this email directly, view it on GitHub https://github.com/tlaplus/rfcs/issues/10#issuecomment-2118494926, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUJO5HVSSC4IGC4NSCOCN3ZC2IL5AVCNFSM6AAAAABHN24X6GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJYGQ4TIOJSGY . You are receiving this because you were mentioned.Message ID: @.***>

ahelwer commented 3 weeks ago

I like Hillel's idea. It would be interesting to use optional keywords instead of :, like {f(x) FOR x \in S WHERE p(x)}. This could also be added as alternatives in the existing set map and set filter constructs.

tlaplus / rfcs

Proposal: combine set filter and set map language constructs #10