racket / rhombus

Rhombus programming language
Other
351 stars 62 forks source link

Make an RFC for pattern-matching #23

Closed AlexKnauth closed 3 months ago

AlexKnauth commented 5 years ago

The current match form has a some things that some people want to change, but can't because of backwards compatibility, such as:

  1. patterns could be more hygienic, so list, cons, var, ? etc. should be recognized by binding, with literals instead of datum-literals

     (struct cons [first rest])
     (match (cons 1 '()) [(cons x y) x]) ;=> 1 instead of error

    Cross-reference with #30

  2. else could be more like cond's else to mark the last clause, instead of binding a pattern variable

     (match #f 
       [1 "one"]
       [else (cond [#f "absurd"] [else "else"])])
     ;=> "else" instead of #<void>

    else would be recognized by binding, not by datum, in accordance with (1)

  3. identifier-patterns could be allowed if they have a match-transformer binding

     (define-data List Empty (Cons first rest)) ; Empty defined as identifier match-transformer
     (match (Cons 1 Empty) [Empty "empty"] [(Cons x y) x]) ;=> 1 instead of "empty"
  4. patterns could be allowed in more binding positions, such as function parameters, let bindings, and for-iteration bindings

     (let ([(list x y z) (list 1 2 3)]) x) ;=> 1

    Cross-reference with #36

  5. syntax patterns from syntax-parse could be integrated into normal match patterns

     (match #'(m 1 2 3)
       [#'(_ e:expr ...) #'(begin (println e) ...)])
     ;=> #<syntax (begin (println 1) (println 2) (println 3))>

    [This doesn't seem like it has a lot of support]

  6. patterns that use expressions within them such as app, ?, and == could be able to refer to previous pattern variables bound within the same overall pattern, like syntax-parse allows

     (match (list 1 2)
       [(list x (? (>=/c x) y)) y]) ;=> 2
  7. patterns for matching keywords could be forced to use quote as in '#:kw so that #:kw without the quote can be reserved for special pattern syntax without introducing ambiguity

     (match (list '#:kw 3) [(list '#:kw x) x]) ;=> 3

    where the quote in '#:kw is required for both the expression and the pattern

  8. allow for defining new variables or matching new patterns in between the main pattern and a #:when clause, similar in utility to syntax-parse's #:with pat expr, for-loop's [id (in-value expr)], or data/monad do-notation's arbitrary internal-definitions in between bind clauses

     (match (list 1 2)
       [(list x y)
        ; something like `#:with z (+ x y)` or `(define z (+ x y))`
        #:when (integer? z)
        ....])

    suggested by @sorawee as part of (5) combining with syntax-parse, but it could be independent of that

sorawee commented 5 years ago

Re 2: I think someone made an argument that cond shouldn't have else, and should have #:else instead. I'm neutral on this, but if that's the direction for cond, I think the current behavior for match in this aspect is perfect already. Note that we can't have #:else in match because it's for matching the keyword symbol.

AlexKnauth commented 5 years ago

Oh, I forgot that keyword-pattern syntax was a thing. That's a 7th thing that could be changed if other people agree. Unquoted keywords should be reserved for special syntax such as keyword arguments, just like they are in normal expressions. If you want to match a keyword value with a pattern, you should have to quote it, just like you would when creating a keyword value.

sorawee commented 5 years ago

Slightly off-topic: I actually don't like extensibility via keyword (usually employed in syntax/parse) because it's actually not extensible by clients, even if macro providers want to allow it. Binding OTOH seems to be the perfect tool for this, via stuff like define-match-expander (see also https://github.com/tonyg/racket-auxiliary-macro-context)

dedbox commented 5 years ago
1. patterns could be more hygienic, so `list`, `cons`, `var`, `?` etc. should be recognized by binding, with literals instead of datum-literals
   ```
   (struct cons [first rest])
   (match (cons 1 '()) [(cons x y) x]) ;=> 1 instead of error
   ```

If the goal is to make pattern matching more accessible, we should consider adding patterns that look like actual data and dropping patterns that look like expressions that might have produce the data (with the exception of quasiquote, which is a sort of gray area).

Take Algebraic Racket's pair patterns, for example:

2. `else` could be more like `cond`'s else to mark the last clause, instead of binding a pattern variable
   ```
   (match #f 
     [1 "one"]
     [else (cond [#f "absurd"] [else "else"])])
   ;=> "else" instead of #<void>
   ```

Despite the fact that I love wildcard patterns, I could get behind this one if only because I vaguely remember stumbling on it once or twice in the past, and else is culturally accepted as a bad variable name.

3. identifier-patterns could be allowed if they have a match-transformer binding
   ```
   (define-data List Empty (Cons first rest)) ; Empty defined as identifier match-transformer
   (match (Cons 1 Empty) [Empty "empty"] [(Cons x y) x]) ;=> 1 instead of "empty"
   ```

As long as what constitutes a "match-transformer binding" can be defined sufficiently generically (e.g., as a structure type property). I like the convention that an X-transformer instantiates an expansion-time representation of an X that perhaps generates the X at run time.

4. patterns could be allowed in more binding positions, such as function parameters, let bindings, and for-iteration bindings
   ```
   (let ([(list x y z) (list 1 2 3)]) x) ;=> 1
   ```

Algebraic Racket does this for many base Racket forms. It is a powerful feature, but it complicates the syntax of base forms in subtle ways.

Take the hypothetical (λ ([list null]) ...), for example. Is the argument a static pattern that matches the value (list null), or a variable named list with a default value of null?

Another example is case. Given a Racket expression of the form (case [(datum ...) then-body ...+] ...), do we generalize the datums to a list of patterns or a single pattern? For Algebraic Racket, I tried both. Muli-pattern clauses were useful occasionally, but I went with the single-pattern syntax because it always led to more legible case expressions.

At the moment, I'm happier importing pattern-enabled variants with a descriptive prefix as needed.

5. syntax patterns from syntax-parse could be integrated into normal match patterns
   ```
   (match #'(m 1 2 3)
     [#'(_ e:expr ...) #'(begin (println e) ...)])
   ;=> #<syntax (begin (println 1) (println 2) (println 3))>
   ```

The cost might outweight the benefit here. Many syntax-oriented Racket functions refuse to work unless they're invoked during expansion, and re-tooling match to work in this setting would effectively turn it into a superset of syntax-parse. That outcome isn't necessarily bad, but it sounds like a lot of buck for not much bang. I didn't pursue this angle very far before abandoning it, so I could be wrong.

6. patterns that use expressions within them such as `app`, `?`, and `==` could be able to refer to previous pattern variables bound within the same overall pattern, like syntax-parse allows
   ```
   (match (list 1 2)
     [(list x (? (>=/c x) y)) y]) ;=> 2
   ```

Regarding ? and app, I added and dropped them from Algebraic Racket's pattern syntax once or twice. Injecting run-time logic into otherwise static patterns makes them harder to read and reason about. I found moving these concerns into #:if and #:with directives outside the pattern gave simpler match expressions overall---destructure first, then analyze.

AlexKnauth commented 5 years ago

@dedbox

Re: 1

What do you mean by "look like actual data" vs "look like expressions that would produce the data"? I believe that the patterns should look like expressions.

Most programmers' experience with structured data comes from two things: (1) writing it down as expressions that would produce the data, and (2) seeing printouts of data from println or error.

Currently in Racket, (1) and (2) match up, because (println v) very often produces text such that (eval (read)) of that text would produce v again. And I believe this is natural because even beginners who don't understand the relationships of quote and eval, can easily put a printout of a value into a REPL and play with it. Within the acronym of REPL as Read-Eval-Print-Loop, "RE" combined is mostly a left-inverse of "P", and that helps beginners immensely.

I believe it should be the same with patterns. Even a beginner who doesn't understand quote and eval should be able to take either (1) an expression using the fundamental constructors or (2) a println output of a value, and put it as a pattern inside of a match.

Re: 3

Of course.

Re: 4

If we keep the pattern syntax "constructor-style" / "expression-style" as I talk about in (Re: 1), this will be non-ambiguous, as long as the constructors are always identifiers with compile-time information such as match-transformers.

Re: 5

The basic operations for constructing and destructing syntax objects are not restricted in that way, and syntax-parse can be used as a perfectly good pattern-matching library at runtime. As far as I know, there is no danger of syntax-parse producing an operation that isn't valid at runtime.

Re: 6

This point is not about app and ?, it is about any patterns that might contain expressions. So I can re-make my point using ==. The pattern (== expr) evaluates expr to a value and matches only values that are equal to that value.

(match (list 7 8 9)
  [(list x (== (+ x 1)) (== (+ x 2))) x]) ;=> 7

The x pattern variable defined in the first-element pattern would be available in the expressions within the later patterns. And this shouldn't only be for ==. It should be available for all patterns that use expressions within them, to give the programmer more power to express what constraints they might want the values to have. Even if those constraints subtly depend on the values previously seen in the data.

rocketnia commented 5 years ago

Re: 1

@dedbox: * (x . xs) An unquoted pair that matches any pair and binds the car to variable x and the cdr to variable xs.

If you really like for patterns to look like write output (rather than print output), you could always write this one as `(,x . ,xs) like the others, right? :) I know that's a lot of noise, but it seems like just the right amount of noise to properly switch to the read notation for cons cells and back to the expression notation for variables (which don't exist in read notation).

Re: 3

@AlexKnauth: 3. identifier-patterns could be allowed if they have a match-transformer binding

Personally, I think more operations ought to have parens around them, in this case (Empty), similar to (void). That way, in patterns, it's clear Empty is in a usage context and not a binding context.

Lexical scope gets tricky to explain if variable binding occurrences and usages look exactly the same. That is to say, several programming languages' bizarre notions of lexical scope and global scope are pretty simple combinations of first attempting to look up the variable and then attempting something else if that doesn't work. As simple as the rules tend to be once we know them, they're some of the most flabbergasting discrepancies to find when migrating from one language to another, largely because "attempting something else if that doesn't work" is a good way to make errors invisible, and the "something else" varies from one language to another. I think it's good to minimize this kind of branching.

The broader issue is probably a lost cause in Racket 2. Even Racket 1's function call syntax works by first attempting to look process the expression as a macro call and then attempting to compile the expression as a function call if that doesn't work. So I'll settle for just saying it's not too bad that we have to write a few more parens in our patterns.

Edit: Rearranged the two sections, added "Re: 1" and "Re: 3" headers, and added author names to the quotes.

sorawee commented 5 years ago

There's another possible design: tag every id that you want to bind explicitly.

(match ...
  [(list (id x) (id xs) ...) ...]
  [empty ...]
  [(id else) ...]
  [else ...])

I do not necessarily like this. Just mention it for the sake of completeness.

rocketnia commented 5 years ago

Oh yeah! Sometimes that's how I think expressions really oughta look, but it's rather frustrating to type it out that way. XD

michaelballantyne commented 5 years ago

Re 6, this could inhibit some re-ordering optimizations that match attempts (though are currently broken), because it would force left-to-right matching order.

dedbox commented 5 years ago

Re: 1

What do you mean by "look like actual data" vs "look like expressions that would produce the data"? I believe that the patterns should look like expressions.

Judging by the rest of your response, I think you get what I meant.

Most programmers' experience with structured data comes from two things: (1) writing it down as expressions that would produce the data, and (2) seeing printouts of data from println or error.

Why do you believe this? In my (considerable) experience teaching children and adults their first through fifth programming languages, not one has expressed or implied this view. Overwhelmingly, the intuition is that expressions are "commands" that incidentally produce "stuff." In this case, the "stuff" is quoted data. Asking a beginner to ignore unfamiliar punctuation for a while, then leading into explanations as we develop context and their curiousity takes over, is at least an order of magnitude easier than trying to give them a useful awareness of the symmetry between read, eval, and print, even when everyone is sufficiently motivated.

More to the point, I think non-experts have an easier time understanding a small set of concrete things they can play with at the REPL than they do understanding the multitude of expressions that can produce those things, for the same reasons we often like to expand our Racket programs to a small kernel before processing them.

I believe it should be the same with patterns. Even a beginner who doesn't understand quote and eval should be able to take either (1) an expression using the fundamental constructors or (2) a println output of a value, and put it as a pattern inside of a match.

I strongly disagree, and would go as far as to say this sentiment is dangerous. Expression-style patterns are an invitation to confuse a thing with its name, which is a far greater threat to the long-term emotional well being of newcomers than anything else we've discussed so far. And, as I'll allude to below, it makes taking the leap from programmer to meta-programmer—a necessary step on the path to language-oriented programmer—a lot scarier.

Re: 4

If we keep the pattern syntax "constructor-style" / "expression-style" as I talk about in (Re: 1), this will be non-ambiguous, as long as the constructors are always identifiers with compile-time information such as match-transformers.

Wholesale dependence on compile-time information for run-time matching is a slippery slope into madness. Putting aside the absurdity of requiring every literal we want to match against to have a transformer binding, the depths of confusion underlying this sentiment are vast and the consequences arise subtly.

Back when Algebraic Racket was a DSL for initial small-step interpreters, I added lexically scoped identifier patterns. Implementing, say, a simple infix arithemetic calculator DSL with them was quite satisfying:

(define calc
  (function
    [(x * y) (* (calc x) (calc y))]
    [(x + y) (+ (calc x) (calc y))]
    [n #:if (number? n) n]))

In these patterns, * and + are literals, but x, y, and n are variables. The pattern language implementer's job is almost trivial, but now I (the pattern language user) have to keep track of which identifiers are in scope and which aren't in order to make sense of patterns that should intuitively describe the "shape" of "stuff." In other words, the meanings of patterns may change unexpectedly and without obvious reason as I define and require. I don't think users struggling to remember else isn't a valid pattern would be up for that, nor should they be.

The worst part about this mess is that, once you're stuck, the only way forward appears to be to take a premature plunge into the deep end of Racket meta-programming. Grappling with the full weight of free versus bound identifiers, scope sets, the expander (and its quirks), when you're still not sure what the heck a syntax object is? That's an invitation to drown.

My takeaway: "not ambiguous" is a far cry from "no surprises," and the acceptable bar should be much closer to the latter.

Re: 5

The basic operations for constructing and destructing syntax objects are not restricted in that way, and syntax-parse can be used as a perfectly good pattern-matching library at runtime. As far as I know, there is no danger of syntax-parse producing an operation that isn't valid at runtime.

I'm not saying it will lead to invalid programs. I'm saying it complicates the pattern language design and the developer experience substantially for no obvious payout. This "feature" is not hard to implement on top of match. If you do try, I think you'll wind up with a very nice ad-hoc, informally specified, bug-ridden, slow implementation of half of syntax-parse, because that's what happens when you start stitching syntax patterns into value patterns.

We could talk in circles all day around whether or when it's useful to think of code as data. But in practice, the semantic differences between expansion and evaluation give rise to such disparate programming experiences that they really do merit separate tooling. And since we both seem to agree that syntax-parse is at least an acceptable syntax-level analog to match, I don't see what we stand to gain by conflating these experiences.

Re: 6

(match (list 7 8 9)
  [(list x (== (+ x 1)) (== (+ x 2))) x]) ;=> 7

This pattern looks like spaghetti at a glance. I can't tell how many elements the list pattern has unless I slow down and untangle the structure from the logic. Shunting that logic elsewhere is the single most clarifying change we could make for opening pattern matching to a broader and less expert audience.

(match (list 7 8 9)
  [(list x y z)
   #:if (= y (+ x 1))
   #:if (= z (+ x 2))
   x])

To be clear, it also "solves" the issue of pattern variable visibility by obviating the need for embedded expressions. There's always another way and, from what I've seen, the other way usually works out better for everyone.

dedbox commented 5 years ago

Re: 1

@dedbox: * (x . xs) An unquoted pair that matches any pair and binds the car to variable x and the cdr to variable xs.

If you really like for patterns to look like write output (rather than print output), you could always write this one as `(,x . ,xs) like the others, right? :)

Hi Nia!

In this case, write versus print isn't really an issue. In fact, Algebraic Racket already recognizes your quasiquoted example.

Scratch that. I get what you're saying now. It isn't so much that I want my patterns to look like the output of write, but that I'm free to elide the extra punctuation without making the pattern harder to read or introducing subtle bugs.

sorawee commented 5 years ago

I wish match can bind a derived value and use the variable in both when clause and the body. E.g.,

(match '(set x 12)
  [`(set ,v ,e)
    #:when (not (equal? e (expensive-retrieval v)))
    (record-overwritten! v (expensive-retrieval v))
    (record-new-value! v e)]
  [`(set ,v ,e) (void)]
  [`(lam ,fmls ,body) (void)])

Here, I need to call (expensive-retreival v) twice. While match provides => which gives me the ability to skip to the next branch, it seems unnecessarily complicated. It would be nice if we can write:

(match '(set x 12)
  [`(set ,v ,e)
    #:with result (expensive-retrieval v)
    #:when (not (equal? e result))
    (record-overwritten! v result)
    (record-new-value! v e)]
  ...)

And I guess this is one of the thing that I will get for free by unifying match with syntax-parse?

(#:with should work with multiple values as well I guess?)

AlexKnauth commented 5 years ago

Thanks @sorawee! I put that as point (8) though, since it doesn’t have to be connected with the syntax-parse suggestion.

As an aside, it sounds like not everyone wants syntax-parse to be combined with match. However that doesn’t mean we don’t want syntax-parse-inspired features like #:with in match

jackfirth commented 5 years ago

If they were kept separate, would there be value in "match classes" so match patterns can use the same colon notation that syntax patterns do?

> (match (list "hello" "world")
    [(list s:string ...) (map string->symbol s)])
(list 'hello 'world)
AlexKnauth commented 5 years ago

How would/should "attributes" for match classes work? They should be accessible as normal values without a special template or attribute form. Would they need ellipsis depths just like syntax-parse classes? (Edit: after reading https://github.com/racket/racket2-rfcs/issues/23#issuecomment-515700877, yes they would need ellipsis depths to disambiguate certain things)

jeapostrophe commented 5 years ago

I'd like to see match/syntax-parse and quasiquote(template)/syntax come closer in this way. I don't think the pattern-variables-are-not-variables thing is very intuitive and I think it would be nicer if something like (template (+ (f x) ...)) where x is bound to (list 1 2 3) evaluated to (+ (f 1) (f 2) (f 3)). In other words, I don't think match attributes should require a special form and we should try to make syntax attributes work the same way.

tgbugs commented 5 years ago

Related to attributes in a match context, is the fact that dot access to attributes only goes down a single level a technical limitation or a conscious design decision? I have come to appreciate it because it forces each syntax class to handle all attributes explicitly, but there seem to be two potential issues. First, it is confusing for those who are used to dot access that can be invoked in a chain template-variable.attr1.attr2.attr3 etc. Second, it leads to code duplication and naming by convention, for example content.head.head is forced to become content.head-head. This may only be an issue when writing matching for recursive structures/syntax but I wanted to bring it up as a pain point in the existing system.

rmculpepper commented 5 years ago

@tgbugs If you use #:attributes, you can export attributes with dots in their names.

> (define-syntax-class binding (pattern [var:id rhs:expr]))
> (define-syntax-class bindings #:attributes ([b.var 1] [b.rhs 1])
    (pattern (b:binding ...)))
> (syntax-parse #'((x 1) (y 2)) [bs:bindings #'(bs.b.var ...)])
#<syntax (x y)>

Without #:attributes, define-syntax-class must infer the list of attributes, and it must do so without any knowledge of what other syntax classes export (because of forward and recursive references).

rmculpepper commented 5 years ago

@jeapostrophe If x is bound to '(1 2 3) and f is bound to '(lambda (x) x), should (template (+ (f x) ...)) produce (+ (lambda 1) ((x) 2) (x 3))?

AlexKnauth commented 5 years ago

Oh, so that's a situation where ellipsis-depth would be needed to disambiguate it

jeapostrophe commented 5 years ago

@rmculpepper Yes, but if x were '(1 2 3) and f were (lambda (x) x) then you'd get 6. If you wanted (+ (f 1) (f 2) (f 3)) when f were the list, I'd prefer a different template syntax (maybe communicated via syntax-time struct-property that is essentially ellipsis depth) or a special kind of thing other than lists doing the sequencing.

mflatt commented 3 months ago

527