wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License

[RFC] Add smartmatch (flexible "satisfies"/"belongs-to") operator #989

Open cxw42 opened 3 years ago

cxw42 commented 3 years ago

(Background: I have been thinking about smartmatch a lot recently. e.g., https://github.com/wren-lang/wren/issues/956#issuecomment-817233866 and https://github.com/wren-lang/wren/issues/968#issuecomment-819148504 . I realized I should actually propose it for independent discussion! This builds on my https://github.com/wren-lang/wren/issues/968#issuecomment-819954076 in the discussion of x in y as an operator. Thanks to everyone participating in #956 and #968 for thought-provoking discussion! Thanks also to all the folks who have worked on smartmatch in Raku over the years.)

Many programs at some point ask "does X have property Y?" or "is X part of collection Y?". For example:

I propose taking a page from the Raku programming language. Raku unifies these tests under a binary operator called "smartmatch".

Overview

Advantages

Advantages when used with a switch statement

Smartmatch provides a very clean way to express switch cases (#956). Each case can be the right-hand side of a smartmatch. That way you can have any case expression you want without having to special-case syntax to support complex conditionals. For example, in switch(val):

Switch+smartmatch can support arbitrarily complex conditions using only Wren code. Programmers can define classes that implement ~~(_) and encapsulate the conditions into those classes.
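A minimal sketch of that idea in stock Wren, which has neither switch nor ~~. The class name Between, the method name matches(_) (standing in for the proposed ~~(_)), and the list-of-cases loop (standing in for switch) are all hypothetical and only illustrate how a condition could be encapsulated in a class:

```wren
// Hypothetical matcher class. Under the proposal, matches(_) would be
// declared as ~~(_) and invoked by `needle ~~ Between.new(...)`.
class Between {
  construct new(low, high) {
    _low = low
    _high = high
  }
  matches(needle) { (needle is Num) && needle >= _low && needle <= _high }
}

// Emulated switch: each case pairs a matcher with a result.
var cases = [
  [Between.new(0, 9), "single digit"],
  [Between.new(10, 99), "two digits"]
]

// Try each case in order and return the first result whose matcher accepts val.
var describe = Fn.new {|val|
  for (c in cases) {
    if (c[0].matches(val)) return c[1]
  }
  return "no match"
}

System.print(describe.call(7))    // single digit
System.print(describe.call(42))   // two digits
System.print(describe.call(1000)) // no match
```

The point of the design is that the switch machinery only ever calls one method on the case value, so new kinds of conditions need no new syntax.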

Smartmatch is a great complement to switch statements. I think it would be useful even if switch were not added to Wren. However, if you disagree, I certainly understand.

Suggested implementations for ~~

A starting point for discussion.

Edit per discussion below, adding !~ which is just like ~~, but with the opposite result. I recommend that Fn.!~(_) throw, since I don't know right now what that would mean.
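As a rough illustration only, the per-class behaviors mentioned in this thread (type test for classes, predicate call for functions, equality for strings, containment for other sequences, equality otherwise) can be modelled in stock Wren with a single helper. The class SmartMatch and its static method matches(_,_) are hypothetical names; the actual proposal dispatches on the right-hand operand through a ~~(_) method on each class rather than through a chain of is checks:

```wren
// Hypothetical helper modelling `needle ~~ haystack` in stock Wren.
class SmartMatch {
  static matches(needle, haystack) {
    if (haystack is Class) return needle is haystack           // type test
    if (haystack is Fn) return haystack.call(needle)           // predicate
    if (haystack is String) return needle == haystack          // exact match
    if (haystack is Sequence) return haystack.contains(needle) // containment
    return needle == haystack                                  // fallback
  }
}

System.print(SmartMatch.matches(2, [1, 2, 3]))              // true
System.print(SmartMatch.matches(3, 4..8))                   // false
System.print(SmartMatch.matches(42, Num))                   // true
System.print(SmartMatch.matches("a", "bar"))                // false
System.print(SmartMatch.matches(5, Fn.new {|n| n > 3 }))    // true
```

The String check has to come before the Sequence check because String inherits from Sequence; a negated form corresponding to !~ would simply invert the result.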

Implementation in the VM

I would add a CALL_SWAPPED opcode per https://github.com/wren-lang/wren/issues/968#issuecomment-820196805 . The same as regular CALL, but it takes the arguments in the opposite order. That would permit ~~ to be implemented without having to juggle the stack. However, that is only one of many possible options.
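To make the operand-order issue concrete, here is a toy model written in Wren itself. It is not VM code: the list named stack merely imitates an operand stack, and the swap step stands in for what a swapping opcode would do before a regular call.

```wren
// `needle ~~ haystack` is evaluated left to right, so the needle is
// pushed first and the haystack second.
var stack = []
stack.add(1)          // needle
stack.add([1, 2, 3])  // haystack

// A regular call expects the receiver below its argument, so the two
// top slots are exchanged first.
var top = stack.removeAt(stack.count - 1)
var below = stack.removeAt(stack.count - 1)
stack.add(top)
stack.add(below)

// Now the haystack is the receiver and the needle is the argument.
var argument = stack.removeAt(stack.count - 1)
var receiver = stack.removeAt(stack.count - 1)
System.print(receiver.contains(argument)) // true
```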

Thank you for reading all the way to the bottom :D .

PureFox48 commented 3 years ago

After some reflection on the matter, I've come to the conclusion that a smart-match operator would be a powerful idea and, although you'd need to remember how the built-in classes would behave, the behavior is intuitive anyway and shouldn't be difficult to grok.

I also agree that this would be very useful from a pattern matching perspective if switch is introduced.

The only thing I'm not fond of is the ~~ operator itself which looks odd to me.

I wonder if we could get away with just using a single ~ as the existing use of the tilde is as a unary rather than a binary operator and we have the precedent of doing the same for the - operator (and if #986 is accepted the + operator) without apparently anyone being too confused.

An alternative would be to use some other symbol such as @ or $ which are unused at present though we might want to keep these in reserve for possible future uses.

mhermier commented 3 years ago

There is also some precedent for the operator ~=, at least in Lua and probably in other languages. But considering the intended usage, it looks odd...

I still have some reservations, I need to see it in action and its implementation.

PureFox48 commented 3 years ago

~= in Lua appears to be the equivalent of != in Wren. See here.

That would fit in with your own proposal #985 to allow ~ as an alternative to ! for Bool operations.

mhermier commented 3 years ago

~= does not really make any sense as an assignment operator, because ~ has no meaning as a binary operator (the same goes for !), and because of its nature I don't think it is a good idea at all to allow it...

PureFox48 commented 3 years ago

Well, if ~ were allowed as an alternative to !, then it would make sense to allow ~= as an alternative to !=.

But you're right that this has nothing to do with compound assignment operators so I've edited my previous post accordingly.

mhermier commented 3 years ago

Both ~~ and ~= are fine for me, I only prefer the second one because of the symmetry with the other equality operators.

The biggest reservation I have is about how you declare such a method: because of the inversion, I don't see a practical way to express it properly inside the class.

PureFox48 commented 3 years ago

Well, I think if we used ~= as the smart-match operator (and I'd be happy with that), then it would be better to forget using ~ as an alternative to ! and restrict #985 to just implementing &, | and ^ on the Bool class.

mhermier commented 3 years ago

Hmmm don't know what to think. a ~= b would only have some meaning as ~(a == b) as per symmetry with != which should make it strictly equivalent to a != b. So the trivial implementation does not really have a real meaning/benefit.

I'm not very comfortable with the definition of the rules in general and the Object one in particular. It has too many potential meanings, which depend only on the right-hand side of the operator (unlike in), and that can be a source of errors and confusion.

PureFox48 commented 3 years ago

Well, if we do introduce compound assignment operators, then ~= is not going to be one of them because the bit-wise complement operator ~ is unary.

So, I think it would be reasonable to use ~= for smart-matching which you said you preferred to ~~ yourself.

However, to avoid overloading ~ too much, I'd drop the idea of using it as a Bool operator as we don't need it for that purpose anyway.

cxw42 commented 3 years ago

Binary ~, ~= are fine with me, or =~ for another option.

mhermier commented 3 years ago

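I thought about =~ but I discarded it because foo=~bar is ambiguous. It can be read either as foo = ~bar (assigning the bitwise complement of bar) or as foo =~ bar (a smart-match).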

cxw42 commented 3 years ago

@mhermier good point about the possible parse ambiguity.

Re. ~= vs ~(==) https://github.com/wren-lang/wren/issues/989#issuecomment-826318241 --- it's a fair point.

Tilde is nice because it connotes "like". However, if it's too problematic, my next choice would be @, @=, or ::.

PureFox48 commented 3 years ago

There might also be a case for using $, also currently unused, which is like an S with a vertical bar through it.

The S is suggestive of 'smart' and | is used in some languages as a delimiter in match statements.

TBH, I don't know which I like best.

@cxw42 As it's your idea, I think you should choose :)

mhermier commented 3 years ago

To me, because of Smalltalk, @ is the coordinate operator: when put between two numbers it produces a Point. :: is problematic because of C++, which makes it look more like variable lookup...

Side note: this is the reason I use logical unary left . to access top level scope on my personal branch ^^

cxw42 commented 3 years ago

@PureFox48 thanks :) . I did some typing tests to check ergonomics, and I thought of one other option: ~: (the "parrot" operator? :D ). That has the advantage over ~= that (on my keyboard) I don't have to lift the Shift key in the middle of the operator.

My preference would be ~~ first, then ~=, ~:, ::, @=, @, $. I have strong personal associations between $ and variables (e.g., shell vars), which is the only reason I would prefer it least.

PureFox48 commented 3 years ago

@cxw42 Well, as @mhermier doesn't like @ or :: and ~~ is your first preference, let's go with that.

It doesn't really have any technical problems, it will be familiar to those who know Raku and there are plenty of precedents for using a doubled symbol as an operator.

Although it looked a bit odd to me at first, I think I'm beginning to warm to it :)

mhermier commented 3 years ago

It is not that I don't like it, it is just that there are strong connotations, that would make a hard learning curve.

PureFox48 commented 3 years ago

As I don't know Smalltalk, the only meaning @ has for me is 'at'.

I agree though that :: wouldn't be a good idea as it will have strong connotations as a scope resolution operator to many people.

I take it you're on board with using ~~, as originally proposed ?

PureFox48 commented 3 years ago

A further thought.

Would it make sense to have a second operator !~ to mean not a match?

cxw42 commented 3 years ago

I support that, and I doubt much existing code logically negates the result of a bitwise complement :D

PureFox48 commented 3 years ago

I hadn't even realized that something like !~42 was legal before but apparently it is (it returns false) because the Num class is inheriting the ! operator from Object.

I don't think this means that !~ (and for that matter ~~) wouldn't be viable as we'd be using it as a binary operator rather than two successive unary operators.

PureFox48 commented 3 years ago

Incidentally, having a negative match operator would further enhance the attraction which the smart-match operator has compared to in for expressing containment.

Instead of: !(x in [1, 2, 3]) we could simply write x !~ [1, 2, 3].

Also being able to write something like x !~ Num when checking x's type would compensate for not having a negative is operator.
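For comparison, this is how the same two negative tests have to be spelled in stock Wren today, using only the existing contains(_) method and the is operator; the variable x is just an illustrative value:

```wren
var x = 5

// Today's equivalent of the proposed `x !~ [1, 2, 3]`.
System.print(!([1, 2, 3].contains(x))) // true

// Today's equivalent of the proposed `x !~ String`.
System.print(!(x is String))           // true
```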

mhermier commented 3 years ago

!~~ or !~= not the best elegance but can do if needed.

Off topic: I suspect that this is a sign that the real equality operator is = and not == as per != shows, following that logic... Even more proofs with >=, <=... I understand the motivation of C for requiring a short assignment operator, but you rediscover the inconsistencies by trying to follow the same logic and it fails...

PureFox48 commented 3 years ago

The way I'm seeing this right now is that ~~ and !~ would be analogous to == and !=.

So the = symbol would be replaced by ~ to reflect the fact that the operator is smart-matching (which may test for containment etc) rather than always testing for equality.

I've gone off using ~= altogether. Even though it can't be, it still looks like it's a compound assignment operator. Also a negative version would need to be something like !~= which is very ugly.

cxw42 commented 3 years ago

I have a strawman implementation at https://github.com/cxw42/wren/tree/smartmatch if anyone wants to try it! I implemented it using a new SWAP opcode for simplicity.

Example:

class Test {
  construct new() {}
  ~~(needle) { 42 }   // Note: `needle ~~ haystack` calls haystack.~~(needle)
}

var test = Test.new()
System.print(1 ~~ test) //> 42

I have not yet added any default implementations but will be working on those.

cxw42 commented 3 years ago

I proposed at the top for String that x ~~ str be str.contains(x) (substring test). I just realized that won't work well with a switch statement: switch("a") { case "bar"... } shouldn't match just because there is an a in bar. I looked back at the Raku docs, and Raku's string smartmatch is equality rather than substring. For those two reasons, I have modified my https://github.com/wren-lang/wren/issues/989#issue-866834010 to suggest string equality.
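The difference can be checked in stock Wren by comparing the two candidate tests directly; the variable names below are only for illustration:

```wren
var subject = "a"   // the value being switched on

for (candidate in ["bar", "a"]) {
  var bySubstring = candidate.contains(subject)
  var byEquality = candidate == subject
  System.print("case %(candidate): substring=%(bySubstring) equality=%(byEquality)")
}
// case bar: substring=true equality=false
// case a: substring=true equality=true
```

With substring semantics the "bar" case would fire for the subject "a", which is the surprising behavior described above; with equality only the "a" case fires.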

mhermier commented 3 years ago

While it makes a good start to toy with, I still don't like the syntax of the declaration in the class. The only spelling I can see for now would be something like:

(needle)~~(this) {...}

But that would require changing all unary operators...

I suspect this is because you want to test for equality more than for substring containment (and this should be the same for every container/collection).

ChayimFriedman2 commented 3 years ago

I think the String problem is just a symptom that this operator is problematic: it serves too many purposes. If it's the "contained in" operator, then it should refer to String.contains(), and not have any implementation for Num, for example. If it's the switch operator, it should perform an equality comparison for strings and be implemented for almost all primitives. The fact that the symbol ~~ has no meaning in math (nor in mainstream languages) also indicates that this is an overloaded operator, so you can't give it a proper name.

Instead, I think we should think about splitting the roles. We can have an in operator, and a case or whatever-called switch match operator. They're similar in the fact that they're both inverted (relative to the other operators), and thus need a CODE_SWAP, but different in purpose.

PureFox48 commented 3 years ago

I find https://github.com/wren-lang/wren/issues/989#issuecomment-830723064 particularly disappointing as I felt that sub-string matching was an important part of this proposal.

I don't think it's necessarily fatal to the original proposal as switch("bar") { case "bar"... } would still have matched even then.

However, @ChayimFriedman2 may be right that it's best to split the roles though, if we split off containment, I'm not sure that this leaves much of a role for smart-matching as we can already do type-checking with is and equality with ==.

As far as containment is concerned, although it was my idea to reuse in and despite objections I still think it's a plausible proposal, I wonder whether it would be better to come up with a new operator instead? I suggest the at symbol, @, might be the best choice of those still available. An advantage of using a symbol rather than a word is that we could then use !@ to mean not contained. Some examples to see how this would look:

var a = 2
var b = a @ [1, 2, 3]   // true
var c = a !@ [4, 5, 6]  // true 

var d = 3
var e = d @ 4..8        // false
var f = d !@ 0..2       // true

var g = "a"
var h = g @ "bar"       //  true
var i = g !@ "baz"      //  false

I'm not sure whether I like this or not but I think it's worth considering.

ChayimFriedman2 commented 3 years ago

Python uses in, and probably other languages too.

Do you have an example of a language (preferably mainstream) that uses a symbol operator for this? If not, the cognitive overhead will probably be too much.

PureFox48 commented 3 years ago

Can't think of a mainstream language which uses a symbol operator for containment, TBH.

Apart from Python, Kotlin uses in and !in for containment though it's been a while since I used that language and I can't remember now whether those operators can be used with strings or not.

EDIT: They can in fact be used with strings. See here.

PureFox48 commented 3 years ago

Although previously I'd shied away from suggesting a mixed symbol/word operator, !in doesn't look too bad in actual use and, if we had that, we could also introduce an analogous !is.

ChayimFriedman2 commented 3 years ago

It's somewhat hard to lex. However, if in becomes a keyword, we can do it. I prefer is not and not in, however.

PureFox48 commented 3 years ago

Although I like is not and not in myself, they would require a new keyword which might rule them out.

ChayimFriedman2 commented 3 years ago

Are you going to name your variable not? I hope not (pun intended).

PureFox48 commented 3 years ago

Might have used it as a method name but probably won't have been used much in the past.

ChayimFriedman2 commented 3 years ago

I can't think of any meaning to this as a method name that is not covered by overloading !.

PureFox48 commented 3 years ago

I was thinking of a static method rather than an instance method though, having said that, I recall someone pointing out (possibly yourself?) that you can have static operators as well.

ChayimFriedman2 commented 3 years ago

I did: #797.

PureFox48 commented 3 years ago

Yeah, that's it. Strange but true :)

joshgoebel commented 3 years ago

Might have used it as a method name but probably won't have been used much in the past.

I just added not to my vendored Assert the other day:

Assert.not(b.isEmpty)

I think that is more readable than using an operator... I don't think this alone is a good reason not to add it as a keyword though if that would have real overall benefits.

ChayimFriedman2 commented 3 years ago

Good one 😃 Though I would use Assert.assert(!b.isEmpty). In BDD, though, this is useful: x.is.not.something.

cxw42 commented 3 years ago

We certainly could split containment and switching, and there may certainly be a more Wren-esque way to do something like smartmatch. Some questions:

The fact that the symbol ~~ has no meaning in math (nor in mainstream languages)

It is true that not many languages have something like smartmatch yet. And mathematics has no need for a "do what I mean" operator :) . I personally think smartmatch is a very efficient way (only one new operator) to support a wide variety of use cases, and to give users more flexibility. It does take some getting used to, but once you do, it saves you (as a person writing in Wren) from having to remember when to use .contains(), when to use in, when to use .match(), ... .

joshgoebel commented 3 years ago

Though I would use Assert.assert(!b.isEmpty).

I come from a Ruby background where we have not built into the core language, so it's stuck in my head. :-) I don't find ! difficult to read (as long as there isn't also double negation (!unlocked) or "flipped concepts" ie - !open? vs closed?), but I definitely prefer not. Ruby also has unless so we can often avoid the need for negation at the operator level entirely.

So adding not as a keyword would get no objection from me. Then I assume we could write something like:

Assert[not b.isEmpty]
Assert.assert(not b.isEmpty)

If we do add switch, do we want user-defined classes to be usable as cases?

I feel switching only on built-in Core classes seems useful (clean input processing, etc), but ultimately quite limited.

PureFox48 commented 3 years ago

I may be wrong but when @ChayimFriedman2 suggested adding a not keyword, I don't think he had in mind using it as a general replacement for ! but just to negate is and (if we add it) in, given that they're words rather than symbols.

joshgoebel commented 3 years ago

Well we certainly don't have to make it more generic... to me being able to express a not in b but not allowed to express not file.closed() feels a little inconsistent... Why not:

a ! in b

ie, ! means not period.

I guess I'm personally unpersuaded by the "they're words [already] rather than symbols" line of thinking, unless you're saying "words are better" or "words belong with words"... but then I'd point out that file and closed are words also. :)

...but I'm not making a strong argument here, just providing my thoughts.


Although not file.closed isn't a great example because I'd write file.open instead of that, but I already lost that discussion elsewhere. :)

PureFox48 commented 3 years ago

Well, for better or worse, Wren follows C in preferring to use symbols rather than words for operators. The only exception to this is is and, if we re-designate in as an operator, that would make two.

I actually agree with you that it would be inconsistent to introduce not just to negate these operators and another aspect I don't like is that not would follow is but precede in.

For these reasons I personally would prefer to use !is and !in as the negations of these operators even if they're a mixture of a symbol and a word. As you say yourself, we often use ! with an identifier so we're used to this sort of thing anyway.

cxw42 commented 3 years ago

@PureFox48 re. https://github.com/wren-lang/wren/issues/989#issuecomment-830781833 --- the plot thickens! I didn't realize that String IS-A Sequence. When I implemented ~~ for Sequence, suddenly String went back to substring matching :D .

Another way to handle the exact-match case would be with a temporary list: "a" ~~ "abc" (substring), but "a" !~ ["abc"] (exact match, because it's list containment). That seems a bit too subtle to me, but it is an option.
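The underlying behavior can be seen in stock Wren with contains(_) alone, which is what a Sequence-level smartmatch would presumably delegate to:

```wren
// String inherits from Sequence, so a Sequence rule also applies to it.
System.print("abc" is Sequence)       // true

// String overrides contains(_) with a substring test.
System.print("abc".contains("a"))     // true

// Wrapping the string in a list turns the test into element equality.
System.print(["abc"].contains("a"))   // false
System.print(["abc"].contains("abc")) // true
```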

cxw42 commented 3 years ago

I opened a draft PR with the full proposal from the top post, as edited, in case you'd be willing to give it a try and see how it works in practice! String.~~(_) does implement equality, not substring, in the current version of the PR.

PureFox48 commented 3 years ago

I didn't realize that String IS-A Sequence. When I implemented ~~ for Sequence, suddenly String went back to substring matching :D .

Although I knew String inherited from Sequence, I also knew that (unlike List and Range) it has its own override of the contains method which uses sub-string matching rather than testing that the string contains a single character.

I'd therefore assumed that this is what smart-matching would do as far as strings were concerned and that would include the possibility of an exact match.

However, what hadn't dawned on me is that we need to distinguish between sub-string and exact matching and that the Raku folks had concluded (as you just have) that the latter must win!

This inevitably means that this proposal has no easy way to do sub-string matching because, if you convert the string to a list, then it would only be able to match a single character. That would leave us with having to use a predicate function to provide this functionality.
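A small stock-Wren check of both points. The name hasAb is hypothetical, and the assumption that a function on the right-hand side would be called with the left-hand value comes from the Fn rule discussed earlier in this thread:

```wren
// Converting a string to a list yields one element per character,
// so containment can only ever find single characters.
var chars = "abc".toList
System.print(chars.contains("b"))  // true
System.print(chars.contains("ab")) // false

// A predicate function can carry the substring test instead.
var hasAb = Fn.new {|s| s.contains("ab") }
System.print(hasAb.call("abc"))    // true
System.print(hasAb.call("xyz"))    // false
```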

So, whilst I think this proposal can still work (thanks for the draft PR), it's lost some of the attraction it had to me :(