Add switch statement - GitHub issues

memreflect commented 2 years ago

While the ~ command in es definitely reduces the need for a switch command, that need is not completely eliminated. There is likely a good reason rc has a switch command despite also having ~, and #31 also indicated it would be a beneficial addition. Taking those facts into consideration, I have implemented it with the same behavior as equivalent uses of if with the ~ command. Additions to the trip.es file attempt to exhaustively ensure this.

I'd also like to note that in contrast to the current switch syntax in Byron Rakitzis's rc, enclosing parentheses are not required for single-word subjects, nor is a redundant set of parentheses required for multi-word subjects:

switch($subject) { case ... } → switch $subject { case ... }
switch((foo bar)) { case ... } → switch (foo bar) { case ... }

The new addition is documented in the manpage, which had some additional cleanup done thanks to the warnings from mandoc -T lint on FreeBSD, and the documentation for the break command has been updated to reflect its lack of effect on a switch.

wryun commented 2 years ago

Neat!

However, I'm not sure about this. It feels like it's not quite in es's minimalist functional spirit. Also, if we were to implement switch, I think I'd prefer use of program fragments (i.e. {} - as in 'if'). I know rc doesn't do this, but rc actually has syntax for while/if rather than treating them as functions.

Maybe there's a sweet spot between this and the function in #31? Not that I can think of one off the top of my head.

hyphenrf commented 2 years ago

to me a specialized switch makes sense because it'll be made to handle patterns correctly, in a first-class manner, vs resorting to wrapping patterns in strings then eval-ing them. It'll also have more uniform syntax with the rest of the language (vs relying on a list). And it'll be able to restrict the shape of the code inside its body (versus a regular function where anything (incorrect) goes).

My little function doesn't even handle multiple (or-)patterns in the same case field. it has to eventually be one string and nothing more.

wryun commented 2 years ago

Having thought about this a little more, I think I'd prefer a switch that was special in a different way. e.g.

fn switch input fragment {
  local (fn case args {
    if {~ $input $args} {  # stupid trick here - fragment won't pattern match
      $args($#args)
    }
  }) $fragment
}

switch bar { 
  case foo { 
    echo Got foo! 
  } 
  case bar { 
    echo Got bar! 
  } 
}

This sort of works, but doesn't actually handle patterns properly (and doesn't have the usual abort on first match behaviour). Part of the issue here is that it's hard to parameterise patterns - i.e. something like this fails:

pattern=fo*
echo <={~ foo $pattern}

(and would fail for the same reasons in the above switch)

As far as I know, there's currently no way around using eval as @hyphenrf did in #31. But maybe there's some neat way we can change ~ so that this works (though I guess you still run into the filename expansion issue). Really just addressing #32, I guess.

Another thought would be to add a new case pattern matching 'thing' which is basically just the case function above but uses some hard-coded var ($case?) and does the early return and proper pattern behaviour. But I guess that's doing as much or more syntactical violence as this proposal :)

hyphenrf commented 2 years ago

that's an interesting scoping rule with local being allowed to appear in function arg on call site.. is it elaborated somewhere?

mwgamera commented 2 years ago

It's not elaborated, but the manual does explicitly say that fn name parameters { commands } is equivalent to fn-name = @ parameters { commands }. If you take that literally, it follows that it should be usable in binders.

hyphenrf commented 2 years ago

it is not the case for let-bindings though. as a minimal example:

fn ex-1 body { local (fn log { echo $body being executed }) $body }
fn ex-2 body { let (fn log {echo $body being executed }) $body }

if we do ex-1 { log ls; ls }, we find that log is in scope, but if we do ex-2 instead, we get an error (as one would expect with scoped bindings).

this behaviour with local is very cool for defining little DSLs like switch. It'd make a lot of sense to highlight it!

hyphenrf commented 2 years ago

btw, @wryun, this is a cool solution you came up with. I took it and added the usual eval & break stuff so that I can use it in my code, and also as a tmp solution until an agreed-upon implementation of switch statement is included in the language.

# usage: switch expr {
#          case lit single-cmd
#          case 'pat0*' { ... }            # <- quotation needed, blocks too
#          case 'pat1*' 'pat[0-9]' { ... } # <- because we have or-patterns
#          default { ... }
#        }
fn switch match cases {
   local (
     fn case    args { eval \~ $match $args && $args($#args) }
     fn break   args { throw break <={~ $#args 0 || result $args} } # optionally alias break to throw break 0
     fn default args { $args }
   )
   catch @{if{ ~ $1 break }{ result $*(2 ...) }{ throw * }}{ # optionally propagate break's payload
     $cases
   }
}
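
For readers less familiar with es exceptions, the control flow of the function above can be modelled in Python. This is only a sketch of the behaviour as the es code reads, not a translation: Break, switch, case, default, brk and demo are names invented for the illustration, and fnmatchcase stands in for ~ (its pattern syntax is close to es's but not identical). In this model every matching case runs until one of them breaks, and the break payload becomes the result of the whole switch.

```python
from fnmatch import fnmatchcase


class Break(Exception):
    """Stands in for es's `throw break value` (hypothetical name)."""

    def __init__(self, value=0):
        super().__init__(value)
        self.value = value


def switch(subject, cases):
    """Call cases(case, default, brk); a Break ends the whole switch."""

    def case(*args):
        # Last argument is the body, everything before it is a pattern.
        *patterns, body = args
        if any(fnmatchcase(subject, p) for p in patterns):
            return body()

    def default(body):
        return body()

    def brk(value=0):
        raise Break(value)

    try:
        return cases(case, default, brk)
    except Break as b:
        # Like the catcher in the es version: the payload is the result.
        return b.value


log = []


def demo(case, default, brk):
    case("foo", lambda: log.append("foo"))
    case("ba*", "qu?x", lambda: (log.append("bar-ish"), brk("done")))
    default(lambda: log.append("default"))


result = switch("bar", demo)
print(log, result)
```

The helpers only exist while switch is running, which is the same effect the es version gets from binding fn case, fn break and fn default with local.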

wryun commented 2 years ago

Nice! I briefly tried to mess with $&noreturn but couldn't get it to do what I wanted.

memreflect commented 2 years ago

(I apologize for the amount of information below in advance.)

Having thought about this a little more, I think I'd prefer a switch that was special in a different way. e.g.
...
switch bar { 
  case foo { 
    echo Got foo! 
  } 
  case bar { 
    echo Got bar! 
  } 
}

I agree that syntax is much more in line with the rest of es. I've pushed changes for similar syntax to my switch-statement branch. However, there are some differences you may or may not agree with:

Multiple cases may still be specified in a single line in the familiar cmd1 args; cmd2 args style, but only 1 semicolon ; is allowed after the closing brace } for a case body:
```
switch foo {
    case bar {echo bar}; case foo {echo foo};
    case * {throw error example no matches};
}
```
Only case A B C ... { ... } items are allowed inside a switch, so the full cmd1 args;;; ; ;; ; cmd2 args;;; ;; ; syntax is unnecessary and difficult to read. I expect interactive use of switch commands will be exceedingly rare, but there's no reason to require newlines between cases in an interactive shell, and the semicolon as a command separator makes sense if one considers each case to be similar to a function (but without wildcard expansion).

Cases consisting of a single "word", including expressions like (1 2 3), $listvar, and `{command args}, remain unchanged, but multiple patterns in a single case must be specified by enclosing them in parentheses. This avoids a problem where a case keyword directly adjacent to the preceding case block would be parsed as one more pattern of that preceding case instead of triggering a syntax error:
```
switch foo {
    # case (foo {echo ok} case *)
    case foo {echo ok}case * {echo fail}
}
```
While this is a bit of an unimprovement, it is consistent with the (a b c) = ... syntax used in variable bindings. After all, each case block is bound to its corresponding list of patterns.

The way I implemented the changes, single-word commands do not require braces, and my latest commit also made case bodies optional since matches implicitly return true/0 anyway:
```
fn is-even n {
    switch $n {
        case ()       {throw error is-even expected one or more values}
        case *[02468] # true
        case *        false
    }
}
```
The behavior can easily be changed so that braces are required, implicitly forbidding an empty case body.

Incidentally, the "case" keyword is now unnecessary, so it could be removed. After a couple of minor adjustments, its removal would mean one less keyword and no need to repeatedly type "case". It also would look remarkably similar to the case X in ... esac syntax used in sh, bash, etc.

# Display file type
switch $filename {
    (*.a *.lib)  {echo library}
    *.c          {echo C source}
    *.h          {echo C/C++ header}
    (*.o *.obj)  {echo object}
    (*.so *.dll) {echo shared/dynamic library}
    (*.C *.cc *.cpp *.cxx *.c++) {
        echo C++ source
    }
    (*.H *.hh *.hpp *.hxx *.h++) {
        echo C++ header
    }
    * {throw error ftype unknown file type}
}

A less powerful but very similar construct in R6RS (Scheme) and Racket is also called "case", so perhaps a name change is worth considering if this syntax is adopted.

mwgamera commented 2 years ago

I want to note that this makes ~ obsolete and ~ subj pat ... should probably be rewritten to something like switch subj (pat ...) true * false (or whatever syntax it ends up having) rather than being left as a separate construction.

hyphenrf commented 2 years ago

I think it's good to have both (and ~~) though. It's certainly lighter to write if {~ a b} c d than to write switch a { case b c; case * d }. The readability benefits only start showing when you're doing multiway logic on a variable. So each has its own use-case I believe.

I wouldn't want to write a quick throwaway expression like this throw break <={~ $#args 0 || result $args} in terms of switch

hyphenrf commented 2 years ago

@memreflect question: with the way you handle multiple patterns are the following #1 and #2 semantically different?

e = (a b c)
switch pat {
  (a b c) d #1
  $e      f #2
}

memreflect commented 2 years ago

I want to note that this makes ~ obsolete and ~ subj pat ... should probably be rewritten to something like switch subj (pat ...) true * false (or whatever syntax it ends up having) rather than being left as a separate construction.

The proposed switch command currently emulates the behavior of

if {~ subj pat ...} {
    cmd
} {~ subj pat2 ...} {
    cmd2
} ...

While you could replace the humble ~ subj pat ... with a switch, it would be the equivalent of rewriting the command in a more explicit form with if:

# ~ subj pat ...
if {~ subj pat ...} true false

For that reason, I feel ~ should be retained even if a switch command is added in some form.

memreflect commented 2 years ago

@memreflect question: with the way you handle multiple patterns are the following #1 and #2 semantically different?
e = (a b c)
switch pat {
  (a b c) d #1
  $e      f #2
}

No, #2 is effectively redundant. Assuming the case keyword is unnecessary, that code would behave the same as

e = (a b c)
if {~ pat (a b c)} {
    #1
    d
} {~ pat $e} {
    #2
    f
}

Since $e and (a b c) are equivalent, #1 would be executed if pat matched, else nothing would happen. #2 will never be executed.
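
The first-match behaviour described here can be sketched in Python. This is a rough model under stated assumptions, not the actual implementation: first_match is a made-up name, fnmatchcase only approximates es patterns, and the special handling of empty lists in ~ is not modelled.

```python
from fnmatch import fnmatchcase


def first_match(subject, cases):
    """Return the body of the first case matching `subject`, else None.

    subject: list of words; cases: list of (patterns, body) pairs.
    A case matches when any word matches any pattern, as with `~`.
    """
    for patterns, body in cases:
        if any(fnmatchcase(w, p) for w in subject for p in patterns):
            return body
    return None


e = ["a", "b", "c"]
cases = [(["a", "b", "c"], "d"),  # 1
         (e, "f")]                # 2, same patterns, so never reached

print(first_match(["b"], cases))    # d
print(first_match(["pat"], cases))  # None
```

Because the cases are tried in order and the search stops at the first hit, a later case with the same patterns is dead code, exactly as with a chain of if {~ ...} tests.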

mwgamera commented 2 years ago

For that reason, I feel ~ should be retained even if a switch command is added in some form.

I don't postulate entirely removing ~ from language. No one wants to type lengthy keywords all the time. But just as you wrote, switch is simply an if with ~; it completely covers ~'s functionality. So I would like to see one being considered a syntactic sugar of the other (I don't care which way) and rewritten during parsing the same way most other constructs are. It can't be written directly in es because it needs to treat arguments specially, but intuitively it shouldn't need a separate node kind.

That being said, many things already aren't perfectly orthogonal and it's not a showstopper.

memreflect commented 2 years ago

I don't postulate entirely removing ~ from language. No one wants to type lengthy keywords all the time. But just as you wrote, switch is simply an if with ~; it completely covers ~'s functionality. So I would like to see one being considered a syntactic sugar of the other (I don't care which way) and rewritten during parsing the same way most other constructs are.

Sorry for the misunderstanding. I don't believe rewriting ~ in terms of switch is the best idea. If I understand your suggestion however, converting switch into an if with ~ commands is completely reasonable. I haven't pushed any changes, but does the example below illustrate your intent?

# input
# ↳ output from es -x to illustrate the rewrite
switch () {}
↳ {if}
switch foo {}
↳ {if {~ foo }}
switch foo {case () false}
↳ {if {~ foo } false}
switch foo {case (x y z) false}
↳ {if {~ foo x y z} false}
switch foo {case (x y z) false; case fop {}; case * {result 42}}
↳ {if {~ foo x y z} false {~ foo fop} {} {~ foo *} {result 42}}
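
As a rough illustration of the rewrite shown above, here is a small Python sketch that builds the same if chain as text. The function name desugar and the (patterns, body) representation are assumptions made for the example; the real rewrite happens on the parse tree inside es, and the empty switch foo {} case is not reproduced here.

```python
def desugar(subject, cases):
    """Build the text of the `if` chain for a list of (patterns, body)."""
    parts = ["if"]
    for patterns, body in cases:
        # Each case becomes one `{~ subject patterns...}` test plus its body.
        parts.append("{~ " + subject + " " + " ".join(patterns) + "}")
        parts.append(body)
    return "{" + " ".join(parts) + "}"


print(desugar("foo", [(["x", "y", "z"], "false"),
                      (["fop"], "{}"),
                      (["*"], "{result 42}")]))
```

Note that the subject text is copied into every test, which matters when the subject is a command substitution rather than a plain word.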

mwgamera commented 2 years ago

but does the example below illustrate your intent?

Yes, that's the idea. And you are right that it's better in this direction.

wryun commented 2 years ago

Thanks @memreflect - this looks good to me.

@mwgamera if you have a chance to look at this PR, that'd be great. It looks sane to me, I much prefer the 'rewrite as if' that you suggested, and though my instincts are against adding new things to 'es' at this stage (particularly new syntax) I'm leaning towards merging.

It will break existing scripts that use switch or case, but I'm hoping that's not too common (and how many heavy users of es are there anyway...).

wryun commented 2 years ago

The other thing that might be worth considering here is calling this match instead of switch, to be clear we're following more in the functional tradition instead of the C-style switch/case/break.

And potentially drop case entirely. e.g. https://doc.rust-lang.org/reference/expressions/match-expr.html

So:

res = <={match $var ( 
  foo { 
    result bar 
  } 
  bar { 
    result foo 
  } 
)}

I've also changed the argument to match here to make it a list. This means we're closer to something that can be transformed without additional syntax shenanigans, though currently we still need it because of the special pattern handling.

Thoughts?

hyphenrf commented 2 years ago

It will break existing scripts that use switch or case, but I'm hoping that's not too common (and how many heavy users of es are there anyway...).

A strongly worded changes entry about the breaking introduction of a new kw should cover most people interested in updating :P

I've also changed the argument to match here to make it a list. This means we're closer to something that can be transformed without additional syntax shenanigans, though currently we still need it because of the special pattern handling.

I'd rather it be curly braces to indicate the special-form aspect of the expression, speaking for myself only of course. If it's just list-like but breaks the expectations of what something like (a* bx) means, I think that'll make the language appear less homogeneous.

wryun commented 2 years ago

@hyphenrf - re:

I'd rather it be curly braces to indicate the special-form aspect of the expression, speaking for myself only of course. If it's just list-like but breaks the expectations of what something like (a* bx) means, I think that'll make the language appear less homogeneous.

I think () makes the language more homogeneous than {} in this case, because every other instance of {} acts as a program fragment or similar, and can be read as 'normal' code. Doing match {} means that the contents of what looks like a fragment are now special.

Conversely, it's already valid es (provided a match function is defined) to do:

  match $var (some list stuff)

As you point out, what we're proposing here does change the interpretation of what's in this supposed 'list' in a surprising way, but I think the program fragment one is more surprising. I think the closest parallel is the for statement:

for (var = list) command

Here we use () but interpret it differently due to context. But currently the 'rules' are never broken for {}.

wryun commented 2 years ago

@memreflect by the way, I'm impressed you're making all these changes so quickly, and I'm not meaning to push you around to many different solutions, particularly when I don't feel like we have consensus yet. Don't feel like you have to follow my latest whim, and feel free to disagree!

wryun commented 2 years ago

Ah, @memreflect, I (belatedly) realised that this kind of rewrite evaluates multiple times. e.g.

{match `{echo foo} { bar { echo 1 }; foo { echo 2 } }}

is transformed to:

{if {~ <={%backquote <={%flatten '' $ifs} {echo foo}} bar} {echo 1} {~ <={%backquote <={%flatten '' $ifs} {echo foo}} foo} {echo 2}}

I think we should either flip the rewrite (i.e. rewrite ~ to match, and implement match) OR rewrite match with a special let variable so we only evaluate once (don't know if there's prior art in es for this, though).

i.e. something like:

match `{echo foo} (bar true foo false)

would rewrite to:

let (__es_match=`{echo foo}) {if {~ $__es_match bar} true {~ $__es_match foo} false}

I have a slight preference to not require ;, though I guess it is used in bind statements... (for/local/let). Ok either way

EDIT: actually, now I like the parallelism with the bindings. I'm tempted to go all in and suggest completely copying rust's match by adding a '=>'. Your call, and I will stop commenting now.

memreflect commented 2 years ago

The other thing that might be worth considering here is calling this match instead of switch, to be clear we're following more in the functional tradition instead of the C-style switch/case/break.

Agreed and implemented.

And potentially drop case entirely. e.g. https://doc.rust-lang.org/reference/expressions/match-expr.html

Done.

So:
res = <={match $var ( 
  foo { 
    result bar 
  } 
  bar { 
    result foo 
  } 
)}
I've also changed the argument to match here to make it a list. This means we're closer to something that can be transformed without additional syntax shenanigans, though currently we still need it because of the special pattern handling.

Thoughts?

I'm not too attached to the block syntax. One could argue the block syntax admittedly no longer makes sense without a case keyword because suddenly no commands are executed inside that block; with a case keyword, it at least felt like there was a command, even if case ... was the only command allowed. A change to list syntax would result in a resemblance to existing binding constructs like let, so there is clear precedent in favor of ( ) rather than { }:

match $var (
  (pattern1 pattern2) {cmd}
  pattern3            {another-cmd}
  ...
)
# match $subject (pattern... {block}; pattern... {block}; ...)

I will say the change to parentheses can make matching the empty list appear odd in an inline match command. Then again, I can't remember the last time I typed a case...esac command in sh/bash/ksh directly on the command line either, so it probably won't be seen often, unlike ~:

match $var (() {echo unset}; '' {echo null}; * {echo $var})

Other than that oddity and the fact that it feels like a block is missing at the end, the parentheses are fine. Your rationale for using parentheses instead of braces makes sense. Making braces a special case for a single command would be a problem.

memreflect commented 2 years ago

Ah, @memreflect, I (belatedly) realised that this kind of rewrite evaluates multiple times. e.g.
{match `{echo foo} { bar { echo 1 }; foo { echo 2 } }}
is transformed to:
{if {~ <={%backquote <={%flatten '' $ifs} {echo foo}} bar} {echo 1} {~ <={%backquote <={%flatten '' $ifs} {echo foo}} foo} {echo 2}}
I think we should either flip the rewrite (i.e. rewrite ~ to match, and implement match) OR rewrite match with a special let variable so we only evaluate once (don't know if there's prior art in es for this, though).

i.e. something like:
match `{echo foo} (bar true foo false)
would rewrite to:
let (__es_match=`{echo foo}) {if {~ $__es_match bar} true {~ $__es_match foo} false}

That is a good point. In my enthusiasm, I missed that potential performance issue. Fixed by assigning to temp variable bound with let as suggested.
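
The difference between the two rewrites can be shown with a small Python model. It is only a sketch: counting_subject, match_naive and match_once are hypothetical names, the subject is modelled as a function that records each call, and fnmatchcase stands in for ~.

```python
from fnmatch import fnmatchcase


def counting_subject(value):
    """Return (thunk, calls); the thunk records each evaluation."""
    calls = []

    def thunk():
        calls.append(value)
        return value

    return thunk, calls


def match_naive(subject, cases):
    # Subject expression is copied into every test, as in the first rewrite.
    for pattern, body in cases:
        if fnmatchcase(subject(), pattern):
            return body
    return None


def match_once(subject, cases):
    # Subject is evaluated once and bound, as with the let-bound temporary.
    value = subject()
    for pattern, body in cases:
        if fnmatchcase(value, pattern):
            return body
    return None


cases = [("bar", "echo 1"), ("foo", "echo 2")]

naive_subject, naive_calls = counting_subject("foo")
once_subject, once_calls = counting_subject("foo")
print(match_naive(naive_subject, cases), len(naive_calls))
print(match_once(once_subject, cases), len(once_calls))
```

Both versions pick the same case, but the naive one evaluates the subject once per test it reaches. Beyond the cost, a subject command with side effects would repeat them.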

I have a slight preference to not require ;, though I guess it is used in bind statements... (for/local/let). Ok either way

EDIT: actually, now I like the parallelism with the bindings. I'm tempted to go all in and suggest completely copying rust's match by adding a '=>'. Your call, and I will stop commenting now.

I like the parallelism with bindings as well, especially since the commands are technically bound to the patterns preceding them (sort of like let ({cmds} = patt1 patt2 ...) {...}). Of course, that parallelism suggests there should be a block at the end as I already mentioned, though I can't think of a good reason to add this (what would it do?), not to mention there's the question of whether such a block would affect the result returned by <={match ...}.

I'm not sure adding => would have any benefit other than readability, which can usually be solved with alignment of the various blocks or splitting the block into multiple lines with proper indentation.

Btw, my latest commit allows you to separate the opening parenthesis '(' from the line the match subject is on. Since match looks so much like a binder now (for/let/local), I figured I may as well go ahead and make it behave more like one.

mwgamera commented 2 years ago

I like match but ~ is also called match in a few places so it might be confusing. But the documentation doesn't call it that so hopefully it won't be a problem. Note, though, that it's nothing like the match construct in functional languages or in Rust, which allows extracting data from complex structures rather than glob-matching strings. I don't think => would add anything either.

As for the syntax, I don't have a strong opinion on it. I think the most obvious would be to have match target pat1 cmd1 pat2 cmd2 ... without any brackets, but I realize this would be extremely inconvenient to use on multiple lines. XS had a switch function like that, but it was a regular function without pattern matching so it could be bracketed ad lib. All in all, the current version looks okay to me.

However, as long as ; is used, I think it would be more consistent with other constructs to allow multiple words. Currently match foo (foo bar baz; quux plugh) is a syntax error but it could just run bar baz. There is no ambiguity for where patterns end as they already need to be in parentheses when they use multiple words.

Multiple evaluation is probably the biggest problem which I should have anticipated before throwing around the idea of rewriting it. Using a named variable is ugly but I don't know what would work better, so let's leave it at this. I think its name needs to be mentioned in the documentation.

The parallelism with binders, together with the use of a named variable, really just suggests a solution similar to what wryun wrote at the beginning: case pat cmd could be defined as if {~ $__es_matchtmp pat} {return <=cmd} and switch (subj) block as @ __es_matchtmp block subj. But that would be a bit quirky (kind of like writing Perl) and I think I came to like the current proposal a bit more.

hyphenrf commented 2 years ago

One thing that could be useful is marking the match variable with the "primitive" sigil, like some special variables already are.. thus making it impossible (?) to unintentionally construct at userspace. I can already name variables collect for example without worrying about interference with &collect. How would that play along with the language though, I have no idea.

hyphenrf commented 2 years ago

@wryun that is a fair argument you raise on the curly braces. I'm convinced

memreflect commented 2 years ago

One thing that could be useful is marking the match variable with the "primitive" sigil, like some special variables already are.. thus making it impossible (?) to unintentionally construct at userspace. I can already name variables collect for example without worrying about interference with &collect. How would that play along with the language though, I have no idea.

There is no precedent for this, so I'm not sure this is a good idea. You can think of primitives as functions with names that cannot be reassigned. Also, plain variables like $apid exist and can be assigned values, so I don't think there's a need to make a special case just for one variable that disappears at the end of the match command.

memreflect commented 2 years ago

I like match but ~ is also called match in a few places so it might be confusing. But the documentation doesn't call it that so hopefully it won't be a problem. Note, though, that it's nothing like the match construct in functional languages or in Rust, which allows extracting data from complex structures rather than glob-matching strings. I don't think => would add anything either.

~ is not like anything in any other language I've used, functional or otherwise, so match is already different because of that.

As for the syntax, I don't have a strong opinion on it. I think the most obvious would be to have match target pat1 cmd1 pat2 cmd2 ... without any brackets, but I realize this would be extremely inconvenient to use on multiple lines. XS had a switch function like that, but it was a regular function without pattern matching so it could be bracketed ad lib. All in all, the current version looks okay to me.

Well, I somehow managed to get the following syntax working, which might be more desirable than the binding-like syntax, and the ; separators are gone:

match foo (bar baz) {
    echo bar baz
} (quux plugh) {
    echo quux plugh
} f* {
    echo starts with f
} {
    echo no matches
}

Unfortunately, the ( ) enclosing multiple patterns is still a requirement, but I'm experimenting to see if that can be changed without sacrificing the ability to make match foo bar true false equivalent to if {~ $__es_matchtmp bar} {true} {false}. For that reason, this new syntax is not available yet.

Edit: I was clearly not thinking. match foo bar true false echo is horribly ambiguous if this were allowed. As a result, (bar true) or (bar true false) or whatever is necessary. I can push the change to my branch if people are interested in this style.

However, as long as ; is used, I think it would be more consistent with other constructs to allow multiple words. Currently match foo (foo bar baz; quux plugh) is a syntax error but it could just run bar baz. There is no ambiguity for where patterns end as they already need to be in parentheses when they use multiple words.

This is another possibility that I like. The only thing I'd change is adding =>. Previously, => was unnecessary because it would have been followed by {a block} for multiple words. Removing this requirement would result in readability issues without a separator like =>. While I'm not 100% certain, I believe this change would allow for foo bar => baz to be transformed to if {~ $__es_matchtmp foo bar} {baz}, i.e. no more ( ) enclosing multiple patterns as well as no more { } needed for multi-word commands unless you are executing multiple commands. This might generate another ( ) vs. { } discussion though. Again, the binder-style syntax feels like it needs a { } after it because other binders have it, though it might be completely unnecessary.

Multiple evaluation is probably the biggest problem which I should have anticipated before throwing around the idea of rewriting it. Using a named variable is ugly but I don't know what would work better, so let's leave it at this. I think its name needs to be mentioned in the documentation.

Bindings with for/let/local like x=$x currently exhibit a problem in the face of a retry exception thrown from a catcher because the old value of $x is reassigned, resulting in an infinite loop. This is not a bug in es, just a general "don't do this" problem. I just recently found this out myself after experimenting with catch @{throw retry} {match $__es_matchtmp (() {break})}. For this reason, I am now reconsidering the rewrite idea. It's amazing what kinds of problems you encounter just by adding a variable.

Speaking of documentation, I have changed the name to matchexpr since it will now be public, and it'll be marked as noexport like apid since the binding changed from let to local. I also have yet to add match to the "Syntactic Sugar" section. When I do that, it will probably add more than 1 line to the section, making it feel out of place compared with the other one-line de-sugared representations.

As I mentioned, the newly documented matchexpr will be bound with local rather than let. This is why:

fn tf v {if {~ $v 0} {result TRUE} {result FALSE}}

# TRUE TRUE
local (tmp = {foo} {bar}) {
  echo {foo} matches $tmp ? <={tf <={~ $tmp {foo}}}
  echo and in reverse ? <={tf <={~ {foo} $tmp}}
}

# FALSE FALSE
let (tmp = {foo} {bar}) {
  echo {foo} matches $tmp? <={tf <={~ $tmp {foo}}}
  echo and in reverse? <={tf <={~ {foo} $tmp}}
}

Ordinarily, ~ matches blocks just fine, but when a block is stored in a let-bound variable, the match stops working. Of course, I'm unaware of anybody matching blocks like this, but this will at least not result in a difference in behavior between if {~ {cmd} {cmd}} {echo foo} and match {cmd} ({cmd} {echo foo}).

wryun commented 2 years ago

Unfortunately, I went to merge this and I, err, can't because I merged your other PR first (and trip.es is being treated as a binary file). If you could either fix the conflicts yourself or grab the merge commit from:

https://github.com/wryun/es-shell/commit/82f1bc0ae9ec8ec9b43e38688d83ea349d207127

Then I can click the button. I know, I could have just merged that myself, but it's nice to have the github PR flow for clear documentation.

memreflect commented 2 years ago

Merged 82f1bc0 to my branch as requested :)

wryun / es-shell

Add switch statement #36