Closed memreflect closed 2 years ago
Neat!
However, I'm not sure about this. It feels like it's not quite in es's minimalist functional spirit. Also, if we were to implement switch, I think I'd prefer use of program fragments (i.e. {} - as in 'if'). I know rc doesn't do this, but rc actually has syntax for while/if rather than treating them as functions.
Maybe there's a sweet spot between this and the function in #31 ? Not that I can think of one off the top of my head.
To me a specialized switch makes sense because it'll be made to handle patterns correctly, in a first-class manner, vs resorting to wrapping patterns in strings then eval-ing them. It'll also have more uniform syntax with the rest of the language (vs relying on a list). And it'll be able to restrict the shape of the code inside its body (versus a regular function where anything (incorrect) goes).
My little function doesn't even handle multiple (or-)patterns in the same case field. it has to eventually be one string and nothing more.
Having thought about this a little more, I think I'd prefer a switch that was special in a different way. e.g.
fn switch input fragment {
    local (fn case args {
        if {~ $input $args} { # stupid trick here - fragment won't pattern match
            $args($#args)
        }
    }) $fragment
}
switch bar {
    case foo {
        echo Got foo!
    }
    case bar {
        echo Got bar!
    }
}
This sort of works, but doesn't actually handle patterns properly (and doesn't have the usual abort on first match behaviour). Part of the issue here is that it's hard to parameterise patterns - i.e. something like this fails:
pattern=fo*
echo <={~ foo $pattern}
(and would fail for the same reasons in the above switch)
As far as I know, there's currently no way around using eval as @hyphenrf did in #31. But maybe there's some neat way we can change ~ so that this works (though I guess you still run into the filename expansion issue). Really just addressing #32, I guess.
Another thought would be to add a new case pattern matching 'thing' which is basically just the case function above but uses some hard-coded var ($case?) and does the early return and proper pattern behaviour. But I guess that's doing as much or more syntactical violence as this proposal :)
that's an interesting scoping rule with local being allowed to appear in function arg on call site.. is it elaborated somewhere?
It's not elaborated, but the manual does explicitly say that fn name parameters { commands } is equivalent to fn-name = @ parameters { commands }. If you take that literally, it follows that it should be usable in binders.
it is not the case for let-bindings though.
as a minimal example:
fn ex-1 body { local (fn log { echo $body being executed }) $body }
fn ex-2 body { let (fn log {echo $body being executed }) $body }
if we do ex-1 { log ls; ls }, we find that log is in scope, but if we do ex-2 instead, we get an error (as one would expect with scoped bindings).
this behaviour with local is very cool for defining little DSLs like switch. It'd make a lot of sense to highlight it!
btw, @wryun, this is a cool solution you came up with. I took it and added the usual eval & break stuff so that I can use it in my code, and also as a tmp solution until an agreed-upon implementation of switch statement is included in the language.
# usage: switch expr {
#   case lit single-cmd
#   case 'pat0*' { ... }            # <- quotation needed, blocks too
#   case 'pat1*' 'pat[0-9]' { ... } # <- because we have or-patterns
#   default { ... }
# }
fn switch match cases {
    local (
        fn case args { eval \~ $match $args && $args($#args) }
        fn break args { throw break <={~ $#args 0 || result $args} } # optionally alias break to throw break 0
        fn default args { $args }
    )
    # rethrow anything that is not a break with its original arguments ($*)
    catch @{if{ ~ $1 break }{ result $*(2 ...) }{ throw $* }}{ # optionally propagate break's payload
        $cases
    }
}
Nice! I briefly tried to mess with $&noreturn but couldn't get it to do what I wanted.
(I apologize for the amount of information below in advance.)
Having thought about this a little more, I think I'd prefer a switch that was special in a different way. e.g.
... switch bar { case foo { echo Got foo! } case bar { echo Got bar! } }
I agree that syntax is much more in line with the rest of es. I've pushed changes for similar syntax to my switch-statement branch. However, there are some differences you may or may not agree with:
Multiple cases may still be specified in a single line in the familiar cmd1 args; cmd2 args style, but only 1 semicolon ; is allowed after the closing brace } for a case body:
switch foo {
case bar {echo bar}; case foo {echo foo};
case * {throw error example no matches};
}
Only case A B C ... { ... } items are allowed inside a switch, so the full cmd1 args;;; ; ;; ; cmd2 args;;; ;; ; syntax is unnecessary and difficult to read. I expect interactive use of switch commands will be exceedingly rare, but there's no reason to require newlines between cases in an interactive shell, and the semicolon as a command separator makes sense if one considers each case to be similar to a function (but without wildcard expansion).
Cases consisting of a single "word" (including expressions like (1 2 3), $listvar, and `{command args}) remain unchanged, but multiple patterns in a single case must be specified by enclosing them in parentheses. This avoids a problem where the case keyword adjacent to the preceding case block would result in the new case keyword being one of the patterns for the preceding case rather than triggering a syntax error:
switch foo {
# case (foo {echo ok} case *)
case foo {echo ok}case * {echo fail}
}
While this is a bit of an unimprovement, it is consistent with the (a b c) = ... syntax used in variable bindings. After all, each case block is bound to its corresponding list of patterns.
The way I implemented the changes, single-word commands do not require braces, and my latest commit also made case bodies optional since matches implicitly return true/0 anyway:
fn is-even n {
    switch $n {
        case () {throw error is-even expected one or more values}
        case *[02468] # true
        case * false
    }
}
The behavior can easily be changed so that braces are required, implicitly forbidding an empty case body.
Incidentally, the "case" keyword is now unnecessary, so it could be removed. After a couple of minor adjustments, its removal would mean one less keyword and no need to repeatedly type "case". It also would look remarkably similar to the case X in ... esac syntax used in sh, bash, etc.
# Display file type
switch $filename {
    (*.a *.lib) {echo library}
    *.c {echo C source}
    *.h {echo C/C++ header}
    (*.o *.obj) {echo object}
    (*.so *.dll) {echo shared/dynamic library}
    (*.C *.cc *.cpp *.cxx *.c++) {
        echo C++ source
    }
    (*.H *.hh *.hpp *.hxx *.h++) {
        echo C++ header
    }
    * {throw error ftype unknown file type}
}
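For readers less familiar with es, here is a rough Python model of the first-match behaviour the example above relies on. It is only a sketch: the `CASES` table and `file_type` function are hypothetical names, and `fnmatch.fnmatchcase` stands in for es's `~`, whose glob dialect is similar but not identical.

```python
from fnmatch import fnmatchcase

# Hypothetical table mirroring the file-type example: each entry pairs a
# list of glob patterns with the text its block would echo.
CASES = [
    (["*.a", "*.lib"], "library"),
    (["*.c"], "C source"),
    (["*.h"], "C/C++ header"),
    (["*.o", "*.obj"], "object"),
    (["*.so", "*.dll"], "shared/dynamic library"),
    (["*.C", "*.cc", "*.cpp", "*.cxx", "*.c++"], "C++ source"),
    (["*.H", "*.hh", "*.hpp", "*.hxx", "*.h++"], "C++ header"),
]


def file_type(filename):
    """Return the label of the first case with at least one matching pattern."""
    for patterns, label in CASES:
        # A case matches if any of its patterns matches (or-patterns).
        if any(fnmatchcase(filename, p) for p in patterns):
            return label
    # Counterpart of the final `* {throw error ...}` catch-all case.
    raise ValueError("ftype: unknown file type")
```

Matching is case-sensitive here, so `main.c` and `main.C` land in different cases, and only the first matching case ever runs.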
A less powerful but very similar construct in R6RS (Scheme) and Racket is also called "case", so perhaps a name change is worth considering if this syntax is adopted.
I want to note that this makes ~ obsolete and ~ subj pat ... should probably be rewritten to something like switch subj (pat ...) true * false (or whatever syntax it ends up having) rather than being left as a separate construction.
I think it's good to have both (and ~~) though. It's certainly lighter to write if {~ a b} c d than to write switch a { case b c; case * d }. The readability benefits only start showing when you're doing multiway logic on a variable. So each has its own use-case I believe.
I wouldn't want to write a quick throwaway expression like this throw break <={~ $#args 0 || result $args} in terms of switch
@memreflect question: with the way you handle multiple patterns are the following #1 and #2 semantically different?
e = (a b c)
switch pat {
    (a b c) d #1
    $e f #2
}
I want to note that this makes ~ obsolete and ~ subj pat ... should probably be rewritten to something like switch subj (pat ...) true * false (or whatever syntax it ends up having) rather than being left as a separate construction.
The proposed switch command currently emulates the behavior of
if {~ subj pat ...} {
    cmd
} {~ subj pat2 ...} {
    cmd2
} ...
While you could replace the humble ~ subj pat ... with a switch, it would be the equivalent of rewriting the command in a more explicit form with if:
# ~ subj pat ...
if {~ subj pat ...} true false
For that reason, I feel ~ should be retained even if a switch command is added in some form.
@memreflect question: with the way you handle multiple patterns are the following #1 and #2 semantically different?
e = (a b c)
switch pat {
    (a b c) d #1
    $e f #2
}
No, #2 is effectively redundant. Assuming the case keyword is unnecessary, that code would behave the same as
e = (a b c)
if {~ pat (a b c)} {
    #1
    d
} {~ pat $e} {
    #2
    f
}
Since $e and (a b c) are equivalent, #1 would be executed if pat matched, else nothing would happen. #2 will never be executed.
For that reason, I feel ~ should be retained even if a switch command is added in some form.
I don't postulate entirely removing ~ from language. No one wants to type lengthy keywords all the time. But just as you wrote, switch is simply an if with ~; it completely covers ~'s functionality. So I would like to see one being considered a syntactic sugar of the other (I don't care which way) and rewritten during parsing the same way most other constructs are. It can't be written directly in es because it needs to treat arguments specially, but intuitively it shouldn't need a separate node kind.
That being said, many things already aren't perfectly orthogonal and it's not a showstopper.
I don't postulate entirely removing ~ from language. No one wants to type lengthy keywords all the time. But just as you wrote, switch is simply an if with ~; it completely covers ~'s functionality. So I would like to see one being considered a syntactic sugar of the other (I don't care which way) and rewritten during parsing the same way most other constructs are.
Sorry for the misunderstanding. I don't believe rewriting ~ in terms of switch is the best idea. If I understand your suggestion however, converting switch into an if with ~ commands is completely reasonable. I haven't pushed any changes, but does the example below illustrate your intent?
# input
# ↳ output from es -x to illustrate the rewrite
switch () {}
↳ {if}
switch foo {}
↳ {if {~ foo }}
switch foo {case () false}
↳ {if {~ foo } false}
switch foo {case (x y z) false}
↳ {if {~ foo x y z} false}
switch foo {case (x y z) false; case fop {}; case * {result 42}}
↳ {if {~ foo x y z} false {~ foo fop} {} {~ foo *} {result 42}}
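The rewrite illustrated above can be sketched in a few lines of Python. This is only a model of the string-level transformation, not the parser change in the PR; `desugar_switch` and its argument shapes are hypothetical, and the case-less `switch foo {}` form is not modelled.

```python
def desugar_switch(subject, cases):
    """Render `switch subject {case pats body; ...}` as an if/~ chain.

    subject: list of words forming the switch subject.
    cases:   list of (patterns, body) pairs, where patterns is a list of
             words and body is the command text of that case.
    """
    parts = ["if"]
    for patterns, body in cases:
        # Every case becomes one `{~ subject patterns}` test followed by its body.
        test = "{~ " + " ".join(subject) + " " + " ".join(patterns) + "}"
        parts.append(test)
        parts.append(body)
    return "{" + " ".join(parts) + "}"


print(desugar_switch(["foo"], [(["x", "y", "z"], "false")]))
```

Note that the subject text is repeated once per case in the output, which is harmless for a literal word like foo but matters when the subject is a command substitution.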
but does the example below illustrate your intent?
Yes, that's the idea. And you are right that it's better in this direction.
Thanks @memreflect - this looks good to me.
@mwgamera if you have a chance to look at this PR, that'd be great. It looks sane to me, I much prefer the 'rewrite as if' that you suggested, and though my instincts are against adding new things to 'es' at this stage (particularly new syntax) I'm leaning towards merging.
It will break existing scripts that use switch or case, but I'm hoping that's not too common (and how many heavy users of es are there anyway...).
The other thing that might be worth considering here is calling this match instead of switch, to be clear we're following more in the functional tradition instead of the C-style switch/case/break.
And potentially drop case entirely. e.g. https://doc.rust-lang.org/reference/expressions/match-expr.html
So:
res = <={match $var (
    foo {
        result bar
    }
    bar {
        result foo
    }
)}
I've also changed the argument to match here to make it a list. This means we're closer to something that can be transformed without additional syntax shenanigans, though currently we still need it because of the special pattern handling.
Thoughts?
It will break existing scripts that use switch or case, but I'm hoping that's not too common (and how many heavy users of es are there anyway...).
A strongly worded changes entry about the breaking introduction of a new kw should cover most people interested in updating :P
I've also changed the argument to match here to make it a list. This means we're closer to something that can be transformed without additional syntax shenanigans, though currently we still need it because of the special pattern handling.
I'd rather it be curly braces to indicate the special-form aspect of the expression, speaking for myself only of course. If it's just list-like but breaks the expectations of what something like (a* bx) means, I think that'll make the language appear less homogeneous.
@hyphenrf - re:
I'd rather it be curly braces to indicate the special-form aspect of the expression, speaking for myself only of course. If it's just list-like but breaks the expectations of what something like (a* bx) means, I think that'll make the language appear less homogeneous.
I think () makes the language more homogeneous than {} in this case, because every other instance of {} acts as a program fragment or similar, and can be read as 'normal' code. Doing match {} means that the contents of what looks like a fragment are now special.
Conversely, it's already valid es (provided a match function is defined) to do:
match $var (some list stuff)
As you point out, what we're proposing here does change the interpretation of what's in this supposed 'list' in a surprising way, but I think the program fragment one is more surprising. I think the closest parallel is the for statement:
for (var = list) command
Here we use () but interpret it differently due to context. But currently the 'rules' are never broken for {}.
@memreflect by the way, I'm impressed you're making all these changes so quickly, and I'm not meaning to push you around to many different solutions, particularly when I don't feel like we have consensus yet. Don't feel like you have to follow my latest whim, and feel free to disagree!
Ah, @memreflect , I (belatedly) realised that this kind of rewrite evaluates multiple times. e.g.
{match `{echo foo} { bar { echo 1 }; foo { echo 2 } }}
is transformed to:
{if {~ <={%backquote <={%flatten '' $ifs} {echo foo}} bar} {echo 1} {~ <={%backquote <={%flatten '' $ifs} {echo foo}} foo} {echo 2}}
I think we should either flip the rewrite (i.e. rewrite ~ to match, and implement match) OR rewrite match with a special let variable so we only evaluate once (don't know if there's prior art in es for this, though).
i.e. something like:
match `{echo foo} (bar true foo false)
would rewrite to:
let (__es_match=`{echo foo}) {if {~ $__es_match bar} true {~ $__es_match foo} false}
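As a rough Python sketch of this single-evaluation rewrite: the subject is bound once to a temporary and every test refers to that temporary instead of repeating the subject. `desugar_match` is a hypothetical helper that only models the text of the rewrite; the temporary's name is taken from the example above.

```python
def desugar_match(subject, cases, tmp="__es_match"):
    """Render `match subject (pat body ...)` as a let-bound if/~ chain.

    subject: text of the subject expression, e.g. a command substitution.
    cases:   list of (patterns, body) pairs, where patterns is a list of
             words and body is the command text of that case.
    tmp:     name of the temporary the subject is bound to.
    """
    tests = []
    for patterns, body in cases:
        # Each test reads the temporary, so the subject is evaluated once.
        tests.append("{~ $" + tmp + " " + " ".join(patterns) + "} " + body)
    return "let (" + tmp + "=" + subject + ") {if " + " ".join(tests) + "}"


print(desugar_match("`{echo foo}", [(["bar"], "true"), (["foo"], "false")]))
```

The printed text is the let form shown above, with the backquote substitution appearing exactly once however many cases there are.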
I have a slight preference to not require ;, though I guess it is used in bind statements... (for/local/let). Ok either way
EDIT: actually, now I like the parallelism with the bindings. I'm tempted to go all in and suggest completely copying rust's match by adding a '=>'. Your call, and I will stop commenting now.
The other thing that might be worth considering here is calling this match instead of switch, to be clear we're following more in the functional tradition instead of the C-style switch/case/break.
Agreed and implemented.
And potentially drop case entirely. e.g. https://doc.rust-lang.org/reference/expressions/match-expr.html
Done.
So:
res = <={match $var ( foo { result bar } bar { result foo } )}
I've also changed the argument to match here to make it a list. This means we're closer to something that can be transformed without additional syntax shenanigans, though currently we still need it because of the special pattern handling.
Thoughts?
I'm not too attached to the block syntax. One could argue the block syntax admittedly no longer makes sense without a case keyword because suddenly no commands are executed inside that block; with a case keyword, it at least felt like there was a command, even if case ... was the only command allowed. A change to list syntax would result in a resemblance to existing binding constructs like let, so there is clear precedent in favor of ( ) rather than { }:
match $var (
    (pattern1 pattern2) {cmd}
    pattern3 {another-cmd}
    ...
)
# match $subject (pattern... {block}; pattern... {block}; ...)
I will say the change to parentheses can make matching the empty list appear odd in an inline match command. Then again, I can't remember the last time I typed a case...esac command in sh/bash/ksh directly on the command line either, so it probably won't be seen often, unlike ~:
match $var (() {echo unset}; '' {echo null}; * {echo $var})
Other than that oddity and the fact that it feels like a block is missing at the end, the parentheses are fine. Your rationale for using parentheses instead of braces makes sense. Making braces a special case for a single command would be a problem.
Ah, @memreflect , I (belatedly) realised that this kind of rewrite evaluates multiple times. e.g.
{match `{echo foo} { bar { echo 1 }; foo { echo 2 } }}
is transformed to:
{if {~ <={%backquote <={%flatten '' $ifs} {echo foo}} bar} {echo 1} {~ <={%backquote <={%flatten '' $ifs} {echo foo}} foo} {echo 2}}
I think we should either flip the rewrite (i.e. rewrite ~ to match, and implement match) OR rewrite match with a special let variable so we only evaluate once (don't know if there's prior art in es for this, though).
i.e. something like:
match `{echo foo} (bar true foo false)
would rewrite to:
let (__es_match=`{echo foo}) {if {~ $__es_match bar} true {~ $__es_match foo} false}
That is a good point. In my enthusiasm, I missed that potential performance issue. Fixed by assigning to a temp variable bound with let as suggested.
I have a slight preference to not require ;, though I guess it is used in bind statements... (for/local/let). Ok either way
EDIT: actually, now I like the parallelism with the bindings. I'm tempted to go all in and suggest completely copying rust's match by adding a '=>'. Your call, and I will stop commenting now.
I like the parallelism with bindings as well, especially since the commands are technically bound to the patterns preceding them (sort of like let ({cmds} = patt1 patt2 ...) {...}). Of course, that parallelism suggests there should be a block at the end as I already mentioned, though I can't think of a good reason to add this (what would it do?), not to mention there's the question of whether such a block would affect the result returned by <={match ...}.
I'm not sure adding => would have any benefit other than readability, which can usually be solved with alignment of the various blocks or splitting the block into multiple lines with proper indentation.
Btw, my latest commit allows you to separate the opening parenthesis '(' from the line the match subject is on. Since match looks so much like a binder now (for/let/local), I figured I may as well go ahead and make it behave more like one.
I like match but ~ is also called match in a few places so it might be confusing. But the documentation doesn't call it that so hopefully it won't be a problem. Note, though, that it's nothing like the match construct in functional languages or in Rust which allows extracting data from complex structures rather than glob-matching strings. I don't think => would add anything either.
As for the syntax, I don't have a strong opinion on it. I think the most obvious would be to have match target pat1 cmd1 pat2 cmd2 ... without any brackets, but I realize this would be extremely inconvenient to use on multiple lines. XS had a switch function like that, but it was a regular function without pattern matching so it could be bracketed ad lib.
All in all, the current version looks okay to me.
However, as long as ; is used, I think it would be more consistent with other constructs to allow multiple words.
Currently match foo (foo bar baz; quux plugh) is a syntax error but it could just run bar baz. There is no ambiguity for where patterns end as they already need to be in parentheses when they use multiple words.
Multiple evaluation is probably the biggest problem which I should have anticipated before throwing around the idea of rewriting it. Using named variable is ugly but I don't know what would work better, so let's leave it at this. I think its name needs to be mentioned in documentation.
The parallelism with binders together with the use of a named variable really just suggests a solution similar to what wryun wrote at the beginning: case pat cmd could be defined as if {~ $__es_matchtmp pat} {return <=cmd} and switch (subj) block as @ __es_matchtmp block subj. But that would be a bit quirky (kind of like writing Perl) and I think I came to like the current proposal a bit more.
One thing that could be useful is marking the match variable with the "primitive" sigil, like some special variables already are.. thus making it impossible (?) to unintentionally construct at userspace. I can already name variables collect for example without worrying about interference with &collect. How would that play along with the language though, I have no idea.
@wryun that is a fair argument you raise on the curly braces. I'm convinced
One thing that could be useful is marking the match variable with the "primitive" sigil, like some special variables already are.. thus making it impossible (?) to unintentionally construct at userspace. I can already name variables collect for example without worrying about interference with &collect. How would that play along with the language though, I have no idea.
There is no precedent for this, so I'm not sure this is a good idea. You can think of primitives as functions with names that cannot be reassigned. Also, plain variables like $apid exist and can be assigned values, so I don't think there's a need to make a special case just for one variable that disappears at the end of the match command.
I like match but ~ is also called match in a few places so it might be confusing. But the documentation doesn't call it that so hopefully it won't be a problem. Note, though, that it's nothing like the match construct in functional languages or in Rust which allows extracting data from complex structures rather than glob-matching strings. I don't think => would add anything either.
~ is not like anything in any other language I've used, functional or otherwise, so match is already different because of that.
As for the syntax, I don't have a strong opinion on it. I think the most obvious would be to have match target pat1 cmd1 pat2 cmd2 ... without any brackets, but I realize this would be extremely inconvenient to use on multiple lines. XS had a switch function like that, but it was a regular function without pattern matching so it could be bracketed ad lib. All in all, the current version looks okay to me.
Well, I somehow managed to get the following syntax working, which might be more desirable than the binding-like syntax, and the ; separators are gone:
match foo (bar baz) {
    echo bar baz
} (quux plugh) {
    echo quux plugh
} f* {
    echo starts with f
} {
    echo no matches
}
Unfortunately, the ( ) enclosing multiple patterns is still a requirement, but I'm experimenting to see if that can be changed without sacrificing the ability to make match foo bar true false equivalent to if {~ $__es_matchtmp bar} {true} {false}. For that reason, this new syntax is not available yet.
Edit: I was clearly not thinking. match foo bar true false echo would be horribly ambiguous if this were allowed. As a result, (bar true) or (bar true false) or whatever is necessary. I can push the change to my branch if people are interested in this style.
However, as long as ; is used, I think it would be more consistent with other constructs to allow multiple words. Currently match foo (foo bar baz; quux plugh) is a syntax error but it could just run bar baz. There is no ambiguity for where patterns end as they already need to be in parentheses when they use multiple words.
This is another possibility that I like. The only thing I'd change is adding =>. Previously, => was unnecessary because it would have been followed by {a block} for multiple words. Removing this requirement would result in readability issues without a separator like =>. While I'm not 100% certain, I believe this change would allow for foo bar => baz to be transformed to if {~ $__es_matchtmp foo bar} {baz}, i.e. no more ( ) enclosing multiple patterns as well as no more { } needed for multi-word commands unless you are executing multiple commands. This might generate another ( ) vs. { } discussion though. Again, the binder-style syntax feels like it needs a { } after it because other binders have it, though it might be completely unnecessary.
Multiple evaluation is probably the biggest problem which I should have anticipated before throwing around the idea of rewriting it. Using named variable is ugly but I don't know what would work better, so let's leave it at this. I think its name needs to be mentioned in documentation.
Bindings with for/let/local like x=$x currently exhibit a problem in the face of a retry exception thrown from a catcher because the old value of $x is reassigned, resulting in an infinite loop. This is not a bug in es, just a general "don't do this" problem. I just recently found this out myself after experimenting with catch @{throw retry} {match $__es_matchtmp (() {break})}. For this reason, I am now reconsidering the rewrite idea. It's amazing what kinds of problems you encounter just by adding a variable.
Speaking of documentation, I have changed the name to matchexpr since it will now be public, and it'll be marked as noexport like apid since the binding changed from let to local. I also have yet to add match to the "Syntactic Sugar" section. When I do that, it will probably add more than 1 line to the section, making it feel out of place compared with the other one-line de-sugared representations.
As I mentioned, the newly documented matchexpr will be bound with local rather than let. This is why:
fn tf v {if {~ $v 0} {result TRUE} {result FALSE}}
# TRUE TRUE
local (tmp = {foo} {bar}) {
    echo {foo} matches $tmp ? <={tf <={~ $tmp {foo}}}
    echo and in reverse ? <={tf <={~ {foo} $tmp}}
}
# FALSE FALSE
let (tmp = {foo} {bar}) {
    echo {foo} matches $tmp? <={tf <={~ $tmp {foo}}}
    echo and in reverse? <={tf <={~ {foo} $tmp}}
}
Ordinarily, ~ matches blocks just fine, but when the blocks are stored in a let-bound variable, the match stops working. Of course, I'm unaware of anybody matching blocks like this, but this will at least not result in a difference in behavior between if {~ {cmd} {cmd}} {echo foo} and match {cmd} ({cmd} {echo foo}).
Unfortunately, I went to merge this and I, err, can't because I merged your other PR first (and trip.es is being treated as a binary file). If you could either fix the conflicts yourself or grab the merge commit from:
https://github.com/wryun/es-shell/commit/82f1bc0ae9ec8ec9b43e38688d83ea349d207127
Then I can click the button. I know, I could have just merged that myself, but it's nice to have the github PR flow for clear documentation.
Merged 82f1bc0 to my branch as requested :)
While the ~ command in es definitely reduces the need for a switch command, that need is not completely eliminated. There is likely a good reason rc has a switch command despite also having ~, and #31 also indicated it would be a beneficial addition. Taking those facts into consideration, I have implemented it with the same behavior as equivalent uses of if with the ~ command. Additions to the trip.es file attempt to exhaustively ensure this.
I'd also like to note that in contrast to the current switch syntax in Byron Rakitzis's rc, enclosing parentheses are not required for single-word subjects, nor is a redundant set of parentheses required for multi-word subjects:
switch($subject) { case ... } → switch $subject { case ... }
switch((foo bar)) { case ... } → switch (foo bar) { case ... }
The new addition is documented in the manpage, which had some additional cleanup done thanks to the warnings from mandoc -T lint on FreeBSD, and the documentation for the break command has been updated to reflect its lack of effect on a switch.