Closed Pascalio closed 1 year ago
Hi @Pascalio
That's intentional. Curly braces are essentially string literals, to make it safer to inline JSON and/or sub-shells (re-reading the docs on this, they're pretty bad: https://murex.rocks/docs/parser/curly-brace.html sorry, I'll add improving that to the todo). Whereas the square brackets have no special meaning at the parser level.
If you want to build some JSON up programmatically then you'll need to wrap those braces in parentheses: https://murex.rocks/docs/parser/brace-quote.html.
» set foo = bar
» set struct = ({"foo": "$foo"})
» $struct
{"foo": "bar"}
Sorry, I should add: the reason behind that design decision is that having curly braces as string literals means each sub-shell can lazy-load each curly brace block of code without the programmer having to worry about escaping variables.
eg
let i=0
while { = i < 10 } {
    out $i
    let i++
}
^ this would just print a column of zeros if the curly braces weren't treated as string literals, because the parser would have expanded $i. Which would mean the developer would need to escape it. But when code starts getting nested you'd need to figure out how many times to escape... and it just becomes a mess.
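To illustrate with a hypothetical, eagerly-expanding parser (this is not real murex behaviour, just the failure mode described above): the stored loop body would already contain the variable's value at parse time:

while { = i < 10 } {
    out 0      # $i was substituted with its parse-time value
    let i++
}

so every iteration prints 0, which is the column of zeros mentioned above.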
So by making them string literals you solve that problem but you then need to quote the braces if you want to override that string literal behavior.
The same problem doesn't exist with square brackets, but you raise an interesting question....should that also be treated as a string literal?
Thank you @lmorg ! I'm starting to understand. This raises a lot of questions for me... I'll mull them over before posting daft ideas and come back to you.
I'd definitely love to hear your thoughts, whenever you feel ready.
@Pascalio Any update on this? Or are you happy for me to close?
Hi @lmorg ! I'm so sorry, this topic took me on a grand tour of further questioning... Which took a lot of time and hesitation.
To come back to the reported problem, the solution of wrapping the whole thing in brace-quotes does work, and maybe I'd better stop the fuss there. However, it is a bit sad that such a fundamental feature as the ability to dynamically create associative arrays or complex objects (my attempt with JSON here) doesn't feel native. We have to resort to a hack, to creating the string from bits, in order to have our object. I have thought of many proposals to resolve this lack of native support, but in the end I keep stumbling on a certain limitation of the original murex design... I say this with humility, as I've never endeavoured to create my own language, so please bear with my potentially naive views.
Though a paradigm of shell languages, I reckon the notion of "string expansion" is faulty. A new language should aim for unconditional coherence, so that a piece of grammar has the same value in any context. If "this" does "that" "here", then it does "that" everywhere. That way, it is more memorable, feels worth learning, and becomes composable into patterns. The problem with string expansion is that it's close to what a pre-processor does in a compiled language. But as a programmer, you want variables, not pre-processor-ish substitutions. That is, in your example, we'd clearly need an "access to variable" rather than a "string expansion".
let i=0                 # variable set
while { = $i < 10 } {   # value of variable accessed
    out $i              # value of variable accessed at the moment of execution
    let $i++            # value of variable mutated at the moment of execution
}
^ in this example, the "$" is used to say "now access the variable behind the given name, don't use the name itself". This way, curly braces don't need to be string literals. Another way to put it: variables are expanded ephemerally, at the moment of command execution. I don't know if such a change of paradigm makes sense for murex, or at all... but I think it would resolve many grammar issues. Like:
let n = 1
= $n + 1
# 2, not 11 and no need to explain why some surprising result would happen
set key = foo
set object = {"$key": $n} # we can support objects "natively"
What do you think?
You do make some excellent points. I particularly like the analogy to a macro parser.
let n = 1
= $n + 1
# 2, not 11 and no need to explain why some surprising result would happen
This is actually one of my biggest bugbears with the language. It frequently trips me up, so I can completely relate to why you raise it as being ugly. The history behind that is partly because everything is a function, and partly because I got lazy and imported a module to do math notation (in the early days when murex was a POC, I wanted to focus on what brought value to the code, but I never got round to replacing that package with one that works more natively with the murex syntax).
There is an open ticket for me to resolve this (https://github.com/lmorg/murex/issues/134) however I've never allocated any time to it because the status quo was "good enough". Given your comments, I'll re-prioritize that work.
set key = foo
set object = {"$key": $n} # we can support objects "natively"
Unfortunately this is a little trickier to get right because it could introduce breaking changes to existing scripts. That isn't to say I'm against making a change, just that the syntax would need to be explicit rather than implicit so that we mitigate the risk of silently breaking running code.
My immediate first thought is we change the syntax up so as not to use set at all. In an ideal world that would mean something like: object = {"$key": $n}. Though I'd need to rewrite the parser before I could support that (I'm not against rewriting the parser; that too is POC code that never matured further because it was "good enough").
This would mean that something like:
set $key = foo
let $n = 0
out json: {"$key": $n}
# would still produce the literal string: {"$key": $n}
out json: ({"$key": $n})
# would produce: {"foo": 0}
json = {$key: $n}
out json: $json
# would produce the object: {"foo": 0}
This could even be expanded to understand the difference between strings and numbers. eg
key = "foo" # string because quoted
n = 0 # number
(just like how normal programming languages work hehe)
This would allow us to deprecate set and let without breaking backwards compatibility.
However this is a pretty substantial piece of work. So I might have to introduce it in stages, depending on how eager yourself and others are to review the changes. So I could in theory include the string interpretation and variable behavior as a new builtin (eg var or exp) while I rewrite the parser to allow support for this as native behavior.
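To put the current and proposed assignment forms side by side (the "proposed" comments describe the syntax sketched in this thread, which is hypothetical at this point):

» set key = foo    # current: set builtin, value treated as a string
» let n = 0        # current: let builtin for numerics
» key = "foo"      # proposed: string because quoted
» n = 0            # proposed: number because unquoted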
(that said -- and I'm "thinking out loud" here -- it might not be as fundamental a change to the parser as I believe, because murex already has a concept of special parameters that modify the behavior of each command. So this could piggyback on that pattern...)
Thanks again for your feedback. As you've probably guessed, almost all of murex's design and development has been done by myself, so there will be plenty of places where I've simply not given enough consideration. Hearing other people's thoughts enables me to write better code and build a better shell, not only for yourself but for myself too. So feedback like this is really valuable. Thank you
Thank you for your welcoming open-mindedness and responsiveness! It's great!
Unfortunately this is a little trickier to get right because it could introduce breaking changes to existing scripts. That isn't to say I'm against making a change, just that the syntax would need to be explicit rather than implicit so that we mitigate the risk of silently breaking running code.
Yes, you're right, backward compatibility is important. Your idea of transitioning by leaving set/let as they were and implementing a new syntax sounds clever. Especially since the new one is more concise.
set key = foo
let n = 0
out json: {"$key": $n}
# would still produce the literal string: {"$key": $n}
So, this ^ would mean that the magic only happens in the var_name = data syntax, but otherwise in the code, $var_name wouldn't be a "variable access", right? Or would it mean that the {} chars have the ability to block variable access? Anyway, maybe I'm doing linguistic nitpicking here...
More importantly, and along the lines of your suggested syntax, I can also suggest another approach.
Currently, arrays are represented as "item1\nitem2\n...." which isn't handy to create.
my_array = apples\nbananas\ncoconuts
Fortunately, there's a command to ease things. This is nice, but again, doesn't feel like array creation is "natively supported". Practically, it means we can't
my_array = a: [apples,bananas,coconuts]
we must
my_array = ${a: [apples,bananas,coconuts]}
Why not have an alternative representation of arrays, so that we can directly assign them to variables? Something like:
[Item1, Item2, Item3]
(where strings containing " , " must be quoted...)
This would be alongside the current newline representation so that both
[apples, bananas, nuts] -> foreach: fruit {out: have some $fruit}
ls -> foreach: inode {rm $inode}
would work.
Then we could say that arrays are simply objects with auto-assigned numerical keys: [Hi, you] == [0: Hi, 1: you]. And then reuse the same syntax for complex objects.
foo = scones
bar = clotted cream
grapes = [grape1, grape2, grape3]
healthy_lunch = [base: $foo, additional: $bar, "why not": $grapes]
We'd still have the "[]" command though.
$healthy_lunch -> ["why not"] -> foreach: grape {swallow: $grape}
And "{" would still prevent variable expansion/access.
Or have I gone just banana shake?
That could work. The ->[] index is only usable as a method, so it shouldn't be hard to say: when [] is a function, generate an array; but when it's used as a method, filter instead.
I'd probably lean towards removing the comma (eg [grape1 grape2 grape3]) though. The reason being that you don't comma-separate parameters in shells, so it would keep the syntax more "shell like" while still providing a richer syntax.
Agree, it's better without the comma!
Yes, [] is a method when its stdin is piped, so it could be seen as a function when not, and then generate arrays. But then, if it's a function, we can't assign it directly to a variable, right? We must set var = ${ [ stuff ]}. Can't set var = [ stuff ].
I can make var = [1 2 3] work but that would be a special case.
As an aside, I really need to cut down on these special cases. The problem is backwards compatibility and the expectation for parameters to be barewords in shells.
Re the comma: would it make sense to make the delimiter space and/or comma? So you can have LISP-like lists (eg [ apples pears oranges ]) but also drop in raw JSON too (eg [ "apples", "pears", "oranges" ]). I think that might hit the sweet spot convenience-wise...
Yeah, absolutely, special cases should be avoided as much as possible. That's why I suggested that arrays be "represented" like [1 2 3], and not that [ be a new command. Currently, arrays are strings with newline chars, right? "1\n2\n3" is an array. If [1 2 3] is an array (too), then there's no need for a [ command and its special cases. Does it make sense?
Regarding JSON style, sure, if you deem it clever and it doesn't get in your way. I was thinking the format json command would instead be applied to our [1 2 3] array to produce the JSON-styled one, but maybe you have use cases I haven't thought of where it's better to create it in JSON style?
The problem is that shells accept bareword parameters. So how does one differentiate between
command [ this is a string ]
and
command [ this is an array ]
I guess one could argue that quoting the first command forces it to become a string, eg
command "[ this is a string ]"
...and make it explicit in the docs that the shell does "best guess" interpretations of how a bareword should be treated?
Another option is to add a little syntactic sugar around objects being infixed. For example (and to borrow from Perl), %{} is an object and @[] is an array. This adds an extra character to what you envisaged but does make intent explicit.
Both options have their benefits and drawbacks, so I'm undecided about which is the smarter option.
I was thinking the format json command would instead be applied to our [1 2 3] array to produce the json styled one, but maybe you have use cases I haven't thought of where it's better to create it in json style?
You wouldn't even need format json since [] (and {}) expressions could natively output arrays and objects as JSON. In memory they would be an object, but when converted to a string (such as a process writing to STDOUT) they'd get converted into a JSON document.
My thinking about supporting commas was just to make copy/pasting easy. Eg you have some JSON from some text file and you want to quickly dump it into the terminal. Very much an edge case, but it would be trivial to support both commas and whitespace.
As an aside, I've gotten the new expressions library mostly working now. It supports:
- >, == et al, plus I've added ~~ to support comparing similar data such as "5" ~~ 5, "True" ~~ true, "LOWERCASE" ~~ "lowercase"
Support pending:
- =+ et al (= has been added so the rest of the work here will be easy)
- ${} to execute a command whose output is evaluated as part of the expression, eg bob = ${command}
The new expressions can be invoked either via:
- the exp command (this is only a stop gap while I test it), eg foobar = "foo" + "bar"
- $(), eg echo $(2+2) (planned but not yet written). The idea being ${} executes a command and $() executes an expression.
My only concern with this is whether the distinction between ${} and $() is obvious to anyone who hasn't read the source code...?
edit: these changes are currently sat on my dev machine. I haven't even pushed them to the develop branch yet.
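Collecting those invocation forms in one place (a sketch of the proposed syntax; exp, ${} and $() are all pre-release at this point, so treat the details as provisional):

» exp foobar = "foo" + "bar"   # stop-gap builtin while the library is tested
» bob = ${command}             # ${} executes a command, its output feeds the expression
» out $(2+2)                   # planned: $() evaluates an expression inline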
Thinking about the different parsers a little more: having %{} and @[] when in ${}, but {} and [] in $(), seems very confusing. So that's a strong argument in favour of having arrays defined as barewords.
However that introduces a new problem. How do you differentiate between a code block as a parameter and a map/object as a parameter? eg
if { code block } then { code block } else { code block }
vs
command { "key": $value }
I might be panicking over nothing though. Just trying to explore all angles before I accidentally create backwards incompatible tech debt :)
You're definitely right to think things through instead of being sorry afterwards.
For the reasons you underlined, I would favour your first solution:
cmd [ this is an array ]
cmd "[ this is a string ]"
However that introduces a new problem. How do you differentiate between a code block as a parameter and a map/object as a parameter. eg
This would actually be avoided if maps and arrays share the same format, wouldn't it?
if { code block }
command [ key1: value1 key2: value2 ] # map
command [ 0: value1 1: value2 ] # explicitly-mapped array
command [ value1 value2 ] # implicitly-mapped array
command "[ value1 value2 ]" # just an old string
To answer your question about expressions: it isn't obvious to me, who hasn't read the code. But I could take an educated guess: commands are the very first item of a line, whereas an expression combines items with some combining symbol in between, not in the first position. Based on this, ${} or $() should be chosen. Right?
That said, I'm afraid my gut reaction, if I were discovering a language with this kind of syntax distinction, wouldn't be very positive. The reason is that I'm personally looking for a grammar that has simple building blocks which can be stacked and composed with flexibility, and it wouldn't be apparent at first glance why 1 + 2 is a totally different world from echo Hi. Highly case-specific grammar is what put me off oilshell, for example (together with it eating my bath towels...)
But then, in the end, it's all about pedagogy and conviction: I'm sure you have strong reasons supporting this; just believe in it, and a good explanation in the docs will appease simplicity-seeking gut reactions like mine!
I've been thinking about this for a while and I believe there are a few insurmountable problems preventing the adoption of expressions as the primary syntax for this shell:
() as an additional way to quote strings does save murex from the \\\\ hell that Bash (et al) suffer from.
None of this is to say we cannot find a solution that compromises without feeling like it is making (undue) concessions. Just that the more I think about this problem, the more additional problems I find with any of the potential solutions. So I'm still very much at the research phase for "how to implement this the best way".
Anyway, sorry for the monologue. I wanted to get some thoughts jotted down while they were fresh in my head :)
I've had another thought. Not 100% sure I like this but putting it out there for consideration: how about % to build arrays and dictionaries?
If this is used in expressions it does lead to slightly uglier syntax, eg
» array = %[ foo bar baz ]
» dic = %{ foo: bar, hello: world }
but it would then allow commands/statements to adopt the same code:
» out %{ foo: bar, hello: world }
{
"foo": "bar",
"hello": "world"
}
So there isn't any mental switching between expressions and commands. % would become a generalised "structure builder".
(edit: since the code is already written to support » foo = [ bar ] (without %) expressions, I could keep that around as an "undocumented feature", but the published standard, if we were to proceed this way, would be %[])
Additionally, mkarray (https://murex.rocks/docs/commands/a.html) can be wrapped into @[] too (basically anything inside @[] that includes .. as a bareword will be treated as a mkarray call)
This is perhaps the easiest way to bring your excellent points into reality without breaking backwards compatibility nor (hopefully) compromising too much on your desire for reducing cognitive overhead.
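If I've read that right, the wrapping would look something like this (hypothetical until the code lands):

» a: [1..3]      # today's mkarray invocation
» @[ 1..3 ]      # proposed: the bareword .. inside @[] triggers the same mkarray call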
The next big problem is the remainder of the expressions. eg
# easy to identify as an expression because 2nd parameter is an `=`
» bob = 1 + 2
whereas
# do I want to output 3 or "1 + 2"?
» out 1 + 2
# is this assigning "/dev/sda" to $if?
» dd if=/dev/sda of=/dev/sdb
# is this assigning 2 to $maxdepth?
» find / -maxdepth=2
I think this is solvable with a lot of parser logic because:
- "-" is an invalid variable name (so -maxdepth would be an invalid token)
- likewise /dev/sda/ would be an invalid token
But then there's the issue of whether out 1 + 2 should be valid?
I could argue that spaces in expressions when used in commands are invalid. But that introduces a number of new issues:
- » bob = 1 + 2 is now valid but » out 1 + 2 is not. When is it safe to add whitespace and when not?
- if I ever want to support the mod operator (typically %) then I cannot, because while » bob = %[] is explicit, » out bob=%[] is not.
Another option is I just say assignments are not allowed in commands. Which solves all the instances where = would be embedded. And to be honest it also potentially saves future murex users from reading back weird shell code where parameters have side effects with variables.
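In other words, under that rule (a sketch of the proposed behaviour, not current murex):

» bob = %[ 1 2 3 ]     # allowed: an explicit assignment statement
» out bob=%[ 1 2 3 ]   # bob=... would just be a literal parameter; no assignment happens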
However parsing "whitespaced" expressions as parameters is still not an easy problem. eg:
» command foo 1 + 2 bar
...should 1 + 2 be parameter 3 for command? It's pretty explicit that foo is the first parameter and bar is the last. But it's less clear how 1 + 2 should be interpreted.
So there's still an argument for disallowing whitespace in expressions in parameters.
(edit2: another option is that the same ${} subshell is used to embed expressions, eg » out ${1 + 1}. I'd then need to write support so that all expressions can be used (eg » 1 + 1 instead of having a special exception for » result = 1 + 1 formats), but that is a lot easier than solving all of the issues above and would result in a lot more uniform syntax)
A lot to think about, but it feels like we're inching towards a design that works :)
Thank you for the extensive explanations! I've read your article on split personalities and things are getting clearer. :)
2. murex already supports bareword strings. The original version of murex (around a decade ago, and before it was even called "murex") had syntax a lot closer to what we're trying to emulate here but it also requires executables to be encapsulated in parenthesis and strings to be quoted. This made for a really crappy interactive shell. As I made concessions (eg making quotes optional) the design slowly became more akin to what we have now and how more traditional shells work. Ultimately there is a trade off between the scripting side of things and the REPL (funny enough I wrote about this last year: https://murex.rocks/docs/blog/split_personalities.html)
Indeed, current murex makes for a much more pleasant REPL. (One of the most pleasant ones too imo!)
3. Parenthesis used as quotation marks. In hindsight this is one of the kludgier ideas I've implemented as it now limits some of the easier solutions (ie "just wrap expressions in parenthesis like you would naturally do in expressions anyway"). However having a quotation mark that is different symbols for opening and closing a string does bring some genuine enhancements, since having nested quotes is a depressingly common problem in shell scripting. So using () as an additional way to quote strings does save murex from the \\\\ hell that Bash (et al) suffer from.
I love parentheses as quotation marks. They're indeed much clearer than ". But yeah, if '()' denotes a string, then it can't denote an expression.
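For anyone skimming: because the opening and closing marks differ, parentheses can nest where double quotes cannot, which is the \\\\ escape-hell fix discussed above. A rough sketch based on the brace-quote docs linked earlier (exact behaviour may vary by version):

» out (a (nested (string)) with "quotes" that need no escaping)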
Your very last option feels very good to me: it is high-level and unified. So, if I understand correctly:
1 + 1 # 2
command ${command}
command ${expression}
result = 1 * 0 # direct assignment
# What about commands output capture on the right side of "="?
content = cat file -> head # or
content = ${cat file -> head} # ?
# Just wondering, I don't have any opinion about this at the moment.
cat file -> head -> (set) content # remains one of murex's greatest pieces of syntax in my view!
And yes, I'm not sure about the benefits of variable assignment within command calls... But given the dangers and downsides, you'd be right to disallow them, I think. Variables should always be assigned or mutated in a very visible way.
As to %, yes, it sounds good. As long as we can do
foo = "bar"
arr = %[ baz $foo ]
dict = %{ key1: $arr, $foo: booze }
Additionally mkarray (https://murex.rocks/docs/commands/a.html) can be wrapped into @[] too (basically anything inside @[] that includes .. as a bareword will be treated as a mkarray call)
Didn't get you there. You mean we can't %[ 1..5 ] but we can %[ @[1..5] ]?
Another daft question: why does % work where a single [ didn't? Not a rhetorical "upset" question, it's just that I'm not sure I understand everything. :)
Is it because cmd [array] would be ambiguous with cmd "[string]"?
Or is it because cmd {dict} breaks { literality and cmd [dict] isn't a good option?
@Pascalio
Indeed, current murex makes for a much more pleasant REPL. (One of the most pleasant ones too imo!)
Thank you <3
I love parentheses as quotation marks. They're indeed much clearer than ". But yeah, if '()' denotes a string, then it can't denote an expression.
I'm tempted to say () for strings should be %(), to bring it in line with %[] for arrays and %{} for objects/maps/hashes/dictionaries... whatever term you prefer. (In the source code I refer to them as "objects" but that might be confusing due to how it's understood in object orientated programming. So I might also take this opportunity to come up with a more sensible term to use consistently in the code and documentation.)
Your very last option feels very good to me: it is high-level and unified. So, if I understand correctly:
You understood perfectly in those examples :) In fact that code is now working:
(should be pushed to the develop branch shortly but still very beta!)
What about commands output capture on the right side of "="?
content = cat file -> head # or
content = ${cat file -> head} # ?
It would have to be the latter because the former would introduce too much:
So you do still end up with a two-tiered language syntax (which will need to be documented clearly), but at least you don't need to worry about which is the correct symbol to invoke expressions vs commands; you can just treat them as part of the same language, albeit with different nuances within that language. Though happy to take any further opinions into consideration.
As to %, yes, it sounds good. As long as we can do
Certainly can :) I haven't written the parser for dicts yet though.
Didn't get you there. You mean we can't %[ 1..5 ] but we can %[ @[1..5] ]
I mean %[ 1..5 ] will (eventually, haven't yet written the code) be supported so you're not having to nest blocks like %[ @[1..5] ]. In fact %[] will become the de facto way to call a in the future.
Another daft question: why does % work where single [ didn't? Not a rhetorical "upset" question, it's just that I'm not sure to understand everything. :)
That's a fair question.
There's a few reasons behind that decision:
- {} is a string literal and was designed that way intentionally to make parsing shell scripts easier where you have nested code blocks (eg function foobar { out $foo }, I didn't want to prematurely expand the variable inside the function). This made sense when I was rapidly knocking up this shell as a proof of concept ~10 years ago and it could be argued that the shell has now outgrown that requirement. But if I can avoid having to spend a few weeks rewriting the main parser and all its tests then I'd prefer that. At some point that piece of work is going to need to happen anyway, but for now there's much more low hanging fruit that I can spend my time on (like your other suggestions, as it happens)
- [] for both dicts and arrays: I think it's not very clear if [ 1: 2, 3: 4 ] is an array or dict. At least not when glossing over code. Also, since arrays and dicts are displayed as JSON when converted to a string, it makes sense to have their syntax loosely match JSON too.
- [ filter ] is already a command. So it would mean you could only use [] for arrays if you were to assign (rather than use them as a function), eg arr = [ 1 2 3 ] would create an array but [ 1 2 3 ] would filter it. Which is rather hard to reason about at a glance.
- so having % as a default prefix for "creating new things" becomes explicit. eg %[] creates an array, %{} creates a dict, and %() creates a string. It should then (hopefully) be clear at a glance what these brackets are doing.
That's my thinking anyways.
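Pulling the % family together as proposed at this point in the thread (still in development here, so the syntax is provisional):

» str = %(hello world)     # %() creates a string
» arr = %[ 1 2 3 ]         # %[] creates an array
» dict = %{ a: 1, b: 2 }   # %{} creates a dict
# while a bare [ ... ] used as a method keeps its existing filter behaviour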
Added support for dicts %{}
Added support for %[[..]] (currently needs to be double square brackets because a (et al) actually behave that way, eg
a: [1..3]bob
# Outputs
# 1bob
# 2bob
# 3bob
would be the same as %[[1..3]bob], but I plan on adding a little extra logic that allows %[..] as a shorthand for %[[..]] in a future patch)
Added patch to support %[..] (as per https://github.com/lmorg/murex/issues/485#issuecomment-1339967448)
I'm pretty close to having this ready for a release. Though I want to keep it in develop for a couple of weeks longer to beta test further.
Hi @lmorg ! Sorry for the silence, I was off on the other hemisphere for a few days... This looks wonderful! I wish I could contribute with coding and not only opinions, but my go is a bit of a no-go...
I'm tempted to say () for strings should be %(), to bring it in line with %[] for arrays and %{}
Sounds good.
objects/maps/hashes/dictionaries...whatever term you prefer.
Looks like you've gone for "dictionaries". I don't have any opinion about this... Except that "map" is shorter. But dictionary is certainly fine.
- so having % as a default prefix for "creating new things" becomes explicit. eg %[] creates an array, %{} creates a dict, and %() creates a string. It should then (hopefully) be clear at a glance what these brackets are doing.
Good point, things are clearer this way.
I'm anxious to try the new code. Will maybe try to compile from develop.
@Pascalio
This looks wonderful! I wish I could contribute with coding and not only opinions, but my go is a bit of a no-go...
Honestly, the feedback you've provided has been invaluable. Worth just as much, if not more so, than any code pull requests.
I'm anxious to try the new code. Will maybe try to compile from develop.
Please do give develop a try. I've just dropped a massive update into develop which sees the main parser completely rewritten to integrate the new syntax changes (so the expressions aren't just an ad hoc bolt-on).
So there may well be some breaking changes where I haven't entirely copied across the parsing rules correctly (the test suite has caught most of the problems but worth running the dev build for a little while to see if any others pop up).
I haven't (yet) rewritten the fuzzers for the new parsers so there might be some edge cases that can cause a crash. That's also on my TODO.
Line numbers will definitely be reported wrong though. That's still a work in progress.
I also need to get the documentation updated.
Brilliant! I'll give it a good try, thank you!
Awesome, thanks @Pascalio . I'm pushing updates pretty much daily at the moment, so if there's an issue you've run into there's a chance I might have already fixed it.
I'm going to aim for a release on ~1st Jan. So let me know how you get on. And if you don't get time before then, that's fine. I appreciate how busy everyone is, particularly at this time of year.
Season's greetings @lmorg ! Have just grabbed the develop version. Everything's smooth so far. I'll be reporting in separate reports if needed!
Describe the bug: It seems variable expansion is not performed within {}... using version v2.11.2200 on Linux.
set foo = bar
set struct = {$foo}
$struct
{$foo}
Expected behaviour: Typing $struct should yield bar, right?
This is a problem if you want to create json data from variables... Apart from concatenating strings then casting to json, is there any other way to create:
from $foo?