oils-for-unix / oils

Oils is our upgrade path from bash to a better language and runtime. It's also for Python and JavaScript users who avoid shell!
http://www.oilshell.org/

ysh-simple parsing / let command language and typed args compose #1837

Open bar-g opened 9 months ago

bar-g commented 9 months ago

The idea in short:

This allows for immediate action, e.g.

for i, name, value in (NAMEDV) {...}

Explanation:

If the command mode (string based) could accept referencing multiple typed args in separate parens and brackets, there could simply be

Together with the CLI flags and options these arguments could be mapped automatically into positional arg lists.

I think there may be three different general ways to organize positional arg lists, and all have their merits and valid use cases:

# 1.) For iterating through the CLI call references (words),
#    with a Forth-like Quality (not against grain) (https://www.oilshell.org/blog/2017/01/13.html)
@ARGV    # list of words (command as dispatched, with flags, opts, args)
         # NOTE: containing complete typed-CLI that supports typed args, e.g.:
         # myproc --flag -f some.file --option=(var) -- (arg1) word (arg2) word [arg3] {...}

# possibly also make list results from parsing available for 'for'-loop usage?:
@FLAGS   # list of words (flags)
@OPTIONS # dict of words (opt: 'value', also opt: '(val)', all expanded to long opts if spec avail.?)
@ARGS    # list of words, also '(arg)', '[arg]', '{...}'

shift    # removes elements from the beginning of ARGV and corresponding others above 

# 2.) "typed-CLI" For iterating over the passed, and already parsed arguments (per type).
# (e.g. for each dict arg found in ARGV take two word args)

WORDV   # list of untyped (string) words found in ARGV
NAMEDV  # dict of named args (as found within ARGV)
        # includes  --flags as flag: true (Bool),
        # and --opt="value" as opt: "value"
BLOCKV  # list of block type 

TYPEDV  # list of all other typed args (as found within ARGV)
        # also/or? have lists of each type?

# 3.) "typed-func interface" For accessing a group of predefined function call params
#    (i.e. ';'-separated), optionally iterating over additional splats
myproc (a, b ; x, y ; name='value') {...}
# named vars from positional-args, named-args and named splats

So args are readily available within the called proc, to be consumed (operated on), adjusted and possibly passed on to other procs.

With this, procs can access, iterate over, and pass on typed args with the same ease as string args. (Actually, much better, because NAMEDV [EDIT: or better a separate dict?] already contains the parsed CLI flags and options.)
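To make the per-type lists concrete, here is a minimal Python sketch that models the proposal. It is not YSH and not how Oils behaves. The function name `partition`, the treatment of every `(...)` word as an unevaluated typed arg, and the rule that only `--long` options before `--` are parsed are all assumptions made for illustration. Short flags such as `-f some.file` are left as plain words, because handling them would need a flag spec.

```python
def partition(argv):
    """Split words into per-type lists (hypothetical WORDV/NAMEDV/TYPEDV/BLOCKV)."""
    wordv, namedv, typedv, blockv = [], {}, [], []
    options_done = False
    for word in argv:
        if word == "--" and not options_done:
            options_done = True              # words after '--' are never options
        elif not options_done and word.startswith("--"):
            name, sep, value = word[2:].partition("=")
            namedv[name] = value if sep else True   # bare --flag becomes True
        elif word.startswith("(") and word.endswith(")"):
            typedv.append(word[1:-1])        # keep the expression text, unevaluated
        elif word.startswith("{") and word.endswith("}"):
            blockv.append(word)              # block literal, kept as text
        else:
            wordv.append(word)
    return {"WORDV": wordv, "NAMEDV": namedv, "TYPEDV": typedv, "BLOCKV": blockv}


argv = ["--flag", "--option=(var)", "--", "(arg1)", "word", "(arg2)", "{...}"]
parts = partition(argv)

# Rough analogue of: for i, name, value in (NAMEDV) {...}
for i, (name, value) in enumerate(parts["NAMEDV"].items()):
    print(i, name, value)
```

The original `argv` list stays untouched, so a caller could still forward the complete command line while also using the parsed lists.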

Extra credit: CLI flags, and options map very well to typed named-args (NAMEDV [EDIT: separate OPTSV? ]):


How it evolved:

myproc word (x, y; n=3)                         # one word and three typed arguments
myproc word (x) (y) (;) (n=3)                   # synonym
myproc word (x) [count > threshold] (;) (n=3)   # let one be an expression arg, could also be multiple 

In next comments: Get rid of the unnecessary semicolon arg (;) etc.


PS: I was originally wondering about what you meant when you wrote about having "Lazy Arg Lists", and found in the doc it's actually (much better) named (lazy-)expr-arg (https://www.oilshell.org/release/0.20.0/doc/ref/chap-cmd-lang.html#YSH-Simple)

After reading it, though, I felt there may even be much more potential to simplify ysh-simple, and to improve on the ability to compose.

bar-g commented 9 months ago

Maybe the stand-alone semicolon (;) could even be made unnecessary, by parsing/filtering out all named args first no matter where they are positioned?

The remaining args are then words, typed, expressions or blocks, all in their own order. That could even allow for a more common command argument ordering and flexibility, for example: myproc [count > threshold] word1 (->y) word2 (->z)
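The "filter out named args first" step can be sketched as a single pass in Python. This only models the idea; the `(name=value)` word syntax, the regular expression, and the helper name `split_named` are assumptions, and values are kept as unevaluated text.

```python
import re

# Matches (name=value) but not a comparison such as (x==1); illustrative only.
NAMED_ARG = re.compile(r"^\(([A-Za-z_]\w*)=(?!=)(.*)\)$")


def split_named(words):
    """Pull named args out of any position; keep all other words in order."""
    named, rest = {}, []
    for word in words:
        match = NAMED_ARG.match(word)
        if match:
            named[match.group(1)] = match.group(2)
        else:
            rest.append(word)
    return named, rest


named, rest = split_named(["(debug=true)", "word", "(x)", "(n=3)", "(y)"])
print(named)
print(rest)
```

Because the named args are recognised by shape rather than by position, no separator such as `(;)` is needed in this model.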

bar-g commented 9 months ago

Spreads could be passed like this:

myproc word (positional_list;)               # as positional spreads are also
                                             # defined before the semicolon
myproc word (;named_dict)                    # named spread, as defined (last) after semicolon
myproc word (;named_dict) (positional_list;) # also both in arbitrary order

So all arg types can be used together, ordered as it is most practical, with named args addable anywhere:

myproc (debug=true) word (positional_list;) (->out) (;config_dict_from_hay)

bar-g commented 8 months ago

Hm, also just like with the named-args (...=...), multiple block-literals {...} could also be identified and filtered out in a first step.

That would allow passing multiple block literals {...} on the command line, allow keyword/command binding to blocks from either the left or the right side, and also allow using -- to distinguish options unambiguously.

And finally, some small further additions to this could even turn the default option parsing into a trivial hit of a jackpot:

EDIT:

andychu commented 8 months ago

I don't see what the motivation for this is -- why does it compose better than the existing syntax?

Also (;) is a bit ugly

bar-g commented 8 months ago

What? Maybe think a bit about myproc "$@".

The (;) was dealt with in the first comments topic.

bar-g commented 8 months ago

Just noticed the general case for myproc --name1=(expr) would need to be myproc --name1="( expr )" (double quoted), to allow spaces within the expression while keeping it a single word.
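The effect of the quotes can be checked with Python's standard `shlex` module, which models POSIX-style whitespace splitting and quote removal. This is only an approximation: a real shell would treat the unquoted parentheses as syntax rather than as ordinary characters.

```python
import shlex

# Without quotes, the spaces inside the expression split it into several words.
unquoted = shlex.split("myproc --name1=( a + b )")

# With double quotes, the whole option stays one word and the quotes are removed.
quoted = shlex.split('myproc --name1="( a + b )"')

print(unquoted)
print(quoted)
```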

bar-g commented 8 months ago

For example, typed arguments can't be passed on. That seems impossible with the single typed vector, which does not support or extend the standard command-line format. (But word-based typed arguments, as proposed in this issue, could be passed on and even parsed directly, like flags and options, mapping to typed named options in ysh.)

ARGV compatibility and composability:

p() { echo p: \$@="$@"; q "$@"; }
proc q { echo q: \@ARGV= @ARGV; = ARGV; test -z ${has+run} || { r @ARGV; return 0 }; setglobal has = 'run'; p @ARGV }
proc r { echo r: \@ARGV= @ARGV; = ARGV }

echo
echo q 1 2 3
q 1 2 3

unset has

echo
echo "q ('1', '2', '3')"
q ('1', '2', '3')
echo
q 1 2 3
q: @ARGV= 1 2 3
(List 0x7efe1d446208)   ["1","2","3"]
p: $@=1 2 3
q: @ARGV= 1 2 3
(List 0x7efe1d446d88)   ["1","2","3"]
r: @ARGV= 1 2 3
(List 0x7efe1d457508)   ["1","2","3"]

q ('1', '2', '3')
q: @ARGV=
(List 0x7efe1d4c3ac8)   []
p: $@=
q: @ARGV=
(List 0x7efe1d457848)   []
r: @ARGV=
(List 0x7efe1d457848)   []

ysh ysh-0.20.0$

For the second run, it could be something like:

q ('1', '2', '3')                              # also q ('1') ('2') ('3')
q: @ARGV=('1') ('2') ('3')                     # or collapsed to '1' '2' '3'     for strings?
(List 0x7efe1d4c3ac8)   [('1'), ('2'), ('3')]  # or collapsed to ['1', '2', '3'] for strings?
p: $@=('1') ('2') ('3')                        # or collapsed to 1 2 3           for strings?
q: @ARGV=('1') ('2') ('3')                     # or collapsed to '1' '2' '3'     for strings?
(List 0x7efe1d457848)   [('1'), ('2'), ('3')]  # or collapsed to ['1', '2', '3'] for strings?
r: @ARGV=('1') ('2') ('3')                     # or collapsed to '1' '2' '3'     for strings?
(List 0x7efe1d457848)   [('1'), ('2'), ('3')]  # or collapsed to ['1', '2', '3'] for strings?

bar-g commented 8 months ago

Updated the example, showing @ARGV vs. = ARGV behavior.

andychu commented 8 months ago

The intention is that you should be able to splat like

p @words (...typed; ...named; block)

However I found a bug related to that while testing just now

I think the problem there is that I wanted cd (myblock) and not cd (; ; myblock) which looks kinda ugly

But then I introduced a rule that doesn't compose

So there is something to fix here


Note that "$@" is deprecated in YSH

The replacement is @ARGV (which might change slightly)

However @strs means "stringify and splice" right now. It is part of the language of strings

The syntax myproc (...typed) is for typed args. It is the typed equivalent of @strs, and must appear within parens


in other words, @strs is part of the shell command/word language, while (...typed) is part of the Python-like expression language
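Since the typed side is described as Python-like, a rough Python analogue of `p @words (...typed; ...named; block)` may help. This is an analogy only, not YSH semantics: the function `p` and its parameter layout are invented for illustration, and the string words are passed as an ordinary list because Python has no separate word language.

```python
def p(words, *typed, block=None, **named):
    """words: list of strings; typed and named: arbitrary Python values."""
    return {"words": list(words), "typed": typed, "named": named, "block": block}


words = ["a.txt", "b.txt"]          # plays the role of @words
typed = [42, {"k": "v"}]            # plays the role of ...typed
named = {"debug": True}             # plays the role of ...named

call = p(words, *typed, block=lambda: "ran", **named)
print(call["typed"], call["named"], call["block"]())
```

The typed values arrive as real objects (an int and a dict), while the words stay strings, which is the split between the two languages described above.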

bar-g commented 8 months ago

p @words (...typed; ...named; block)

But these only iterate over specific argument types.

Whereas in the command interface language "$@" iterates over all flags, options, and arguments, i.e. words, and may also be called argument vector.

I notice that you didn't use @ARGV in the example above, and I think that is correct and important.

I'd say "$@" and @ARGV are on the shell/command language side of things.

And two important properties of the shell command language are:

It's nice if "programming-style" syntax can also be used in ysh. However, as ysh still is a shell on the top-level, I think it should also keep the typical command language properties and extend that to allow typed args. So, not make the shell prompt feel (too, much, 'like', working, 'with a', "programming $language").

A short example might be:

proc p ( input1, input2 ; ->out1, ->out2)
p $a $b ( ->lane1, ->lane2 ) # "programming-style" well, quite sub-optimal for the general case
p $a (->lane1) $b (->lane2)  # "shell-command-style" (synonym)
                             # * an `@ARGV` with arbitrary type-intermingled permutation is possible
                             # * while parsed args are mapped into corresponding per-type "positionals" (typed arg lists).
# * named-args may be added in arbitrary order and any position, e.g. (debug=true)
#  * all leading named-args up to '--' may even be defined using known flag and opt syntax

Possible wording?

# positional arg lists
@ARGV    # naming (strings) of all args, incl. the below (also -x --flags and --options already parsed into NAMEDV )
WORDV
TYPEDV  # individually passed plus splat (but not those going into the type-specific positionals)
NAMEDV  # individually passed plus splat
BLOCKV  # passed as typed plus literals, or separate those?

From the examples in this issue it seems it's possible to have a really good mapping between "$@" / @ARGV and the type specific proc/func positional arg lists. (And even to nicely collapse string-typed args back into shell strings, e.g. to call external commands.)

And I think this solution would also work naturally for

bar-g commented 8 months ago

The idea in short:

andychu commented 8 months ago

From the examples in this issue it seems it's possible to have a really good mapping between "$@" / @ARGV and the type specific proc/func positional arg lists. (And even to nicely collapse string-typed args back into shell strings, e.g. to call external commands.)

As mentioned on the PR, I understand why this is an appealing idea, and other people have tried it before

But I don't think it's a good idea in general. You always need a little bit of code to bridge the gap.

In particular, this style will lead to bad error messages. Users should be writing their own error messages for CLIs, not relying on YSH/Oils

Designing a CLI takes a little bit of effort -- it's not something you can do just by writing a signature in YSH

bar-g commented 8 months ago

I agree, about designing real CLIs needing refinements etc.

The idea in this direction so far was to only do the default flag and opt parsing for internal proc calls and when using the runproc style for quick scripting.

But now that you mention possible drawbacks, maybe it's a good idea to create a separate dict for the auto-parsed flags and opts, so they can never get in the way in the named-args dict.

Note that I think auto-parsed dicts are only one half of flag/opt handling. The other half is the custom code that iterates over them, and that is also the code I would expect to generate the best custom error messages when flags/opts are missing, wrong or inconsistent.


I tested how ysh composes by implementing the code from the blog post https://www.oilshell.org/blog/2017/01/13.html and it still appears to work only "against the grain" in ysh.

Try it out from: https://gist.github.com/bar-g/e9e8e19f9368bf02a0f92cc5752be435


What do you think about @ARGV needing to contain the entire command line and allowing multiple typed args in separate parens?

bar-g commented 8 months ago

@PossiblyAShrub See, if capable flag/opt parsing code is included, I'd hope it could also be put to good use for simplified scripting as an out-of-the-box ysh feature.

andychu commented 8 months ago

Thanks for writing the Forth compose tests

But I think this is a fundamental interior-exterior problem, and it's not possible to solve automatically or in general

If you want exterior composition, then you use flags and args. And manually write any conversion to typed data

If you want interior typed data, then you can just pass it around to funcs AND procs, without parsing


I made a distinction between procs as exterior and funcs as interior here

https://www.oilshell.org/release/0.19.0/doc/proc-func.html#at-a-glance

Perhaps we should also elaborate that procs can also take typed args, but those typed args are interior

i.e. there is no "auto-parsing"

If you want to use the Forth-like style, then you use strings / words only. Because we can't change the kernel interface -- we can't change char **argv and sys.argv and so forth.

Python and C will never accept typed args -- you always have to deserialize from strings.
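A small Python sketch of that boundary: the parent has a typed list, but the child process only ever sees strings in `sys.argv`, so the data is serialized on one side and deserialized on the other. JSON is used here as one possible encoding, and the helper name `call_child` and the inline child program are assumptions for the example.

```python
import json
import subprocess
import sys

# Child program: receives a string in sys.argv[1] and must parse it itself.
CHILD = (
    "import json, sys; "
    "data = json.loads(sys.argv[1]); "
    "print(type(data).__name__, sum(data))"
)


def call_child(values):
    encoded = json.dumps(values)     # argv can only carry strings
    result = subprocess.run(
        [sys.executable, "-c", CHILD, encoded],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(call_child([1, 2, 3]))
```

The conversion code on both sides is the "little bit of code to bridge the gap"; nothing in the process interface does it automatically.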

andychu commented 8 months ago

I think that is the killer argument -- anything we do in YSH is not going to affect Python or C.

Oils is Exterior-First

It's not actually the number 1 goal to make doing everything in YSH as convenient as possible. (Though there is some of that, I just got some feedback on using Hay from YSH in the interior style on Zulip)

The shell still remains for polyglot composition. And other languages only have string argv.


To be honest, it might even make some sense to have three keywords, like this

func f (typed1, typed2) {
}

exterior-proc myproc (string1, string2) {
}

interior-proc myproc (string1, string2; typed1, typed2) {
}

So that would emphasize that when you use typed args, you're limiting yourself to the interior. That may not be obvious to users.

So it could be

andychu commented 8 months ago

Naming idea - is proc vs. typed proc

proc p (str1, str2, ...rest) {
}

proc p (str1; typed1) {  # ILLEGAL because it's not a typed proc
}

typed proc p (str1; typed1) {  # now it's OK
}

This is basically for learning/teaching, so we can say:

Then we don't need any caveats

It's nice to have a separation of interior and exterior. Right now proc is a mix of both

andychu commented 8 months ago

I made a note about typed proc vs. proc here - https://oilshell.zulipchat.com/#narrow/stream/384942-language-design/topic/typed.20proc.20vs.2E.20proc.3F

Not sure if we will do that, but it is one simple way to make things more explainable, make the interior/exterior distinction clear

That is probably one of the most important concepts in the language, and in shell programming

andychu commented 8 months ago

I think this can also clarify our advice

Right now our advice is - https://www.oilshell.org/release/0.19.0/doc/proc-func.html#tip-start-simple

You can start with just a list of plain commands:

Then copy those into procs as the script gets bigger:

Then add funcs if you need pure computation:


I think our advice can be

  1. start with plain commands
  2. add plain procs (not typed procs)
  3. add funcs if you need computation

Typed procs are actually not really for users!!! You can do everything you want with JSON/J8 and plain procs.

JSON is Dicts and Lists that you copy -- you very rarely need mutable dicts in shell-style programming.
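The copy semantics can be illustrated in Python, with plain functions standing in for plain procs that exchange JSON text. The function `bump_version` and the sample data are made up for the example.

```python
import json


def bump_version(config_json):
    """Take JSON text, return new JSON text; the caller's data is never mutated."""
    config = json.loads(config_json)     # private copy of the caller's data
    config["version"] += 1
    return json.dumps(config, sort_keys=True)


original = json.dumps({"name": "demo", "version": 1}, sort_keys=True)
updated = bump_version(original)
print(original)
print(updated)
```

Each stage works on its own decoded copy, so no shared mutable dict is needed to pass structured data along.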


The reason we want typed procs is to implement the 16 use cases

However that is more of an Oils dev thing than an end user thing.

Once we have settled on the metaprogramming techniques that can implement those use cases, then users can also use them


The simplest thing is that we wanted cd /tmp { echo $PWD }, and users can now implement that too


Also

Hmm I think this is pretty good ...

bar-g commented 8 months ago

Ok, now I found your typed proc proposal.

I'll need some time to read up deeper.

From just glancing at it, I think I get interior vs. exterior, but not why extra syntax is needed here. I don't see a problem it solves, because there are already clear errors when trying to call external commands with typed args. And I don't see why the exterior should limit interior composing, or only allow a worse form of it.


Piping and JSON passing is an important type of shell-style programming, but just one. Good uses for Forth-like composing are, I think, things like the repeat, timeout, or debug functions (the latter is an internally broken shell function in the tests), and uses in modules for trivially passing (internal) commands on to sub-modules.
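The Forth-like style can be modelled in a few lines of Python: every command receives the rest of the argument vector, and a wrapper simply treats its remaining words as another command. The commands `echo`, `repeat` and `upper` here are toy stand-ins written for this sketch, not the functions from the tests or the gist.

```python
def echo(argv):
    return [" ".join(argv)]


def repeat(argv):
    # repeat N CMD ARGS...: run the rest of the words N times.
    count, *command = argv
    return [line for _ in range(int(count)) for line in dispatch(command)]


def upper(argv):
    # upper CMD ARGS...: run the rest of the words and upper-case the output.
    return [line.upper() for line in dispatch(argv)]


COMMANDS = {"echo": echo, "repeat": repeat, "upper": upper}


def dispatch(argv):
    name, *args = argv
    return COMMANDS[name](args)


print(dispatch(["repeat", "2", "upper", "echo", "hi", "there"]))
```

Wrappers compose by prefixing words, without knowing anything about the command they wrap. This works because everything in the vector is a string, and it is the property the proposal wants to keep once typed args are mixed in.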


Let's see if it makes sense to reduce the case to this internal composing of things rather than parsing.

Melkor333 commented 8 months ago

Let me get this a bit straight. It feels like various topics all at once. What we're talking about:

andychu commented 8 months ago

Yeah there are a bunch of topics in here, it might be better to start a new thread

The only thing I'm proposing is that there be 2 different "worlds"

There is no automatic conversion or serialization. CLIs take some effort to design; it can't be done automatically by YSH

You have to write help, and write good error messages yourself


So then the plain procs still compose in the Forth-like style.

The typed procs are more like Python, with mytyped @strs (...typed; ...named)

andychu commented 8 months ago

While I didn't understand all of this proposal, it feels like Perl-ish "magic". It doesn't feel like YSH

There are going to be a lot of corner cases; it would be a huge rewrite; and I don't like the names :-/

typed proc is a very tiny tweak to what we have

Though note there is still a bug with blocks