Optional function params

lmorg / murex

A smarter shell and scripting environment with advanced features designed for usability, safety and productivity (eg smarter DevOps tooling)

https://murex.rocks

GNU General Public License v2.0

1.51k stars 27 forks source link

Optional function params #888

Closed atagen closed 1 week ago

atagen commented 1 month ago

Describe the problem: I have several functions I would like to use with or without a parameter - for example, this ls alias ought to work with or without a path, and without prompting for a value.

function ls_culr (l_path: str []) {
  if { $l_path } {
     eza -lh --group-directories-first --icons --color=always $l_path -> culr -t 80 -o roygbiv-split
  } else {
     eza -lh --group-directories-first --icons --color=always -> culr -t 80 -o roygbiv-split
  }
}

Possible ways to implement: either a config setting to disable prompting for empty variables, or the ability to specify optional function params, ie. l_path: opt str.

Additional context:

Documentation: Please rate your success with referring to the docs @ https://murex.rocks

[ ] I haven't read the docs / the solution should be more discoverable
[ ] I have read the docs but the content wasn't clear
[x] I have read the docs but this query was missing
[ ] I have read the docs and this query was answered but I'm still having problems

lmorg commented 1 month ago

I've been thinking about this and I'm not 100% in favour of this (though, I'm also not 100% against this either).

My thoughts:

Flags

I generally view optional parameters as being better served with flags. eg

ls_culr --path /path

Counterpoint:

This admittedly isn't all that ergonomic for frequently used functions
Murex violates this principle itself with a number of its own builtins

Anonymous Parameters

Optional parameters are still supported with anonymous parameters

function ls_culr {
  if { $1 } {
     eza -lh --group-directories-first --icons --color=always $1 -> culr -t 80 -o roygbiv-split
  } else {
     eza -lh --group-directories-first --icons --color=always -> culr -t 80 -o roygbiv-split
  }
}

Counterpoint:

This is exactly the ugly "Bashism" I was trying to avoid by creating named parameters

Simplicity vs Flexibility

The idea behind the parameters is to be a simple interface. I think the risk of adding opt is that it could lead to a slippery slope of other requirements like flags. So perhaps we are better off defining another interface? For example something like:

function ls_culr %{
  Flags: []
  Parameters: [
    Name: l_path
    DataType: str
    Optional: true
  ]
} {
  if { $l_path } {
     eza -lh --group-directories-first --icons --color=always $l_path -> culr -t 80 -o roygbiv-split
  } else {
     eza -lh --group-directories-first --icons --color=always -> culr -t 80 -o roygbiv-split
  }
}

This way a variety of different parameter formats could be supported.

Counterpoint:

It's verbose (though does that matter, since this format would mainly be used in scripts anyway?)
You know have 3 different ways to define parameters: anonymous parameters ($1...), named parameters (eg your code), and this new format

Its hard to read. Perhaps a better approach would be something like:


ls_culr_args = %{
Flags: []
Parameters: [
Name: l_path
DataType: str
Optional: true
]
}

function ls_culr $ls_culr_args { if { $l_path } { eza -lh --group-directories-first --icons --color=always $l_path -> culr -t 80 -o roygbiv-split } else { eza -lh --group-directories-first --icons --color=always -> culr -t 80 -o roygbiv-split } }



---

Maybe there's a better option out there I've not considered?

Or maybe the best option is to add a `opt` field like you've suggested?

atagen commented 1 month ago

Flags

This admittedly isn't all that ergonomic for frequently used functions

Yes, I wouldn't really want to use flags outside of an executable, especially for short everyday shell aliases. If you wanted a named calling convention for optional params, I think OCaml's style of ~parname: val wouldn't be too bad, and doesn't seem to collide with other syntax. To give some context, though, I alias this command to the single letter l - I'm trying to move my fingers as little as possible to get the most information I can.

Anonymous Parameters

Optional parameters are still supported with anonymous parameters

Works for now, but as you say, it is kind of "smelly" and against Murex's general taste for explicitness.

Simplicity vs Flexibility

Terseness and ease of expression are some of the best qualities of shells (at least, the ones I choose :)) and scripting them in a *nix userland. While it's true that this syntax may not see use outside of scripts, it would destroy readability of any decently sized block of code - it's arguably more verbose than most generalist languages. I think Murex fits best in the niche between "knock out a 5 line bash script and pray" <-> "rewrite our tooling in a full featured language". I'd like to see it play to the strengths of shell scripting as much as possible, while still avoiding both the typical stringly typed pitfalls and the tendency of next gen shells to overthink/overextend the syntax, increasing friction around adoption. At some level, it's gotta stay looking and working at least a bit like a combination of bash and an algol descendent to keep the "everyman" appeal..

Or maybe the best option is to add a opt field like you've suggested?

I think so. If you're still looking to preserve general strictness around calling functions, I'd have no problem with it being gated behind a config option, some top level enable: optional declaration in the script, etc.

An alternative I've considered here is to allow declaring functions of the same name but differing arity, so I could have both function ls_culr {..} and function ls_culr (l_path: str) {...}, then have Murex automatically execute the correct function based on how it's called. This way we avoid if { $thing } handling blocks and general uncertainty around the existence of variables, and each function's purpose is completely clear. The only downsides I could see to that are implementation side and potential security issues (but if something untrusted's defining functions in your shell you've probably got bigger problems anyway).

atagen commented 1 month ago

Okay, the more I think about it the more I prefer the idea of allowing functions with the same name but different arities - it fully and explicitly encodes the optionality and resultant behaviour within the existing type system without adding any layers, with less surface exposed for both internal and scripting level bugs, and probably a much more natural and minimal a change overall.

lmorg commented 1 month ago

Murex's runtime was written to avoid function overloading so this wouldn't be a trivial change.

I'm also not sure how well this would work either. The reason being is that technically every function signature is the same: fn([]string) due to the way how operating systems pass parameters to processes. While Murex does allow typed parameters, it's always just an array of strings that get passed and then converted to (for example) an int when the function is called.

I'm also a little worried this might make it harder to reason about code. We already have a namespace hierarchy with aliases, functions, privates and external processes. Adding function overloading might make it too difficult to debug. Particularly beginners to the shell.

Another option is to abuse the default value. You code already appears to do this actually:

function ls_culr (l_path: str []) {
    ...
}

The problem with this is that it's not explicit.

This does lead me onto another question:

What's the aim behind having an optional parameters?

By this I don't mean, "why does your function need optional parameters", because I do get the value of having optionals in a shell.

What I mean is, what behaviour are you expecting to change within Murex given shells make all parameters are optional by default.

I have a theory what you're trying to achieve but I want to make sure I'm trying to solve the same problem you are.

lmorg commented 1 month ago

The more I think about this, the more I'm sure that what we have here is two distinct but related problems:

nullable types
prompt suppression

Nullable Types

The advantage of anonymous parameters is that they're unset if that parameter isn't utilised. eg

function example {
    out $1 $2 $3
}

example will error if fewer than 3 parameters are passed:

» example 1
Error in `out` (2,7): variable '2' is not set because the scope returned the following error when querying parameter 2: too few parameters
                    > Expression: out $1 $2 $3
                    >           :         ^
                    > Character : 8
» example 1 2
Error in `out` (2,7): variable '3' is not set because the scope returned the following error when querying parameter 3: too few parameters
                    > Expression: out $1 $2 $3
                    >           :            ^
                    > Character : 11
» example 1 2 3
1 2 3

(It's not the nicest error message I've ever written, but it's at least clear where the error lies)

The problem here is if you want to use named parameters, then you cannot have them unset if they're unused.

There's two solutions here:

to literally make them unset if that parameter isn't used
to make then nullable. So you can use is_null to check if they're set.

I think option 2 here is my preferred approach because it also works if variables are unset. So there's API consistency. But it's also doesn't cause an error if it's read somewhere so it's potentially easier to reason what the code is doing (ie I know that optional variable is defined, but it's not given a value).

We can talk about the syntax of using nullable values in named parameters later...

Prompt Suppression

One of the aims for named parameters was to have a list of required parameters that you'd get prompted for. Basically making functions a little easier to use for people unfamiliar with them. However, as you've rightly pointed out, there's other ergonomics improved using named parameters.

So you might still want a named parameter to be mandatory (ie have a sane default) but harass the user for input. So we'd want to suppress the prompt.

In terms of your specific code, if you're ever likely to want to pass multiple parameters to eza (eg have globbing expanded), then you could use the same approach I've used for the Bash and Helix wrapper:

function ls_culr {
    config set proc strict-arrays false
    eza -lh --group-directories-first --icons --color=always @PARAMS -> culr -t 80 -o roygbiv-split
}

strict-arrays basically enables or disables errors being raised if an empty array is passed as a parameter.

Turning that off means eza can support any number of parameters (including none) passed from it's parent function.

This wouldn't be ideal if you always just want 1 and only 1 parameter passed. But in the case of Bash and Helix, those wrapper functions needed to behave almost transparently to the user.

atagen commented 1 month ago

Another option is to abuse the default value. You code already appears to do this actually

Sadly ineffective, as 1. it still asks for a value, and 2. [] coerces to "" so most applications end up looking for a file named nothing.

The problem with this is that it's not explicit.

Agreed.

What's the aim behind having an optional parameters?

An escape hatch from the strictness of requiring an input on all parameters, without otherwise breaking the explicitness of params I do potentially want required.

Nullable Types

..it also doesn't cause an error if it's read somewhere

This is the key - no error, and no output where the variable is coerced to string (ie. in commands).

However..

Prompt Suppression

..you might still want a named parameter to be mandatory (ie have a sane default) but harass the user for input

..does this mean you'd make everything nullable by default, and mark mandatory params? I would hope they are only allowed to remain unset if null is defined as their default value, or they have some keyword that makes it clear they can be left null.

The @PARAMS tip is appreciated, there's at least one tool I plan to port over that could use it.

lmorg commented 1 month ago

How about the following:

! prefix to denote prompt suppression
If a default is not set (ie no presence of [...], then named parameter will be set to null
If a default is set, even if it's a zero length string, such as [], then the named parameter will be set to that default if prompt is suppressed and parameter not passed

function example (!foo: str) {
    ...
}

atagen commented 1 month ago

Sounds perfect - compact and logical. And thanks for working through this so patiently :)

lmorg commented 1 month ago

I can extend the same thank you to yourself too 👍

There is one minor change. I've decided to have variables for optional parameters unset, rather than null, if there is no default.

The reason being:

All arguments in POSIX are just an array of bytes (Windows is even worse because it's just one giant parameter and up to each individual application how that giant parameter gets parsed into multiple parameters 😮‍💨)

Anyhow, this means I would have to write a plethora of edge case code to handle nulls otherwise they'll silently get converted into zero length string -- which defeats the entire purpose of this exercise. Whereas if a variable is unset, it will cause the function to fail in more predictable ways.

I've pushed a working version to develop branch so it's available for testing.

atagen commented 1 month ago

Understandable change - so I should stick with if { $var } to check for set-ness?

Does this make a !bool being false and unset equivalent? Not really a problem but ideally something to document.

Will check out the new branch soon.

atagen commented 1 month ago

I'm not sure if I've done something wrong here:

❯ murex --version
murex v6.4.0309 (develop)
GPL v2
2018-2024 Laurence Morgan
❯ function test (t: str) { echo $t }
❯ test ok
ok
❯ test
Please enter a value for 't':                                                                                                                                 

❯ function test (!t: str) { echo $t }
Error in `function` (0,1): cannot parse function parameter block: unexpected character '!' (chr 33) at 1 (1,1):
                         > code: (!t: str)

edit: Ok, nevermind, Kitty is still launching the old one. If I re-exec murex it works perfectly:

❯ function test (!t: str) { if { $t } { echo $t } else { echo nope! } }
❯ test
nope!
❯ test ok
ok

And the optional bool works as I thought, neat.

lmorg commented 1 month ago

Understandable change - so I should stick with if { $var } to check for set-ness? Does this make a !bool being false and unset equivalent? Not really a problem but ideally something to document.

There's a few ways to check:

if { $var } this checks if a value is "falsey". So false, zero-length string, unset, etc. Generally useful for shells because generally you want to exclude zero length strings and such like.
is-null is more applicable for when you specifically want to check if a value is or is not set. eg !if { is_null var } (note to self: I should probably add support for !is-null because a common use case will be to check if a value is set)

Good point about the documentation. I forgot to update that with this commit.