larryhastings / appeal

Command-line parsing library for Python 3.

Consider changing when optional options become available, and how they're presented, in usage #3

Closed larryhastings closed 1 year ago

larryhastings commented 2 years ago

Something I've been pondering. Consider this Appeal API:

def optional_stuff(a, b, c, *, verbose=False, ignore_case=False): ...

@app.command()
def command(arg, stuff: optional_stuff=None): ...

Currently that would be rendered in usage as:

command arg [a [-v|--verbose] [-i|--ignore-case] b c]

That is, a, b, c, -v, and -i are all in one "optional group". The options -v and -i only become available--only become "mapped"--once the user specifies the second positional argument, a. And yes, this really is the command-line API Appeal would create; see the Recursive Converters section of the Appeal docs to understand why.

This is consistent and understandable, but... it's also a little weird. I guess this is kind of a new command-line metaphor, having options that only become available after a certain number of positional parameters. But having them only become available after the first argument in the optional group? It's weird, right? It's not just me?

So, if we don't want that, what do we want? If we could start over and do anything, what's the usage string and command-line behavior of our dreams? I've convinced myself it's this:

command arg [[-v|--verbose] [-i|--ignore-case] a b c]

That is, the optional part looks like conventional command-line usage, but with square brackets around it: options are shown first, then positional arguments.

But it's only fair to show this to the user if it actually works this way. If we show the user this usage string, they would quite reasonably expect this command to work:

% myscript.py command blah -v -i x y z

Can it be made to work? Certainly. It means mapping the optional options at the judicious time (after arg is consumed), but not instantiating the call to optional_stuff until any of the options or arguments is specified. A little tricky but not impossible. Should it be made to work? It seems fine. If they specify -v or -i, they have to specify the three optional arguments a, b, and c. The hardest part seems like it'll be crafting an error message that gets this idea across to the user in an understandable way.

But this gives rise to a painful boundary condition when combined with *args:

def o2(a, *, verbose=False): ...

@app.command()
def command2(arg, *things: o2): ...

Currently this would be presented in usage as:

command2 arg [a [-v|--verbose]]...

That is, you can specify additional a arguments as many times as you like, and each one can be followed by a -v. Completely unambiguous.

If we early-map optional options in this case, then what happens if the user runs this?

% myscript.py command2 meh first -v

Is this -v paired with the first instance of o2 (the one that gets called with a=first), or is it a preemptive option passed in to a second instance of o2 that the user never completes? It's kind of ambiguous.

In practice, it wouldn't be ambiguous--it'd consistently be one or the other. Either -v would be (re-)mapped before a was consumed, every time, or it would be (re-)mapped after a was consumed, every time. And since we're permitting -v to be used before a, then -v would have to be mapped before a was consumed, which means in the above command-line the -v would be passed in to the second call to o2(), which is incomplete because the user doesn't provide a second a. So this command-line is invalid--which I think the user would find surprising.

So I propose: we early-map optional options when they don't repeat, but we skip the early-mapping when they do repeat. I don't think that's amazingly wonderful, exactly; it's a little inconsistent. But overall I think it minimizes unpleasantness and surprises to the user, and it's unambiguous.

There's one more thing to consider. Maybe it would be tidier if, for *args optional options, we display them in usage last. Consider:

def o3(p, q, r, *, verbose=False, ignore_case=False): ...

@app.command()
def command3(arg, *detritus: o3): ...

Which usage string is nicer?

1. command3 arg [p [-v|--verbose] [-i|--ignore-case] q r]...

2. command3 arg [p q r [-v|--verbose] [-i|--ignore-case]]...

I think 2. is prettier.

Note that I don't actually propose delaying mapping those optional options until the end. Between you and me, they'll still be mapped after the first optional argument (in this case p). It's just the usage string that we're tweaking.
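To make the point concrete that the usage string is only presentation, here is a minimal sketch of a renderer for one optional group. It is not Appeal's code: the function name render_optional_group and its placement parameter are hypothetical, invented for illustration. It only shows that the same group can be displayed with its options first, after the first positional argument, or last, without touching when the options are actually mapped.

```python
def render_optional_group(positionals, options, placement, repeating=False):
    """Render one optional group of a usage string.

    Hypothetical helper, not part of Appeal. `placement` is "first",
    "after_first", or "last": where the options are *displayed* relative
    to the group's positional arguments.
    """
    rendered_options = [f"[{short}|{long}]" for short, long in options]
    if placement == "first":
        parts = rendered_options + list(positionals)
    elif placement == "after_first":
        parts = list(positionals[:1]) + rendered_options + list(positionals[1:])
    elif placement == "last":
        parts = list(positionals) + rendered_options
    else:
        raise ValueError(f"unknown placement: {placement!r}")
    # A repeating (*args) group gets the trailing ellipsis.
    suffix = "..." if repeating else ""
    return "[" + " ".join(parts) + "]" + suffix


OPTIONS = [("-v", "--verbose"), ("-i", "--ignore-case")]

# The two candidate layouts for command3's repeating group.
layout_1 = render_optional_group(["p", "q", "r"], OPTIONS, "after_first", repeating=True)
layout_2 = render_optional_group(["p", "q", "r"], OPTIONS, "last", repeating=True)
```

Here layout_1 and layout_2 correspond to choices 1 and 2 above; passing "first" with repeating=False gives the options-first layout proposed for the non-repeating case.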

larryhastings commented 2 years ago

I have been marinating on this for a while. And now I think I might be changing my mind.

Back in the day, options were required to come before (positional) arguments on command-lines. You couldn't say fgrep foo -i, you had to say fgrep -i foo. Allowing the former is a modern relaxation of the old rules.

So maybe "painful" isn't the word to describe the boundary condition described above. It's just a fact of life.

If my example above:

def o2(a, *, verbose=False): ...

@app.command()
def command2(arg, *things: o2): ...

had usage like this:

command2 arg [[-v|--verbose] a]...

where you were required to put option -v before the positional argument a, that doesn't seem all that bad.

To be precise: Appeal would require the options to be specified on the command-line before the last positional argument mapping to the args parameter. If your code looked like this:

def o3(a, b, c, *, verbose=False): ...

@app.command()
def command3(arg, *things: o3): ...

then usage would look like this:

command3 arg [[-v|--verbose] a b c]...

but in actuality you could specify the -v before a, after a but before b, or after b but before c. Just as long as it was before c. If you specified -v after c, it would map to the next invocation of o3().

So in this command-line:

% myscript.py command3 arg a b -v c

we'd call o3() once with verbose=True.

As a reminder, in this new world order, options in an optional group get optimistically lazy-mapped to the command-line when we complete all the positional arguments from the previous group.

--

I worry I'm not explaining this well, so I'm going to walk you--the reader--through it all the way.

When Appeal parses the command-line, it's building function calls. At the point that Appeal runs out of input, if the function call is valid, Appeal is happy. If the function call is invalid, Appeal is unhappy and throws an exception.

Let's say Appeal is in the middle of parsing this command-line:

% myscript command3 myarg ...

myscript was the program the user ran, from their shell; the rest are arguments to that program, which Appeal is parsing. So far Appeal has seen two things, command3 and myarg. (The ellipses there indicate "the rest of the input, whatever it is".) From this, Appeal has started building a call to command3(), and has filled its arg parameter with myarg. At this exact moment, the function is callable--if there was no more input, Appeal would be happy and could successfully call command3().

Internally, Appeal has finished command3's first argument group, which is a mandatory group. All it contained was one positional argument, arg. The next argument group is optional, and it maps one option (-v) and three positional arguments. So if there was another positional argument on the command-line, Appeal would say "okay, we're building a call to o3() now", and it would require a total of three positional arguments.

At this exact moment--where we haven't noticed whether or not there's another positional argument on the command-line--Appeal will now provisionally map -v to the command-line, by which I mean it will add -v to its internal list of options that are valid right now. But that doesn't necessitate calling o3(). If there are no more arguments, Appeal will say "oh I guess we're not calling o3() after all, that's fine", and it'll be happy and call command3() as per the above.

Anyway. After Appeal provisionally maps -v to the command-line, the next thing it will do is pull out the next argument from the command-line argument iterator. Which, if we want to be all-encompassing here, could be one of four things:

  1. The command-line could already be exhausted, and there are no more arguments of any description. In which case, Appeal is happy, and it can call command3(), and it doesn't mind that it's not ever going to call o3(). (The things argument to command3() will be an empty iterable.)
  2. The next command-line argument doesn't start with a dash, in which case it's a positional argument. Appeal will say "okay, I have to build a call to o3()". This argument will be the first argument to o3(), and Appeal is gonna need two more if we're going to have a valid command-line. (The result of the call to o3() will be passed in to command3() as an entry in the things iterable.)
  3. The next command-line argument could be -v, in which case Appeal will also say "okay, I have to build a call to o3()". Appeal is gonna need three more arguments to fill in for this call to o3(), for this to be a valid command-line.
  4. The next command-line argument starts with a dash, but isn't -v or --verbose, in which case the command-line is illegal. (I told you we were covering all bases.)
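The four cases above can be sketched as a small classifier. This is a toy model under stated assumptions, not Appeal's internals: the names NextStep and classify_next are hypothetical, and the function only looks at the first token after arg, returning which case applies and how many more positional arguments the pending o3() call would still need.

```python
from enum import Enum


class NextStep(Enum):
    """Hypothetical labels for the four cases; not Appeal identifiers."""
    CALL_WITHOUT_O3 = 1           # case 1: input exhausted
    START_O3_WITH_POSITIONAL = 2  # case 2: token is a positional argument
    START_O3_WITH_OPTION = 3      # case 3: token is -v / --verbose
    REJECT = 4                    # case 4: unknown option


O3_POSITIONALS = 3                # o3(a, b, c, ...)
O3_OPTIONS = {"-v", "--verbose"}  # provisionally mapped after arg


def classify_next(remaining):
    """Return (step, positionals_still_needed) for the tokens after `arg`."""
    if not remaining:
        return NextStep.CALL_WITHOUT_O3, 0
    token = remaining[0]
    if not token.startswith("-"):
        # This token fills o3's first parameter, so two more are needed.
        return NextStep.START_O3_WITH_POSITIONAL, O3_POSITIONALS - 1
    if token in O3_OPTIONS:
        # The option commits us to o3() without filling any positional.
        return NextStep.START_O3_WITH_OPTION, O3_POSITIONALS
    return NextStep.REJECT, 0
```

For example, classify_next(["-v"]) reports that a call to o3() has been started and that three positional arguments are still required, which is exactly why a trailing -v with nothing after it cannot be a valid command-line in this model.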

So finally, here's the funny side-effect of these semantics. In this world, the following command-line:

% myscript.py command3 arg a b c -v

is invalid, because Appeal wants to call o3() twice. By the time Appeal sees the -v, it's already built the first call to o3()--that one's all done and ready to go. The -v it sees at the end there tells Appeal to build a second call to o3(), and we don't have any more positional arguments to use to fill it in, and so Appeal is unhappy and throws an exception. And now we need to figure out how to intelligibly communicate that to the user in a short string we can display before usage.

--

There are really three possibilities for how to handle options in repeating optional groups (*args):

  1. They're provisionally bound before the first positional parameter in the repeating optional group, and get rebound after we see the last positional parameter. (This is the scenario I described in this post in this issue.)
  2. They're always bound after the first positional parameter in the repeating optional group, and only get rebound after we see the first positional parameter in the next instance of that repeating optional group. (This is the scenario I described in my previous post in this issue.)
  3. Like 2., except that the first time, we bind them provisionally before the first positional parameter as per 1.

I can see the benefits of each of these. I don't expect to do 3, because I think consistency is probably better than the mild convenience afforded by its inconsistency. And, as I said at the top of this message, I'm now tilting away from 2 and towards 1.
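To compare the three possibilities side by side, here is a toy simulation. It is a sketch under assumptions, not how Appeal is implemented: bind_options, UsageError, and the strategy names "early", "late", and "hybrid" (for possibilities 1, 2, and 3) are all hypothetical. It models a single repeating group with one option, -v, and a fixed number of positional arguments, and reports which instance of the group each -v lands on.

```python
class UsageError(Exception):
    """Raised when the toy model rejects a command line."""


def bind_options(tokens, group_size, strategy):
    """Assign -v to instances of a repeating group.

    Returns one (positionals, verbose) pair per instance of the group.
    """
    if strategy not in ("early", "late", "hybrid"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    calls = []

    def start_instance():
        calls.append({"args": [], "verbose": False})
        return calls[-1]

    for token in tokens:
        last = calls[-1] if calls else None
        complete = last is not None and len(last["args"]) == group_size
        if token == "-v":
            if strategy == "early" and (last is None or complete):
                # Possibility 1: -v belongs to the instance being built,
                # so after a complete instance it opens the next one.
                last = start_instance()
            elif last is None:
                if strategy == "late":
                    # Possibility 2: not mapped before the first positional.
                    raise UsageError("-v is not mapped until a positional is seen")
                # Possibility 3: early binding for the first instance only.
                last = start_instance()
            # Otherwise -v sticks to the most recently started instance.
            last["verbose"] = True
        elif token.startswith("-"):
            raise UsageError(f"unknown option: {token}")
        else:
            if last is None or complete:
                last = start_instance()
            last["args"].append(token)

    if calls and len(calls[-1]["args"]) != group_size:
        raise UsageError("last group is incomplete")
    return [(tuple(c["args"]), c["verbose"]) for c in calls]
```

In this model, bind_options(["first", "-v"], 1, "late") pairs -v with the first instance, while the same tokens under "early" raise UsageError because -v opens a second instance that never receives its positional argument. Likewise, with group_size=3 under "early", the tokens a b -v c produce one verbose instance and a b c -v are rejected.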

larryhastings commented 1 year ago

Done!

larryhastings commented 1 year ago

And for the record, I went with the early-binding (and therefore early-un-binding). I think the mild surprise folks might experience on the rare occasion an option is un-bound (or re-bound) earlier than they expected is minuscule, compared to the big win of options being available early the way folks would normally expect.