pkgjs / parseargs

Polyfill of `util.parseArgs()`
Apache License 2.0
122 stars 10 forks source link

Make "type: string" options greedy? #77

Closed shadowspawn closed 2 years ago

shadowspawn commented 2 years ago

Short version

Making an option with an expected option-argument (i.e. type: string) always consume the next argument has some benefits:

Long version

This is a revisit for a deeper dive into specifically whether a string option should be greedy. Previously raised in #25 and https://github.com/pkgjs/parseargs/pull/75#discussion_r821289922 but not considered in depth.

Given the program:

const { parseArgs } = require('@pkgjs/parseargs');
const options = {
    with: { short: 'w', type: 'string'}
};
console.log(parseArgs({ options }));

If there is an argument following the string option, does the parsing result depend on the contents of the argument? (Consider following argument one of: foo, -x, --xx, -5, true, -, --, ---, or empty string.)

Some simple examples:

node index.js --with foo
node index.js --with --floating-point-conversion
node index.js --w bar
node index.js --w -5

Let's take a look at the reference specifications and implementations I identified in #76.

POSIX

Utility Conventions and notes

An option with a mandatory option-argument (our "string" option) should be described in the usage like: [-c option_argument]

Guideline 7 states: Option-arguments should not be optional. This is clarified in the notes to make it clear that the option is greedy:

Guideline 7 allows any string to be an option-argument; an option-argument can begin with any character, can be - or --, and can be an empty string. For example, the commands pr -h -, pr -h --, pr -h -d, pr -h +2, and pr -h " contain the option-arguments -, --, -d, +2, and an empty string, respectively. Conversely, the command pr -h -- -d treats -d as an option, not as an argument, because the -- is an option-argument here, not a delimiter.

GNU

Program Argument Conventions adds long options and does not change the handling of mandatory option-arguments.

getopt_long

Greedy. An option with a mandatory option-argument consumes the next argument.

Commander

Greedy. An option with a mandatory option-argument consumes the next argument.

Minimist

The option-argument is optional. It is not consumed if it is option-like. Negative numbers are not consumed.

(The details actually vary between short and long options, but this may be a historical accident rather than by design! Different behaviours for -, leading ---, and empty string.)

Yargs

The option-argument is optional. It is not consumed if it is option-like. There is an exception added for negative numbers, which are consumed.

(The details actually vary between short and long options, but this may be a historical accident rather than by design! Different behaviours for - and leading ---.)

Minimist and Yargs likely rationale

Minimist and Yargs using zero-config parsing assume options may take an argument. This allows multiple options which are variously parsed as boolean or string options. For example with Minimist:

$ node index.js -b -w foo
{ _: [], b: true, w: 'foo' }

parseArgs does not have the same constraints as we are assuming auto-discovered options are boolean.

ljharb commented 2 years ago

"It is not consumed if it is option-like." seems like the proper approach, because the user's intention (which is more important than the parseArgs consumer's) is clearly to pass it as an option.

shadowspawn commented 2 years ago

The two use cases I am most familiar with where the user passes an option-argument starting with a hyphen are:

1) negative numbers, coordinates, et al

show-map --coordinates -123.45,872.56

2) options destined for another application

cc -Xlinker --some-linker-option foo.c
ljharb commented 2 years ago

If theyโ€™re destined for another application, they need to pass -- to stop argument parsing - thatโ€™s the way itโ€™s done.

shadowspawn commented 2 years ago

My description of "options destined for another application" does overlap with npm run foo... and dock run bar..., but this is for a situation that is not covered by --.

The example I gave was from clang:

 -Xanalyzer <arg>
         Pass arg to the static analyzer.

 -Xassembler <arg>
        Pass arg to the assembler.

  -Xlinker <arg>
         Pass arg to the linker.

  -Xpreprocessor <arg>
         Pass arg to the preprocessor.

gcc has the first three. The gcc -Xlinker description goes into some detail including a parsing issue:

If you want to pass an option that takes a separate argument, you must use -Xlinker twice, once for the option and once for the argument. For example, to pass -assert definitions, you must write -Xlinker -assert -Xlinker definitions. It does not work to write -Xlinker "-assert definitions", because this passes the entire string as a single argument, which is not what the linker expects.

A side note: single dash for long options! I assume this is because gcc started using long-options well before -- became the standard. As well as getopt_long, there is getopt_long_only for backwards compatible support which recognises long options following both - and --. The other convention that was tried early on was +long-option.

ljharb commented 2 years ago

clang seems to be super weird here, i'm not sure we should look to it for inspiration.

In the case where any program wants to have an option withValue, for the purposes of passing that along, i would expect it to need to be quoted to avoid it being parsed as an argument. namely, -x '--y' or --x='--y'.

shadowspawn commented 2 years ago

clang was an example of option-taking-option, not put forward as an inspiration. ๐Ÿ˜„

Quoting is not the answer. Quoting is stripped by the shell before the application sees the arguments.

 % node -p 'process.argv' -- -x '--y' 
[ '/usr/local/bin/node', '-x', '--y' ]
% node -p 'process.argv' -- --x='--y' 
[ '/usr/local/bin/node', '--x=--y' ]
ljharb commented 2 years ago

double-quoting? :-p node -p 'process.argv' -- -x '"--y"'

shadowspawn commented 2 years ago

Not sure how joking that is, but to be clear for the gentle reader, double-quoting is not a general solution either. The quotes that end up in the argument would need special treatment.

% node -p 'process.argv' -- -x '"--y"'
[ '/usr/local/bin/node', '-x', '"--y"' ]

The two forms that avoid optional-argument behaviour are:

ljharb commented 2 years ago

In that case I'd say let's just not support this use case.

If you want to pass arguments to something else, you use --. If you want to pass arguments to multiple things, you might just have to process process.argv yourself first.

aaronccasanova commented 2 years ago

Thanks for consolidating all that research @shadowspawn! Definitely helped me build a more informed opinion on the matter.

This is a tough one.. I like that type: 'string' being greedy immediately makes the parser behavior more consistent/predictable (e.g. type: 'string' in long and short forms always captures following values), would likely clean up the implementation, and enables support for signed numbers (which has already been brought up in two separate issues).

The tough part for me is while this change would align with notable standards (e.g. POSIX) it differs from other popular arg parsing Node libraries (e.g. Yargs and Minimist) with the exception of Commander. While my instinct says we should stick to established standards, a part of me thinks it might be better to use idiomatic conventions already used throughout the Node ecosystem.

That being said, I don't have a strong opinion either way and would be happy with either approach ๐Ÿ‘

shadowspawn commented 2 years ago

For my own interest, I went through the ~800 Commander bugs for ones related to greedy/not-greedy. Commander supports required greedy option-argument and optional choosy option-argument, so gets issues both ways around. I don't know what the historical usage ratio is, but expect more required option-argument than optional option-argument.

Issues related to greedy behaviour: 3 Issues related to option not being greedy: 4

The not-greedy issues had examples calls of:

shadowspawn commented 2 years ago

Making the option be "choosy" rather than "greedy" allows detection of some usage errors, where the option-argument is incorrectly omitted and the next argument starts with a hyphen. It comes with a cost in terms of more complicated behaviour and rejecting intended option-arguments.

I had a musing today whether only triggering the "choosy" behaviour for known options would give most of the benefits with few of the downsides. For example negative numbers would always be accepted as an option-argument unless there was a configured option for the starting digit.

The error in strict: true would be for an ambiguous usage rather than for a missing option argument.

(I haven't see other implementations do this, and have not worked through all the scenarios, but wanted to write it down! Not sure yet what would happen for strict: false.)

bakkot commented 2 years ago

If it's not greedy, what's the alternative for when you want to pass a string starting with -?

Can you do it with =? For example, does (e.g.) --coordinates="-1" give you { values: { coordinates: '-1' } }? If that works, and if ultimately we decide that --coordinates -1 is an error [in strict: true], it's almost certainly worth adding a special error for this case (value-taking option where the next thing in args starts with -) which suggests the user write --coordinates="-1" instead.

The non-greedy behavior seems a lot more palatable if it's always possible to have an error message which tells the user what to write instead.


For example negative numbers would always be accepted as an option-argument unless there was a configured option for the starting digit.

For negative numbers in particular, that's pretty close to what Python does, except that Python says "any numeric options" rather than "specifically that digit".

I think "any numeric options" is the more reasonable of the two, since it's weird as a user if --coordinates -1 works but --coordinates -2 throws an error.

shadowspawn commented 2 years ago

If it's not greedy, what's the alternative for when you want to pass a string starting with -?

Can you do it with =? ...

Yes.

The non-greedy behavior seems a lot more palatable if it's always possible to have an error message which tells the user what to write instead.

Good suggestion, with caveat it only assists in strict:true mode. Extra work, but extra value.

shadowspawn commented 2 years ago

Modified proposal: the type:string option processing is greedy. Strict mode throws an error if the candidate option-argument starts with a dash, due to the potential ambiguity of user intent with (e.g.) --foo --bar.

The error in strict mode for --foo --bar is custom (as suggested in https://github.com/pkgjs/parseargs/issues/77#issuecomment-1073066402), and a different message to foo missing a value


There is not a "right" answer, and split feedback. This enhanced approach might suit a majority.

aaronccasanova commented 2 years ago

Strict mode throws an error if the candidate option-argument starts with a dash, due to the potential ambiguity of user intent with (e.g.) --foo --bar.

I'm all for the modified proposal and gave my ๐Ÿ‘ for whatever that's worth ๐Ÿ˜„ However, I want to put it out there that I don't think erroring on option-like-arguments reduces ambiguity of user intent. It feels very situational to the tool being created.

Take for instance this grep example. I'm trying to search for instances of a given CSS custom property (which by convention starts with --). At first grep errors unrecognized option '--brand-color'. However, when I add the -e option it greedily accepts the next token. IMO this satisfies both the author and user intent.

image

Apologies for prolonging a resolution ๐Ÿ˜ฌ Feel free to disregard this opinion as I think there will be more opportunities to refine greedy behavior post MVP (e.g. global parserConfig.allowHyphenValues, optionConfig.allowHyphenValue, etc.).

shadowspawn commented 2 years ago

I want to put it out there that I don't think erroring on option-like-arguments reduces ambiguity of user intent

My description was not great.

The "ambiguity" is: did the user intend that, or did they make a mistake? Following your example context:

grep -e --line-buffered styles.scss

Did the user forget the pattern argument after -e, or did they really want to search for --line-buffered?

We aren't sure so in strict mode we can throw an error just in case, and suggest how to change command if was intended. e.g.

$ grep -e --line-buffered styles.scss
Did you forget to specify the option argument for '-e'? 
To specify an option argument starting with a dash use '-e--line-buffered' (no space after option).

Apologies for prolonging a resolution

Feedback is good. I wasn't sure how this hybrid would be received, and checking it seems reasonable before pushing it. ๐Ÿ˜„

bakkot commented 2 years ago

Did you forget to specify the option argument for '-e'?

Tiny bit of feedback on the error message: if it's possible to suggest they use a long-form argument with = instead (so, --regexp=--line-buffered), I think that's probably better than suggesting the -eVALUE form, even if it requires expanding a short alias to its long version. This is because I expect more people to be familiar with the --long=value form than with the -evalue form, and so less likely to be confused by the message.

(But that's just my intuition; no hard data either way. And the custom message is much better than nothing!)

shadowspawn commented 2 years ago

In somewhat of an aside, the grep example starts with an ambiguity/conflict between whether an argument is an option or a positional. This is a nice example of the -- support allowing the user to resolve, without the author needing to do anything extra.

$ grep --brand-color styles.scss
grep: unrecognised option ` --brand-color'
...
$ grep -- --brand-color styles.scss
aaronccasanova commented 2 years ago

Thanks @shadowspawn The examples you provided help clear up the ambiguity we're trying to avoid ๐Ÿ‘

shadowspawn commented 2 years ago

@bcoe Did you have an opinion on this? Ok to ignore if you are feeling bandwidth limited!

Summary of feedback and modified proposal in: https://github.com/pkgjs/parseargs/issues/77#issuecomment-1086534461