Version 21.1.0 breaks "--"-related configuration

Toberumono commented 2 years ago

As of version 21.1.0, configuring yargs-parser with {"populate--": true, "unknown-options-as-args": true, "halt-at-non-option": true} results in every non-flag being put in "--" regardless of whether it is part of a valid command. Prior to version 21.1.0, non-flags were excluded from "--" provided that they were part of a valid command. Example Command: node foo.js validCommand Expected behavior: executes the command "validCommand" Actual: places "validCommand" in the "--" array.

Effectively, it is treating every command as invalid.

Unfortunately, I cannot build directly because version 21.1.0's type bindings are broken, and I'm using TypeScript, so I can't dig into it properly at this time.

bcoe commented 2 years ago

@Toberumono this issue is most likely caused by:

https://github.com/yargs/yargs-parser/pull/438

Can you coordinate a fix with @kherock?

Toberumono commented 2 years ago

@bcoe From what I can tell, yargs-parser doesn't have any way of knowing what a command is (valid or otherwise) - it just handles "commands" as positional arguments. Therefore, we have two options:

Inform yargs-parser of what yargs considers to be a valid command
Add a flag that enables the new functionality that @kherock requested/provided

I would recommend option 2 because it is new functionality and option 1 would require some significant changes in yargs-parser and yargs because the tree of commands would have to be computed before parsing takes place. While option 1's requirements are certainly not insurmountable, they is more complicated than necessary.

As for flag names, I'm thinking, "treat-all-non-flags-as-invalid" or something similar.

kherock commented 2 years ago

@Toberumono I think that the changes fixing the bug in my PR is valid, I don't want to make the old behavior the default since it definitely wasn't intuitive. The crux of that issue was that unknown-options-as-args: true causes halt-at-non-option to have zero effect.

Just so I understand the issue correctly, the issue is that instead of { "_": ["node", "foo.js", "validCommand"] } in the parse output, you're seeing { "--": ["node", "foo.js", "validCommand"] }?

Toberumono commented 2 years ago

@kherock The reason why I was suggesting a flag is that your change breaks scripts that use yargs to process the command to execute and the command's arguments. What you set the flags to stops mattering after applying your change because everything without a '-' at the start is treated as permanently invalid.

For a more complete example, assume the following script:

const yargs = require('yargs');
yargs(process.argv.slice(2)) // Node's process.argv array includes both itself and the executed filename
    .parserConfiguration({"populate--": true, "unknown-options-as-args": true, "halt-at-non-option": true})
    .option("thing", {
        type: "boolean",
        "etc": "etc"
    })
    .command("valid", "A valid command", innerYargs => innerYargs
        .command("command", "A valid command", () => {}, args => console.log("Do something else here", args.thing)),
        args => console.log("Do something here", args.thing))
    .parse();

Before version 21.1.0, that script would properly handle:

node script.js valid => Do something here undefined
node script.js valid --thing => Do something here true
node script.js valid command => Do something else here undefined
node script.js valid command --thing => Do something else here true
node script.js abc => [nothing, because the command wasn't matched]
etc

Additionally, it would place anything that doesn't obey the valid command tree in the "--" section except for defined options, which would be properly parsed into the arguments object (creating a full example for that would take a while, so I'd rather not provide it unless it's necessary).

After version 21.1.0, that script treats all of the above examples as not having any commands.

kherock commented 2 years ago

I was actually able to reproduce a simpler version of this issue on v21.0.1, just enable

{"populate--": true, "halt-at-non-option": true}

The same issue will occur where args end up in -- instead of _. The real fix should be somewhere around here:

https://github.com/yargs/yargs-parser/blob/c474bc10c3aa0ae864b95e5722730114ef15f573/lib/yargs-parser.ts#L397-L402

For your use, you can see that halt-at-non-option is (perhaps unintuitively) declaring the rest of the args as notFlags instead of pushing them as a positional. My PR made this bug less obscure because previously, halt-at-non-option had no effect in your config.

bcoe commented 2 years ago

👋 not ignoring this thread, but will need to set aside an hour to fully grok these two competing expectations for functionality.

bcoe commented 2 years ago

@Toberumono your example:

node script.js valid => Do something here undefined
node script.js valid --thing => Do something here true
node script.js valid command => Do something else here undefined
node script.js valid command --thing => Do something else here true
node script.js abc => [nothing, because the command wasn't matched]

Would work with yargs' default behaviour without setting:

    .parserConfiguration({"populate--": true, "unknown-options-as-args": true, "halt-at-non-option": true})

Could you explain to me, with some illustrative examples of inputs, the specific behaviour you're trying to achieve using these three settings in tandem.

Context: I'm on the fence about rolling back @kherock's change. I'm inclined to not call it breaking, even though it obviously broke some people in the wild. Because the behaviour is not part of the documented contract of the API. I would rather figure out if there's a way to get the behavior you're looking for without such a complex parser configuration.

Toberumono commented 2 years ago

@bcoe The examples that I provided were minimum examples for what kherock's change broke and/or revealed to be broken, not necessarily the full functionality that I want (and thought was the case, but that turned out to be partially due to luck).

Regarding rolling back kherock's change, I think that the agreement that rolling it back isn't necessary is somewhere in the spiderweb of related PRs and issues, but I don't remember where. In summary, this has morphed into a, "the functionality is generally unexpected and should probably be updated a bit".

For specific functionality, I basically want "unknown-options-as-args" and "halt-at-non-option" to work with commands subcommands. As of right now, if you combine the two ("populate--" isn't strictly part of the problem), the parser treats all commands and subcommands as invalid because yargs-parser does not receive non-flag tokens from yargs.

For my expanded example, I'll be assuming the following: Valid commands (second layer commands are subcommands, so "valid command" is valid, but "command" is not):

valid
  command
  thing
foo
  bar
bar

Valid flags:

first
second
third

The examples assume that every command points to a magic function that prints out all valid tokens in one array followed by the leftover tokens after hitting the first invalid one.

With parserConfiguration({"unknown-options-as-args": true, "halt-at-non-option": true})

Expected: node script.js valid => [valid] []
Current:  node script.js valid => [] [valid]

Expected: node script.js valid --first => [valid, --first] []
Current:  node script.js valid --first => [] [valid, --first]

Expected: node script.js --first => [--first] []
Current:  node script.js --first => [--first] []

Expected: node script.js valid --what --first => [valid] [--what, --first]
Current:  node script.js valid --what --first => [valid, --what, --first] []

Expected: node script.js valid command => [valid, command] []
Current:  node script.js valid command => [] [valid, command]

Expected: node script.js command => [] [command]
Current:  node script.js command => [] [command]
^That one works purely by accident, but it is an important aspect of the functionality - only valid commands should be accepted.

Expected: node script.js valid bar => [valid] [bar]
Current:  node script.js valid bar => [] [valid, bar]

If the examples aren't sufficient, I can try to simplify the script I'm currently working with to a usable (and sharable - it's internal code, so I can't post it publicly) version, but that would take quite a while (it's a process automation script that's pushing half a megabyte of code across 40-ish files).

bcoe commented 2 years ago

@Toberumono on my mind, is given these examples, something close to @kherock's patch that populates "_" rather than "--" for unknown options and positionals, rather than "--".

If we could extend this fix to address all your examples, would we be on the right track?

kherock commented 2 years ago

The problem is definitely linked to populate-- since yargs temporarily forces it on in order to parse commands:

https://github.com/yargs/yargs/blob/f91d9b334ad9cfce79a89c08ff210c622b7c528f/lib/yargs-factory.ts#L1970-L1973

I also can't replicate the description of the expected behavior exactly - there's no config where --first is sent to the "leftover tokens" list. This is the script I'm using to test on version 21.0.1:

import yargs from 'yargs';

const parse = (argv) => yargs(argv)
    .parserConfiguration({ 'unknown-options-as-args': true, 'halt-at-non-option': true })
    .option('first', { type: 'boolean' })
    .option('second', { type: 'boolean' })
    .option('third', { type: 'boolean' })
    .command('$0', 'no command', () => { }, argv => console.log(argv))
    .command(
        'valid',
        'A command',
        yargs => yargs
            .command('command', 'A valid command', () => { }, argv => console.log('valid command', argv))
            .command('thing', 'A valid command', () => { }, argv => console.log('valid thing', argv)),
        argv => console.log('valid', argv),
    )
    .command(
        'foo',
        'A command',
        yargs => yargs
            .command('bar', 'A foo command', {}, argv => console.log('foo bar', argv)),
        argv => console.log('foo', argv),
    )
    .command(
        'bar',
        'A command',
        () => { },
        argv => console.log('bar', argv)
    )
    .parse();

parse(['valid']);                      // valid { _: [ 'valid' ], '$0': 'script.js' }
parse(['valid', '--first']);           // valid { _: [ 'valid' ], first: true, '$0': 'script.js' }
parse(['--first']);                    // { _: [], first: true, '$0': 'script.js' }
parse(['valid', '--what', '--first']); // valid { _: [ 'valid', '--what' ], first: true, '$0': 'script.js' }
parse(['valid', 'command']);           // valid command { _: [ 'valid', 'command' ], '$0': 'script.js' }
parse(['command']);                    // { _: [ 'command' ], '$0': 'script.js' }
parse(['valid', 'bar']);               // valid { _: [ 'valid', 'bar' ], '$0': 'script.js' }

Then when I change unknown-options-as-args to false, subcommands fail to parse (and this is identical to the v21.1 output):

// parserConfiguration({ 'unknown-options-as-args': false, 'halt-at-non-option': true })
{ _: [ 'valid' ], '$0': 'script.js' }
{ _: [ 'valid', '--first' ], '$0': 'script.js' }
{ _: [], first: true, '$0': 'script.js' }
{ _: [ 'valid', '--what', '--first' ], '$0': 'script.js' }
{ _: [ 'valid', 'command' ], '$0': 'script.js' }
{ _: [ 'command' ], '$0': 'script.js' }
{ _: [ 'valid', 'bar' ], '$0': 'script.js' }

I realize that in an ideal world, the logic governing unknown-options-as-args would be able to understand when a "command" is known, but this is simply not possible with yargs-parser currently.

Lastly, additionally turning off halt-at-non-option restores the original output aside from --what being parsed.

Hopefully we can agree that the configuration being discussed here still has a bug pre-v21.1 where halt-at-non-option isn't working properly—and when it is working properly, populate-- forced by yargs interferes with its operation. There should be a solution out there that should fixes both of these issues. I see a few options

When enabled, add special handling for halt-at-non-option in yargs itself. An improvement could be to have yargs look at the contents of '--' for a matching subcommand. If a command matches, then the rest of the args are parsed again and merged into the result.
- This has a small caveat where valid -- command --first is indistinguishable from valid command --first
When halting, send args to _ regardless of the populate-- setting. Then extend the yargs config with a tree of commands that serve as exceptions to the halt-at-non-option rule.
- for example, a object might replace the boolean in configuration, though the API feels clunky
```
{
'unknown-options-as-args': true,
'halt-at-non-option': {
exclude: {
  valid: ['command', 'thing'],
  foo: ['bar'],
  bar: [],
}
}
}
```
Implement an alternative to halt-at-non-option that works better with subcommands. This option would push args to _ while some criterion is met, sending the rest to -- when populate-- is true
- For example, a rule like { 'halt-at-unknown-command': ['valid', 'foo', 'bar'] } could define the first level of commands that should be recognized as known args. Consumers (such as yargs) would then send the leftover tokens in -- to the next subcommand, which may or may not specify a new set of subcommands in the parser config. yargs could integrate with this option directly by always passing it a list of registered command handlers when 'halt-at-unknown-command' is set to true.

Toberumono commented 2 years ago

@bcoe If we could extend the fix, it would be on the right track; however, as I stated on that same PR, I'm concerned that this is just shunting the confusing behavior of these parameters from one spot to another. I'm assuming that there is a reason that yargs cannot provide yargs-parser with a tree of legal commands? In the event that this is in part an issue of time vs number of people actually helped, I'd be happy to contribute the code for this feature (honestly, I'd be happy to contribute the code regardless - I just don't want to put a bunch of work in on a feature that doesn't have a chance of being integrated).

@kherock At this point, this "issue" should be considered a feature request for the logic governing token validity (unknown-options-as-args and halt-at-non-option in the preceding examples) to understand when something is a command.

For that feature, I think that option 3 would be the best. The caveat in option 1 is something that I'm currently dealing with in my code, and isn't necessarily rocket science to account for, but would be nice to avoid. The same concerns I had with changing the behavior of populate-- still apply with option 2.

My initial idea for how to implement it was to provide a valid command tree, but given the descriptions of how yargs works under the hood, I doubt that a prefix tree starting from root would be possible.

bcoe commented 2 years ago

@Toberumono

I'm assuming that there is a reason that yargs cannot provide yargs-parser with a tree of legal commands?

Commands are lazy loaded, so for nested commands (the way yargs is designed) there's no way to know the full set prior to calling parse.

Also, since order matters command-1 command-2 is different than command-2 commnad-1, just knowing the set of commands isn't sufficient to decide whether or not arguments are valid.

This aside, I like the separation of concerns that commands live in yargs, and are a layer of logic that lives above yargs-parser -- yargs-paser knows how to tokenize an array of arguments into tokens, but doesn't know about the advanced features yargs facilitates, e.g., help output, command handling.

Next steps

If you submit a few failing tests to yargs (rather than yargs-parser), which pass prior to 21.1.0, but fail in 21.1.0, I think this would be a good starting point to figure out a fix that supports your old behavior.

If you centre these tests around some real world user behavior, e.g., the types of things people would enter into your command line app, then it will help determine if there is additional functionality we should add to yargs or yargs-parser, e.g., an allow-list of positionals.

Toberumono commented 2 years ago

@bcoe RE commands being lazy loaded: I figured that that would be the case. And yes, if the only realistic way to get stuff in is via a set rather than a tree (not TreeMap, but a directed acyclic graph), then the issue isn't solvable without removing the separation of concerns.

RE separation of functionality: I agree on that front. The main reason that I suggested providing a prefix tree of tokens was that it avoids adding the full concept of commands to yargs-parser (to be clear, I'm not recommending going that route here - given commands are lazy-loaded, pre-computing a prefix tree isn't an option).

RE the more general stuff / next steps: At this point I've handled issues that I encountered through more rigorously defining commands that are allowed to have arbitrary arguments and extracting "--" through a bit of preprocessing (I was already futzing with the arguments list to get help displaying in a manner more semantically logical to my coworkers). I highly doubt that any of this is worth making drastic changes to yargs (or any related libraries) to support, and it sounds like the changes would need to be quite drastic indeed.

Regardless, I'm going to be having to switch away to some other stuff for the time being (technically I already have). If I stumble on a more widely relevant use case, I'll create some demonstrative tests.

bcoe commented 2 years ago

@Toberumono sorry for the pain in the neck -- unfortunately config options aren't tested together rigorously, regressions like this have bit people like this a few times.

yargs / yargs-parser

Version 21.1.0 breaks "--"-related configuration #453

Next steps