Closed Toberumono closed 2 years ago
@Toberumono this issue is most likely caused by:
https://github.com/yargs/yargs-parser/pull/438
Can you coordinate a fix with @kherock?
@bcoe From what I can tell, yargs-parser doesn't have any way of knowing what a command is (valid or otherwise) - it just handles "commands" as positional arguments. Therefore, we have two options:
I would recommend option 2 because it is new functionality and option 1 would require some significant changes in yargs-parser and yargs because the tree of commands would have to be computed before parsing takes place. While option 1's requirements are certainly not insurmountable, they is more complicated than necessary.
As for flag names, I'm thinking, "treat-all-non-flags-as-invalid" or something similar.
@Toberumono I think that the changes fixing the bug in my PR is valid, I don't want to make the old behavior the default since it definitely wasn't intuitive. The crux of that issue was that unknown-options-as-args: true
causes halt-at-non-option
to have zero effect.
Just so I understand the issue correctly, the issue is that instead of { "_": ["node", "foo.js", "validCommand"] }
in the parse output, you're seeing { "--": ["node", "foo.js", "validCommand"] }
?
@kherock The reason why I was suggesting a flag is that your change breaks scripts that use yargs to process the command to execute and the command's arguments. What you set the flags to stops mattering after applying your change because everything without a '-' at the start is treated as permanently invalid.
For a more complete example, assume the following script:
const yargs = require('yargs');
yargs(process.argv.slice(2)) // Node's process.argv array includes both itself and the executed filename
.parserConfiguration({"populate--": true, "unknown-options-as-args": true, "halt-at-non-option": true})
.option("thing", {
type: "boolean",
"etc": "etc"
})
.command("valid", "A valid command", innerYargs => innerYargs
.command("command", "A valid command", () => {}, args => console.log("Do something else here", args.thing)),
args => console.log("Do something here", args.thing))
.parse();
Before version 21.1.0, that script would properly handle:
node script.js valid
=> Do something here undefined
node script.js valid --thing
=> Do something here true
node script.js valid command
=> Do something else here undefined
node script.js valid command --thing
=> Do something else here true
node script.js abc
=> [nothing, because the command wasn't matched]Additionally, it would place anything that doesn't obey the valid command tree in the "--" section except for defined options, which would be properly parsed into the arguments object (creating a full example for that would take a while, so I'd rather not provide it unless it's necessary).
After version 21.1.0, that script treats all of the above examples as not having any commands.
I was actually able to reproduce a simpler version of this issue on v21.0.1, just enable
{"populate--": true, "halt-at-non-option": true}
The same issue will occur where args end up in --
instead of _
. The real fix should be somewhere around here:
For your use, you can see that halt-at-non-option
is (perhaps unintuitively) declaring the rest of the args as notFlags
instead of pushing them as a positional. My PR made this bug less obscure because previously, halt-at-non-option
had no effect in your config.
👋 not ignoring this thread, but will need to set aside an hour to fully grok these two competing expectations for functionality.
@Toberumono your example:
node script.js valid => Do something here undefined
node script.js valid --thing => Do something here true
node script.js valid command => Do something else here undefined
node script.js valid command --thing => Do something else here true
node script.js abc => [nothing, because the command wasn't matched]
Would work with yargs' default behaviour without setting:
.parserConfiguration({"populate--": true, "unknown-options-as-args": true, "halt-at-non-option": true})
Could you explain to me, with some illustrative examples of inputs, the specific behaviour you're trying to achieve using these three settings in tandem.
Context: I'm on the fence about rolling back @kherock's change. I'm inclined to not call it breaking, even though it obviously broke some people in the wild. Because the behaviour is not part of the documented contract of the API. I would rather figure out if there's a way to get the behavior you're looking for without such a complex parser configuration.
@bcoe The examples that I provided were minimum examples for what kherock's change broke and/or revealed to be broken, not necessarily the full functionality that I want (and thought was the case, but that turned out to be partially due to luck).
Regarding rolling back kherock's change, I think that the agreement that rolling it back isn't necessary is somewhere in the spiderweb of related PRs and issues, but I don't remember where. In summary, this has morphed into a, "the functionality is generally unexpected and should probably be updated a bit".
For specific functionality, I basically want "unknown-options-as-args" and "halt-at-non-option" to work with commands subcommands. As of right now, if you combine the two ("populate--" isn't strictly part of the problem), the parser treats all commands and subcommands as invalid because yargs-parser does not receive non-flag tokens from yargs.
For my expanded example, I'll be assuming the following: Valid commands (second layer commands are subcommands, so "valid command" is valid, but "command" is not):
valid
command
thing
foo
bar
bar
Valid flags:
first
second
third
The examples assume that every command points to a magic function that prints out all valid tokens in one array followed by the leftover tokens after hitting the first invalid one.
With parserConfiguration({"unknown-options-as-args": true, "halt-at-non-option": true})
Expected: node script.js valid => [valid] []
Current: node script.js valid => [] [valid]
Expected: node script.js valid --first => [valid, --first] []
Current: node script.js valid --first => [] [valid, --first]
Expected: node script.js --first => [--first] []
Current: node script.js --first => [--first] []
Expected: node script.js valid --what --first => [valid] [--what, --first]
Current: node script.js valid --what --first => [valid, --what, --first] []
Expected: node script.js valid command => [valid, command] []
Current: node script.js valid command => [] [valid, command]
Expected: node script.js command => [] [command]
Current: node script.js command => [] [command]
^That one works purely by accident, but it is an important aspect of the functionality - only valid commands should be accepted.
Expected: node script.js valid bar => [valid] [bar]
Current: node script.js valid bar => [] [valid, bar]
If the examples aren't sufficient, I can try to simplify the script I'm currently working with to a usable (and sharable - it's internal code, so I can't post it publicly) version, but that would take quite a while (it's a process automation script that's pushing half a megabyte of code across 40-ish files).
@Toberumono on my mind, is given these examples, something close to @kherock's patch that populates "_" rather than "--" for unknown options and positionals, rather than "--".
If we could extend this fix to address all your examples, would we be on the right track?
The problem is definitely linked to populate--
since yargs temporarily forces it on in order to parse commands:
I also can't replicate the description of the expected behavior exactly - there's no config where --first
is sent to the "leftover tokens" list. This is the script I'm using to test on version 21.0.1:
import yargs from 'yargs';
const parse = (argv) => yargs(argv)
.parserConfiguration({ 'unknown-options-as-args': true, 'halt-at-non-option': true })
.option('first', { type: 'boolean' })
.option('second', { type: 'boolean' })
.option('third', { type: 'boolean' })
.command('$0', 'no command', () => { }, argv => console.log(argv))
.command(
'valid',
'A command',
yargs => yargs
.command('command', 'A valid command', () => { }, argv => console.log('valid command', argv))
.command('thing', 'A valid command', () => { }, argv => console.log('valid thing', argv)),
argv => console.log('valid', argv),
)
.command(
'foo',
'A command',
yargs => yargs
.command('bar', 'A foo command', {}, argv => console.log('foo bar', argv)),
argv => console.log('foo', argv),
)
.command(
'bar',
'A command',
() => { },
argv => console.log('bar', argv)
)
.parse();
parse(['valid']); // valid { _: [ 'valid' ], '$0': 'script.js' }
parse(['valid', '--first']); // valid { _: [ 'valid' ], first: true, '$0': 'script.js' }
parse(['--first']); // { _: [], first: true, '$0': 'script.js' }
parse(['valid', '--what', '--first']); // valid { _: [ 'valid', '--what' ], first: true, '$0': 'script.js' }
parse(['valid', 'command']); // valid command { _: [ 'valid', 'command' ], '$0': 'script.js' }
parse(['command']); // { _: [ 'command' ], '$0': 'script.js' }
parse(['valid', 'bar']); // valid { _: [ 'valid', 'bar' ], '$0': 'script.js' }
Then when I change unknown-options-as-args
to false, subcommands fail to parse (and this is identical to the v21.1 output):
// parserConfiguration({ 'unknown-options-as-args': false, 'halt-at-non-option': true })
{ _: [ 'valid' ], '$0': 'script.js' }
{ _: [ 'valid', '--first' ], '$0': 'script.js' }
{ _: [], first: true, '$0': 'script.js' }
{ _: [ 'valid', '--what', '--first' ], '$0': 'script.js' }
{ _: [ 'valid', 'command' ], '$0': 'script.js' }
{ _: [ 'command' ], '$0': 'script.js' }
{ _: [ 'valid', 'bar' ], '$0': 'script.js' }
I realize that in an ideal world, the logic governing unknown-options-as-args
would be able to understand when a "command" is known, but this is simply not possible with yargs-parser currently.
Lastly, additionally turning off halt-at-non-option
restores the original output aside from --what
being parsed.
Hopefully we can agree that the configuration being discussed here still has a bug pre-v21.1 where halt-at-non-option
isn't working properly—and when it is working properly, populate--
forced by yargs interferes with its operation. There should be a solution out there that should fixes both of these issues. I see a few options
halt-at-non-option
in yargs itself. An improvement could be to have yargs look at the contents of '--'
for a matching subcommand. If a command matches, then the rest of the args are parsed again and merged into the result.
valid -- command --first
is indistinguishable from valid command --first
_
regardless of the populate--
setting. Then extend the yargs config with a tree of commands that serve as exceptions to the halt-at-non-option
rule.
{
'unknown-options-as-args': true,
'halt-at-non-option': {
exclude: {
valid: ['command', 'thing'],
foo: ['bar'],
bar: [],
}
}
}
halt-at-non-option
that works better with subcommands. This option would push args to _
while some criterion is met, sending the rest to --
when populate--
is true
{ 'halt-at-unknown-command': ['valid', 'foo', 'bar'] }
could define the first level of commands that should be recognized as known args. Consumers (such as yargs) would then send the leftover tokens in --
to the next subcommand, which may or may not specify a new set of subcommands in the parser config. yargs could integrate with this option directly by always passing it a list of registered command handlers when 'halt-at-unknown-command' is set to true.@bcoe If we could extend the fix, it would be on the right track; however, as I stated on that same PR, I'm concerned that this is just shunting the confusing behavior of these parameters from one spot to another. I'm assuming that there is a reason that yargs cannot provide yargs-parser with a tree of legal commands? In the event that this is in part an issue of time vs number of people actually helped, I'd be happy to contribute the code for this feature (honestly, I'd be happy to contribute the code regardless - I just don't want to put a bunch of work in on a feature that doesn't have a chance of being integrated).
@kherock
At this point, this "issue" should be considered a feature request for the logic governing token validity (unknown-options-as-args
and halt-at-non-option
in the preceding examples) to understand when something is a command.
For that feature, I think that option 3 would be the best. The caveat in option 1 is something that I'm currently dealing with in my code, and isn't necessarily rocket science to account for, but would be nice to avoid. The same concerns I had with changing the behavior of populate--
still apply with option 2.
My initial idea for how to implement it was to provide a valid command tree, but given the descriptions of how yargs works under the hood, I doubt that a prefix tree starting from root would be possible.
@Toberumono
I'm assuming that there is a reason that yargs cannot provide yargs-parser with a tree of legal commands?
Commands are lazy loaded, so for nested commands (the way yargs is designed) there's no way to know the full set prior to calling parse.
Also, since order matters command-1 command-2
is different than command-2 commnad-1
, just knowing the set of commands isn't sufficient to decide whether or not arguments are valid.
This aside, I like the separation of concerns that commands live in yargs
, and are a layer of logic that lives above yargs-parser
-- yargs-paser
knows how to tokenize an array of arguments into tokens, but doesn't know about the advanced features yargs
facilitates, e.g., help output, command handling.
If you submit a few failing tests to yargs
(rather than yargs-parser
), which pass prior to 21.1.0
, but fail in 21.1.0
, I think this would be a good starting point to figure out a fix that supports your old behavior.
If you centre these tests around some real world user behavior, e.g., the types of things people would enter into your command line app, then it will help determine if there is additional functionality we should add to yargs
or yargs-parser
, e.g., an allow-list of positionals.
@bcoe RE commands being lazy loaded: I figured that that would be the case. And yes, if the only realistic way to get stuff in is via a set rather than a tree (not TreeMap, but a directed acyclic graph), then the issue isn't solvable without removing the separation of concerns.
RE separation of functionality: I agree on that front. The main reason that I suggested providing a prefix tree of tokens was that it avoids adding the full concept of commands to yargs-parser (to be clear, I'm not recommending going that route here - given commands are lazy-loaded, pre-computing a prefix tree isn't an option).
RE the more general stuff / next steps: At this point I've handled issues that I encountered through more rigorously defining commands that are allowed to have arbitrary arguments and extracting "--" through a bit of preprocessing (I was already futzing with the arguments list to get help displaying in a manner more semantically logical to my coworkers). I highly doubt that any of this is worth making drastic changes to yargs (or any related libraries) to support, and it sounds like the changes would need to be quite drastic indeed.
Regardless, I'm going to be having to switch away to some other stuff for the time being (technically I already have). If I stumble on a more widely relevant use case, I'll create some demonstrative tests.
@Toberumono sorry for the pain in the neck -- unfortunately config options aren't tested together rigorously, regressions like this have bit people like this a few times.
As of version 21.1.0, configuring yargs-parser with
{"populate--": true, "unknown-options-as-args": true, "halt-at-non-option": true}
results in every non-flag being put in "--" regardless of whether it is part of a valid command. Prior to version 21.1.0, non-flags were excluded from "--" provided that they were part of a valid command. Example Command: node foo.js validCommand Expected behavior: executes the command "validCommand" Actual: places "validCommand" in the "--" array.Effectively, it is treating every command as invalid.
Unfortunately, I cannot build directly because version 21.1.0's type bindings are broken, and I'm using TypeScript, so I can't dig into it properly at this time.