Quotes not consumed - Githubissues

cdhowie commented 5 years ago

Version: 13.1.0

See also #93 and #138 which both apparently conflict in their expectations of what should happen to quotes.

When I run parse('a "b c"') I expect the quotes to be consumed, as happens with most proper shells, but this is not what happens:

> parse('a "b c"')
{ _: [ 'a', '"b c"' ] }

This expectation appears to be different for different people and I'm not sure if this actually is a bug, but it's not at all what I expected it happen. Is it possible that we could get an option for how quotes are to be treated?

Here's what I expect to happen based on the behavior of sane shells:

a "b c" becomes ['a', 'b c']
a b\ c becomes ['a', 'b c']
a '"b c"' becomes ['a', '"b c"']
a \"b c\" becomes ['a', '"b', 'c"']

yargs-parser does something different in all cases. The last one in particular is most surprising.

> parse('a "b c"')
{ _: [ 'a', '"b c"' ] }
> parse('a b\\ c')
{ _: [ 'a', 'b\\', 'c' ] }
> parse('a \'"b c"\'')
{ _: [ 'a', '\'"b c"\'' ] }
> parse('a \\"b c\\"')
{ _: [ 'a', '\\"b c\\"' ] }

There doesn't seem to be any way to get a literal space into an argument without quotes. If yargs leaves the quotes in, then suddenly it's the responsibility of my application to understand that quotes might be present and strip them. This means my application is doing the work of the parser, and that feels very, very wrong -- otherwise, why am I using yargs?

bcoe commented 5 years ago

@cdhowie your reasoning seems correct regarding some of these edge-cases.

But I think some of these issues would exist in yargs-parser but not yargs. yargs is passed arguments after they've already been processed by the operating system's shell.

As an example:

yargs-app -c "foo and bar"

☝️ by the time yargs gets process.argv for processing, a shell like bash will have stripped the quotes. The same goes for if we were to try to process \.

I wouldn't be at all shocked if there are still some edge-cases around quotes, but I'd test the behavior with an integration test through bash, not with strings passed into yargs-parser (since some of these edge-cases would be in practice impossible with bash).

If we do find some edge-cases around quote handling still, would very much appreciate help addressing them.

cdhowie commented 5 years ago

What is the recommended way to deal with non-shell use-cases then? For example, I am using this as the basis for a chat bot with multiple commands, sub-commands, etc. Yargs is a great tool for this use case.

However, I can't just toss a string at yargs and have it do the right thing because it doesn't strip quotes. I had to write my own JavaScript library that applies bash-style quoting rules to the string and returns an array. Yargs can correctly process the array, because I've already taken care of consuming quotes.

This feels like something that yargs-parser should be able to do when given a string (but not an array, of course).

FWIW, here are my library's test cases. I would expect yargs-parser to apply the same rules to a single string argument.

    function test(line, args) {
        it(`parses: [${line}]`, function () {
            expect(parse(line)).toEqual(args);
        });
    }

    function testFail(line) {
        it(`throws while parsing: ${line}`, function () {
            expect(() => parse(line)).toThrow();
        });
    }

    test('ab cd ef', ['ab', 'cd', 'ef']);
    test('ab cd ef ', ['ab', 'cd', 'ef']);
    test('  ab  cd  ef  ', ['ab', 'cd', 'ef']);
    test('ab "cd ef"', ['ab', 'cd ef']);
    test('ab cd\\ ef', ['ab', 'cd ef']);
    test('ab\\ncd', ['abncd']);
    test('a "doin\' a test" b', ['a', "doin' a test", 'b']);
    test('a "" b ""', ['a', '', 'b', '']);
    test("'a'\\''b'", ["a'b"]);
    test("'foo '\" bar\"", ['foo  bar']);
    test('', []);

    testFail('ab\\');
    testFail('a "bc');
    testFail("a 'bc");

bcoe commented 5 years ago

@cdhowie I wonder if it would be worth adding an additional function to yargs-parser, that performs the same pre-processing that bash would apply, then folks using it for a chat bot could just pass the string through this first, but we wouldn't have a push and pull between yargs features and yargs-parser features.

cdhowie commented 5 years ago

That seems a reasonable compromise. If you would like, I can supply the code that I've written to accomplish this.

bcoe commented 5 years ago

@cdhowie this sounds reasonable to me, perhaps we could call the method something like, toArgvFormat, which would take a string --foo "bar hello" "testing \" this", and return an array of the format:

["--foo", "bar hello", "testing \" this"]

👆 this could then be passed to yargs-parser.

cdhowie commented 5 years ago

I have published the module that I wrote for this kind of parsing: https://www.npmjs.com/package/shell-parser

Feel free to pull this as a dependency or just take the code from index.js if you don't want another dep.

Apollon77 commented 5 years ago

I have the case with a command line like

./iobroker state set javascript.0.testString "Hello World"

Where the "hello world" is cutted into two params with 14.0.0 ... Why you not incorporate that shellparser-approach?

Apollon77 commented 4 years ago

DO we have any chance to get that fixed because it is really anoying :-(

bcoe commented 4 years ago

@Apollon77 could you provide a reproduction in yargs? most shells strip quotes before populating them in process.argv, see:

example.js

console.info(process.argv);

Will output:

bencoe$ node example.js "hello world"
[
  '/Users/bencoe/.nvm/versions/node/v12.15.0/bin/node',
  '/Users/bencoe/oss/test-release-please-standalone/example.js',
  'hello world'
]

Yargs has the same behavior:

example2.js:

const yargs = require("yargs")
console.info(yargs.parse());

Will output the following in bash:

 bencoe$ node example.js "hello world"
{ _: [ 'hello world' ], '$0': 'example.js' }

cdhowie commented 4 years ago

Edit: I missed the context of who you were replying to. Nonetheless, I feel like this is a better summary of the issue than what I originally wrote so I will leave it here as better documentation of my rationale for requesting this change.

@bcoe Indeed, I would not expect yargs to strip anything out when processing an array. However, this issue is about processing a string, not an array:

> const yargs = require('yargs');
undefined
> yargs.parse('a "b c"');
{ _: [ 'a', '"b c"' ], '$0': '' }

Here, I would expect the return value to be { _: [ 'a', 'b c' ], '$0': '' }.

The parser grouped b c into a single argument, but did not strip the quotes. This is very unusual, at least to me -- clearly it understood that they were quoted and therefore meant to represent a single argument, but the quotes were left there.

When invoking this API, I expected one of two results:

['a', 'b c']: The parser saw the quotes, understood that the intended meaning was to provide a second argument with a space in it, and consumed the quotes during parsing.
['a', '"b', 'c"']: The parser does not understand quotes and just split on whitespace.

Either of these approaches are fine, IMO...

But the actual result ['a', '"b c"'] makes absolutely no sense to me and I can't imagine a scenario where it would be expected. It seems to me the quotes should be understood and removed appropriately, or they shouldn't even be recognized.

Instead, we have this case where they are kind of half-processed. When I saw this result, I had to double-check that the input I was feeding yargs was in fact a "b c" and not something like a '"b c"'. My POLA-violation detector was alarming loudly.

Now my code that actually uses the arguments has to account for multiple scenarios, and I can't be sure which actually happened:

Was the input parsed from an array? Then the values are after shell processing and so the quotes are meaningful, and escaped appropriately, and I should not remove them.
Was the input parsed from a string? Then quotes might need to be removed as they were put there to quote a series of characters as a single argument.

And this all assumes that I tell downstream code where I got the arguments from, which shouldn't matter in the first place.

My workaround is to never give yargs a string, because I can't trust that the output makes sense, and it severely complicates the rest of my code. Instead, I wrote my own routine to do POSIX-shell-like parsing of a string into an array, which is then safe to give to yargs.

Mandalorian007 commented 4 years ago

@cdhowie I just hit this issue using yargs for a discord bot actually. I would love to get baseline support for this into yargs. That being said I want to leverage your library to get things working now.

After I execute a shellParser(myArgsParam) I get an array. The yargs.parse method wants a string. Is there something I am misunderstanding on how to use your library with Yargs?

const yargs = require('yargs');
const shellParser = require('shell-parser');

const parser = yargs
  .scriptName(config.prefix)
  .usage('$0 [command] [options]')
  .commandDir('commands')
  .demandCommand(1)
  .help(false)
  .showHelpOnFail(false)
  .showHelpOnFail(true)
  .version(false);

parser.parse(shellParser(originalArgs), async (err, argv, output) => {
      if (argv.handled) {
        const result = await argv.handled;
        await message.channel.send(result).catch(console.error);
      } else {
        await message.channel.send(output);
      }
    });

I apologize if I made any mistakes in terminology Javascript is not my primary language.

Mandalorian007 commented 4 years ago

As a follow up I was able to get the desired behavior with the "greedy-array" configuration set and changing my variable to an array type and then joining by space.

.option('g', {
      alias: 'guild',
      demandOption: true,
      describe: 'Specify the guild your character is currently in.',
      type: 'array'
    })

cdhowie commented 4 years ago

@Mandalorian007 My use case is also a Discord bot. :)

After I execute a shellParser(myArgsParam) I get an array. The yargs.parse method wants a string. Is there something I am misunderstanding on how to use your library with Yargs?

yargs.parse() wants either a string or an array, so the code in your first question should work just fine -- if it doesn't, what problem are you having?

See the docs (emphasis mine):

Parse args instead of process.argv. Returns the argv object. args may either be a pre-processed argv array, or a raw argument string.

yargs / yargs-parser

Quotes not consumed #180