yargs / yargs-parser

:muscle: the mighty option parser used by yargs
http://yargs.js.org/
ISC License

Create parser builder pipeline. #302

Open Mike111177 opened 4 years ago

Mike111177 commented 4 years ago

Currently the only way to configure yargs-parser is the following (excuse my mixed syntax):

require('yargs-parser')(args, opts) => Arguments

I propose creating a builder pipeline that allows you to preconfigure your opts before running the command, by adding an opts function to the Parser interface:

export interface Parser {
   //other stuff
   opts: (opts: Partial<Options>) => Parser;
}
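For illustration, here is a minimal sketch of what such a builder could look like in user land today. The `createParser` and `PreconfiguredParser` names are invented for this sketch and are not part of yargs-parser; the point is only that option pre-processing could happen once, at build time, rather than on every call.

```ts
// Sketch only: a user-land builder over today's yargs-parser export.
// `createParser` and `PreconfiguredParser` are invented names, not part of the library.
import parser from 'yargs-parser'

type Opts = Parameters<typeof parser>[1]
type Parsed = ReturnType<typeof parser>

interface PreconfiguredParser {
  (args: string | string[], extraOpts?: Opts): Parsed
  opts (more: Opts): PreconfiguredParser
}

function createParser (baseOpts: Opts = {}): PreconfiguredParser {
  // Any expensive option pre-processing/validation could run once, here,
  // instead of on every call to the returned function.
  const preconfigured = ((args: string | string[], extraOpts?: Opts) =>
    parser(args, { ...baseOpts, ...extraOpts })) as PreconfiguredParser
  preconfigured.opts = more => createParser({ ...baseOpts, ...more })
  return preconfigured
}

// Configure once, reuse everywhere without threading an opts variable around.
const parse = createParser().opts({ alias: { verbose: ['v'] } })
parse('--verbose') // => roughly { _: [], verbose: true, v: true }
```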

This has a bunch of advantages:

  1. Allows the programmer to avoid keeping track of an opts variable when the parser is used multiple times:

    const parse = require('yargs-parser').opts({...})
    function myfunc1(){parse("no opts variable here")}
    function myfunc2(){parse("no opts variable here")}
  2. Allows the parser to check options ahead of time. If you create your parser with opts up front, yargs-parser doesn't have to repeat any of the option setup or validation on each call, so calling myParser("mystring") doesn't redo the options pre-processing. The first place the args are even referenced in the parser function is roughly 150 lines in: https://github.com/yargs/yargs-parser/blob/5f987aae9c7e7781e54c008b8cff01b210a7a19a/lib/yargs-parser.ts#L203 Since this is not a breaking change, you could also now throw an exception on checkConfiguration() errors, since this is a new opt-in feature anyway.

  3. Really good interop with ideas from #300 if considered (a typing sketch follows the example below):

    
    //Old way
    const oldway = require('yargs-parser')
    oldway("my command", {...opts})
    //Can't throw an opts error because that would be a breaking change
    //Warning only happens when the parser is used, instead of when the options are defined

    //To make a preconfigured bash-like parser
    const parse = require('yargs-parser').opts({...}).bash
    //Returns (arg: string, opts?: Partial<Options>) => Arguments
    //Can immediately throw if your options are invalid, faster debugging

    parse("myweird option with a literal \" in it") //Is able to skip option validation since it was already done

    parse("myweird option with a literal \" in it", {additional_opts}) //Can't skip validation, but the original options would still have thrown an error if malformed

    parse(["my", "bunch", "of", "tokens"]) //Compiler error: the bash tokenizer expects a string, tokenizing tokens makes no sense



These are just a couple of thoughts. Other potential things that could come from this are making some of yargs' functionality possible in yargs-parser, and maybe, in the future, the idea of commands.
bcoe commented 4 years ago

> These are just a couple of thoughts. Other potential things that could come from this are making some of yargs' functionality possible in yargs-parser, and maybe, in the future, the idea of commands.

@Mike111177 I appreciate you taking the time to write this up 👍

The goal of the yargs-parser library is to be a (relatively) lightweight library that, given process.argv, provides a parsed version, which a higher level library like yargs or meow (both of which use the library) can build on top of.

I think if we were to extend the API surface too much like this, we start to blur the line between the goals of yargs vs. the goals of yargs-parser.

I think a good way to think of it might be the responsibilities of Node.js vs. v8.

Mike111177 commented 4 years ago

Hmm, I kinda regret not including #300 in this issue, because this is kind of a response to both. I'll be quoting stuff from there too; I don't know if it is possible to merge issues.

@bcoe I totally agree with keeping the functionality of yargs and yargs-parser separate and well defined. My concern is that the current responsibilities of yargs-parser are already pretty muddy. It currently serves two roles:

  1. Given a token array (usually passed via yargs), create an object that contains the arguments.
  2. When used as a stand-alone package, tokenize a string and then feed it into role 1.

Because most shells differ in behavior from the yargs-parser tokenizer, this can cause unexpected results for the user.

In an ideal world yargs-parser would be split into two packages, something like yargs-tokenizer and yargs-interpreter. This would mean the yargs package doesn't need to include the yargs-tokenizer package, making yargs more lightweight; it would only need to include yargs-interpreter, which would be what yargs-parser is sans tokenizer. Obviously this would break backwards compatibility, but for legacy support you could also have yargs-parser just include both the interpreter and the tokenizer. You would have a dependency tree that looks like this (including meow): [screenshot: dependency tree diagram]
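For illustration, a rough sketch of the shape that split could take, using the yargs-tokenizer / yargs-interpreter names from above. The signatures are guesses, since neither package exists.

```ts
// Sketch only: the rough shape of the proposed split; signatures are assumptions.

// yargs-tokenizer (role 2): turn a raw command string into tokens.
type Tokenize = (line: string) => string[]

// yargs-interpreter (role 1): turn tokens plus options into parsed arguments.
type Interpret = (tokens: string[], opts?: object) => { _: Array<string | number> } & Record<string, unknown>

// Today's yargs-parser could remain as a thin legacy wrapper over both,
// keeping backwards compatibility for string and token-array inputs alike.
const legacyParser = (tokenize: Tokenize, interpret: Interpret) =>
  (args: string | string[], opts?: object) =>
    interpret(Array.isArray(args) ? args : tokenize(args), opts)
```

Under this shape, yargs would depend only on the interpreter, while the legacy yargs-parser entry point keeps accepting both strings and token arrays.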

From #300:

> I'm a proponent of decomposing libraries, but have definitely gotten some flack for yargs' footprint.

Honestly, in my opinion, it requires no extra effort to install a package with dependencies, especially if you are also the maintainer of that dependency. The only real argument I can think of against it is if you are using some preemptible cloud VM setup where a small code footprint is critical for download speed during initialization. And if that is the problem, then why wouldn't you be webpacking your app anyway to minimize the size?

I think that with this method you can both reduce the logic complexity and code footprint of yargs and expand the functionality of yargs-parser, giving it stricter conformance and improving user expectations.

Ultimately, in order to maximize flexibility and user control while minimizing code footprint (and maintaining backwards compat), I think you need a 3-stage pipeline in which any component can be swapped out. [screenshot: pipeline diagram]

bcoe commented 4 years ago

@Mike111177 I appreciate these detailed posts 👏

To manage expectations: a complete overhaul of the yargs-parser library is not work that this project has the cycles to take on right now (or to help someone take on).


Where I would rather start would be to see if we could make the existing tokenizer a bit smarter, to address some of the bugs you bring up in #300.

Perhaps it will become clear that we can move some logic from the parser into the tokenizer by looking at patterns in the bugs.

I would also be interested in seeing if we could simplify the logic in yargs-parser.ts, e.g., could we handle arrays, dot notation, etc., in a more abstract way.
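For readers following along, this is roughly the kind of behavior being referred to (output shapes are approximate, not taken from a test run):

```ts
// Illustration of the array and dot-notation handling under discussion.
import parser from 'yargs-parser'

parser('--user.name alice --user.id 7')
// => roughly { _: [], user: { name: 'alice', id: 7 } }

parser('--tag a b', { array: ['tag'] })
// => roughly { _: [], tag: ['a', 'b'] }
```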

If I see a few incremental wins, that make the library more malleable, I'm certainly going to be more supportive of larger refactors in the future.

bcoe commented 4 years ago

@Mike111177 I'm not trying to discourage you, and would appreciate and support your contributions 🎉

I would just love for us to start with some initial wins, perhaps we can get a test suite in place for some other shells (using GitHub actions), this would be a great starting point.

Mike111177 commented 4 years ago

Ok, I think what I will do then is more precisely document each tokenizer (that being yargs and the various shells). This way, at a minimum, we can try to ensure the yargs tokenizer can emulate all of the common behavior of the four most common shells (bash, cmd, PowerShell, and sh). This should also let us better determine what is the responsibility of the tokenizer vs the parser. Currently I am not totally certain how to run the tests against shells in a CI-friendly manner (see #300), but with more precise documentation on the differences between existing tokenizers, we can write more precise tests.
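One possible shape for such a CI-friendly test, sketched under the assumption that bash itself can be asked to do the word splitting; this is illustrative only, not an agreed test design.

```ts
// Sketch only: compare bash's word splitting against yargs-parser's built-in
// string tokenizer. The bash invocation and the use of `_` for positionals
// are illustration choices, not part of any existing test suite.
import { execFileSync } from 'child_process'
import parser from 'yargs-parser'

// Let bash apply its own quoting rules to the line and print one resulting word per line.
// (A real test would also need to control for globbing and variable expansion.)
function bashTokens (line: string): string[] {
  const out = execFileSync('bash', ['-c', `printf '%s\\n' ${line}`], { encoding: 'utf8' })
  return out.split('\n').filter(Boolean)
}

const line = 'a "b c" d'
console.log(bashTokens(line)) // [ 'a', 'b c', 'd' ]
console.log(parser(line)._)   // whatever yargs-parser's tokenizer produces for the same line
```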

Once that is done, we can determine the token style variants and generalize the syntax for the parser.

bcoe commented 4 years ago

@Mike111177 that sounds like a great start to me.

Mike111177 commented 4 years ago

@bcoe should I copy #300 into here and close #300? These two issues are a lot more coupled than I originally realized.

bcoe commented 4 years ago

> @bcoe should I copy #300 into here and close #300? These two issues are a lot more coupled than I originally realized.

Yes, let's combine the conversations.