nodejs / tooling

Advancing Node.js as a framework for writing great tools
169 stars 17 forks source link

argument parsing #19

Closed boneskull closed 2 years ago

boneskull commented 5 years ago

Node.js can do better than process.argv.slice(2).

I'd like to discuss what the scope of something like this should be. Just a couple notes from my head:

Prior art:

(team: feel free to add more links to examples)

guybedford commented 5 years ago

Another example - https://www.npmjs.com/package/arg

iansu commented 5 years ago

Continuing the discussion from the Tooling WG meeting, I think there are two main usecases we should try to address:

  1. Provide an API that libraries like yargs, commander, etc. could use to simplify their implementation. For example, parsing arguments into a key/value structure.
  2. Provide something that end users could use to do simple argument parsing. Again, maybe just turning args in keys and values and maybe something to generate basic help/usage output.
vweevers commented 5 years ago

Are there meeting notes available that explain the problem you're aiming to solve? Because even parsing arguments into a key/value structure is opinionated - see e.g. subarg - and although "those who want something different don't need to use it" answers that, I'm left wondering what node (core) could do better than userland.

boneskull commented 5 years ago

Yes, I stopped taking notes when I started talking about it, unfortunately.

sam-github commented 5 years ago

I'm also wondering what node core could do better than userland.

Node.js can do better than process.argv.slice(2).

There are lots of modules that do better than that already, and there has been continuous innovation/evolution in those modules over the last years. If node blessed a version, it would threaten to suck the oxygen out of the ecosystem, which looks (to me) to be meeting this need pretty well already.

I hope we don't have stdlib envy, because while "language X has an options parser in their stdlib" is true for many X, most of the options parsers in language standard libs are ossified. They are there, but wise developers use better versions, like the ones linked to in the original post under "standard art".

refack commented 5 years ago

👍 I had a similar thought while working with python on GYP.

I hope we don't have stdlib envy ... They are there, but wise developers use better versions, like the ones linked to in the original post under "standard art".

Yes, I've got stdlib envy WRT to tooling enablement. @sam-github, you raise good points (e.g. ossification, and ecosystem stifling), but in the big picture discussion I'm happy to add enthusiasm to the "pro" column.

I was thinking about creative possible solution to this ambivalence... One idea was modular stdlib, for instance have configuration for tooling, and one for web-servers. Another could be a curated meta-package, like for example Debian's build-essentials.

sam-github commented 5 years ago

I've had the opportunity of watching some of those stdlibs (specifically C, C++, perl, ruby, python, and lua, in that order) go from "new and wonderful" to "what the h**l were they thinking?", and I personally love that npm allows a micro choice of the APIs I want to use.

Node's EventEmitter & stream style APIs were recently called "the laughing stock of the internet because they don't do promises" in a conversation I saw. Fair enough, they don't do promises, but they PREDATE promises. The less you build in, the more easily you can push things aside as they become dated so new devs don't pick it by accident.

C/python/etc. getopt, I'm looking at you.

And optparse was hot for a while, but then argparse became the cool thing.

There are radically different opinions on how to do opts parsing, looks like shark-infested waters to me.

ljharb commented 5 years ago

As much as I'd like my preferred argument parsing pattern to be in node, I would be a thousand times more horrified to have one of the other argument parsing patterns in node :-/

vweevers commented 5 years ago

I use different parsers at different times, they are all great and I don't want any of them in node 😄

boneskull commented 5 years ago

I'm feeling like we're a little too worried about what people will think of the API without actually proposing anything.

I think we need to roll it back a bit...

If you're commenting on this issue, I'm going to assume:

  1. You care about enabling authors of CLI tools to have a better developer experience.
  2. Node.js has much room for improvement in terms of features and support for tooling authors.
  3. You realize that to do this, Node.js will necessarily need to add features where and when it makes sense to correct for this.
  4. Handling command-line arguments is fundamental to writing useful CLI tools.

Right? Great!

My thesis, then, is this:

The lack of an argument-parsing API in Node.js negatively impacts the developer experience. This is why:

I don't expect everybody to agree on the thesis, but it's where I'm coming from.

So, instead of a snowball fight over an API for declaring types, defaults, "options" vs. "commands", help text, etc., I'm hoping we can move the conversation in this direction: let's identify tasks common to all (mostly all?) argument-parsing implementations.

Off the top of my head:

  1. Consumers declare a set of "keys", defined using strings, which will be considered "input" to the program.
  2. The parser implementation iterates over an Array (generally process.argv.slice(2)) and recognizes its items as keys and/or corresponding values.
  3. Unknown items are considered: put into a bucket, discarded, or disallowed.
  4. The implementation outputs the key/value pairs in a more convenient/appropriate data structure, such as an Object or Map.

The above parser would be useful for simple parsing, or to build on top of. Given the reaction to this issue, I'm happy to rein in the ambition, and avoid opinionated strategies as much as possible. Even the smallest set of functionality would be better than what we currently have!

sam-github commented 5 years ago

I think some of the comments here aren't about the API per se, its about the thesis. I don't agree with it, sorry, but maybe other people do. Not sure.

Even the smallest set of functionality would be better than what we currently have!

And I absolutely don't agree with this. Node.js clearly has no options parser, so CLI authors look for them in npm, and find a number of fine ones, and some terrible ones. However, there might be vehement disagreement about which ones are fine and which ones are terrible.

A small set of options parsing functionality would be worse, because it would delay people going to npm to get a good options parser, and open up the Node.js API for criticism because it has a bad options parser (and given how opinionated options parsing is, its likely to be considered bad by a fair chunk of people).

If this thread was full of unhappy users of options parsers, bemoaning their terrible state and how it wasn't possible to build a decent CLI in node because they were all crap, I'd be right on board with doing it right. But that seems pretty far from the case.

So, I remain uncertain about the problem that is being solved.

If the thesis of the tooling WG is "it should be possible to build high quality CLIs using just the node stdlib", then I missed that. I don't agree, but I will stop commenting, because I'd know I'm working at cross purposes, and that's not helpful.

Btw, process.argv didn't inspire WTF in me, it made me think of ARGV in ruby, of sys.argv in python, in char* argv[] in C, of .... you get the idea.

mcollina commented 5 years ago

I share the same concerns of @sam-github.

At this point of Node.js history, it's probably better to have a piece of docs on the nodejs website that points to popular modules in npm.

arcanis commented 5 years ago

Some data point - I took the repository from a known open-source project, and its resolution table contains:

All those have different majors that package managers aren't allowed to optimize. It's not a huge issue in the grand scheme of things, and a Node api won't make them disappear overnight, but the lack of it clearly has an impact on our projects.

vweevers commented 5 years ago

@arcanis In any case that has to be solved in that project, by consolidating everything to use the same parser. Whether that parser is yargs, minimist or a node core API, the effort would be the same.

arcanis commented 5 years ago

@vweevers The problem isn't caused by this project in particular, but by its dependencies. My point is that babel, eslint, webpack, prettier, jest, lerna, uglify, ... all those use different CLI parsing libraries which duplicate a lot of logic for very little reason.

In this context maybe a unified Node API would increase the incentive to use a standardized logic (or maybe they would continue doing it for the sake of customizing their CLI, it's hard to tell).

sam-github commented 5 years ago

Whether https://github.com/nodejs/tooling/issues/19#issuecomment-468233383 is a problem or not is a matter of debate, but it is not specific to args parsing. I'd say your stats show how often the community arg parsers are used, and that people have chosen to use different ones, with features specific to their liking.

Also, you could do the same analysis for lodash, debug, request, or for many other sets of popularly depended upon modules. Its gotten better with npm hoisting of deps, but still happens a lot. Its also possible to get many copies of sub-deps with identical versions, depending on exact form of the dep tree.

We get "dependency hell" in some languages, where direct dependencies want conflicting versions of their sub-deps and we can't install. With node/npm, we get multiple versions of sub-deps. They both have down-sides, but I'll take npm's approach over dependency hell.

boneskull commented 5 years ago

If the thesis of the tooling WG is "it should be possible to build high quality CLIs using just the node stdlib", then I missed that. I don't agree, but I will stop commenting, because I'd know I'm working at cross purposes, and that's not helpful.

I'm going to guess "it should be possible to build high quality CLIs using just the node stdlib" means something different to you than it does to me.

"Coercing arguments into a more appropriate data structure"--which was my "barebones" suggestion--is not enough to build a "high-quality" CLI, IMO.

It does, however, offer a minimal set of functionality that many users will be able to consume directly and enables creation of more and better higher-level CLI libs than already exist in userland.

By adding the basics to core, we would make the simple case easy to implement without having to pull in userland modules (which may be further inconvenienced by Enterprise Process). And we'd enable those seeking to create their own higher-level libraries by reducing overhead, boilerplate, and lowering the bar.

At this point of Node.js history, it's probably better to have a piece of docs on the nodejs website that points to popular modules in npm.

@mcollina Can you clarify this? Unsure if you're talking about this particular issue or more generally.

boneskull commented 5 years ago

I've done some initial comparison research on the strategies used by "popular" argument parsers.

I'm not drawing any conclusions from this right now, but here it is:


Argument Parser Analysis

Modules by Popularity

Popularity is defined by npms.io. I used the search term arguments to find most of these, then tried options when I realized commander did not appear in my first search.

  1. yargs-parser
  2. commander
  3. minimist
  4. argparse
  5. coa
  6. command-line-args
  7. arg
  8. mri
  9. bossy

At some point, I decided other modules were not popular enough, and stopped looking.

Excluded:

Comparsion Matrix

All modules evaluated:

= means that the module supports the form --foo=bar.

Module Positional Commands Aliases Defaults Required =
yargs-parser 1 1 1 1 1 1
commander 1 1 1 1 1 1
minimist 1 0 1 1 0 1
argparse 1 1 1 1 1 1
coa 1 1 0 1 1
command-line-args 0 1 1 1 0
arg 1 0 1 0 1
mri 0 1 1 0 1
bossy 0 0 1 1

α: Supports one alias β: Via user-supplied handler function γ: After --, e.g., ./executable --foo bar -- something δ: Unclear

Types Comparison

All modules evaluated support these types:

Module Numeric Count Variadic
yargs-parser 1 1 1
commander
minimist 0 0 0
argparse 1 1
coa 0 0 1
command-line-args 1 1 1
arg 1 1 1
mri 0 0 1
bossy 1 0 0

α: Discrete types for "float" and "integer" β: Via user-supplied handler function

boneskull commented 5 years ago

Corrections appreciated :smile:

sam-github commented 5 years ago

https://www.npmjs.com/package/posix-getopt is my favourite. I started using it after reviewing a number of the above, and found them way too heavyweight (commander), or flat out incorrect.

Every one that doesn't accept configuration as to whether an option (long or short) takes an arg or not is incapable of parsing a command line correctly, because it can't tell if a - is the start of a new option/switch, or the argument to the last (EDIT: or whether the option takes an arg). I suggest adding this to your criteria.

minimist is one of the sinners: check out https://github.com/nodejs/branch-diff , see its docs, try its CLI

% branch-diff --simple master pr-22894
/home/sam/.nvm/versions/node/v11.12.0/lib/node_modules/branch-diff/branch-diff.js:194
      throw err
      ^

Error: Must supply two branch names to compare

WTH? Its because https://github.com/nodejs/branch-diff/blob/2d81c5a18b1e5d2a48dec85a7a739c3a14534e5b/branch-diff.js#L151 zero config means it assumes master is the argument to --simple. No zero-config cli parser can correctly parse an ARGV.

You might also add to your matrix whether the library properly allows short options to be combined: -a -b -c file same as -abc file, and whether it supports -- meaning end-of-options.

boneskull commented 5 years ago

I’m not sure I understand; minimist allows the consumer to declare whether an option should be considered a flag or it expects a value.

From what I could tell, few of them support combined short options. This suggests that most CLI authors aren’t using/expecting this functionality.

I think some context was missing here. The aim of this comparison is to discover what popular modules do in order to have a precedent for constraining the feature set.

boneskull commented 5 years ago

But I can go back and add whatever feature people think they want to see. It’s possible I missed something widely implemented!

sam-github commented 5 years ago

minimist allows the consumer to declare whether an option should be considered a flag or it expects a value.

I'm happy to be proven wrong, but I don't see such an option:

https://github.com/substack/minimist#var-argv--parseargsargs-opts

sam-github commented 5 years ago

From what I could tell, few of them support combined short options.

Maybe its irrelevant here, I don't mean to side-track this, but be aware that if they can't combine short options, they are incapable of implementing POSIX CLI standards: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

Windows conventions aren't relevant, I've never seen a node tool that supports /h for help.

boneskull commented 5 years ago

yeah... it's not too relevant to the aim of the comparison, which was to determine what these libraries typically do support.

boneskull commented 5 years ago

I'm happy to be proven wrong, but I don't see such an option:

From your link:

opts.string - a string or array of strings argument names to always treat as strings

opts.boolean - a boolean, string or array of strings to always treat as booleans. if true will treat all double hyphenated arguments without equal signs as boolean (e.g. affects --foo, not -f or --foo=bar)

But maybe we're not making the same assumptions about what "boolean" and "string" arguments are.

sam-github commented 5 years ago

I completely read that differently, as allowing configuration of --opt=[string] and --opt=[boolean], but instead it makes boolean options not options, but flags. Oops!

isaacs commented 5 years ago

I really don't want argument parsing in core. The options in userland are too varied, and none of them is the clear "winner", because the problem space is so situation-specific.

I've used dashdash, yargs, minimist, and a few others in the past. I wrote nopt (which npm uses), which is basically a whole type system and extremely loose about what it accepts. I also wrote jackspeak (which tap uses), which is a much simpler type checker and extremely strict in what it accepts.

Yargs is basically like a MVC for command-line apps, which goes much further than parsing arguments, all the way into routing to different commands based on input, nesting different argument sets within those commands, and so on.

This is not like globbing or recursive directory removal, where the vast majority of the ecosystem uses one implementation that (while it could be improved upon) can serve as a satisfactory reference implementation. If yargs goes into core, core gets bigger, npm keeps using nopt, tap keeps using jackspeak. If jackspeak goes into core, nyc keeps using yargs, npm keeps using nopt, and core gets bigger.

We should take some time to recognize the significant difference in problem shape here. Just because "argument parsing" is a common need does not mean that all "argument parsing" needs are identical. We'd have to standardize as a community on how to handle shorthands, whether to support getopt-style shorthand concatenation, whether to require a = (or require that it not be there), and so on.

There is a reason why even in the unix command line space, with getopt specified in posix, so much of the unix userland extends or reimplements getopt. Let's not repeat that mistake!

In the unlikely event that 95% of the CLI tools in node are all using identical argument parsing semantics, and arg parsers compete only on efficiency of implementation, then pull it into core where it can be optimized for a specific version of node.

Until then, let arg parsing live in userland.

isaacs commented 5 years ago

More pragmatically, here is a rough top-of-the-head list of features that any argument parser needs to have an opinion about:

There are probably a few others. The challenge here is going to be that most people feel that the answers to these questions are obvious, but they all have different answers, so it's going to be an endless bikeshed that ends up in a compromise that satisfies no one. At least with process.argv.slice(2), we satisfy no one without having to do much work ;)

What would be nice, and I think we could get very broad consensus around, would be a shorthand that means you don't have to slice. process.mainArgs or something, to match process.execArgs? That should be everything after the main filename, such that these two programs return the same result:

// cmd.js
console.log(process.mainArgs)
node cmd.js a b c
node -e 'console.log(process.mainArgs)' a b c

Right now, pulling arguments out of node -e commands always breaks my brain a little, so I have to stop and test before I can remember how to do it.

ruyadorno commented 4 years ago

@boneskull I was quickly going over the youtube record of the last meeting and I realized I haven't given you a straightforward answer to your question on whether I want to move forwards with replacing the mainArgs proposal with an actual more robust arg parser API rather than trying to build one and than the other.

To that I say yes 👍 I'm fine with dropping the mainArgs idea and have the group focus in a single proposal for a proper argument parser API 😊

dominykas commented 4 years ago

Doesn't node already have an args parser built in? It has to parse it's own options, right? Can that be exposed somehow and that's that, or is it not generic enough? Can it be made generic enough? Because having a different way to parse args than that could be pretty confusing?

sam-github commented 4 years ago

I like where you are going with that, reusing what we have, but it won't work. It's in c++, not very featureful (won't combine single char switches, etc.), and would be somewhat hard to expose. Also is a bit tuned to node's use-case, as-in, when options are added (in c++) there is a flag saying whether they are allowed by in NODE_OPTIONS.

dominykas commented 4 years ago

Maybe there's a case for making the c++ impl a bit more versatile and that would benefit node itself? Maybe single char switches are not necessary for the minimal usable version that's being discussed here ("better than process.argv.slice(2)")?

sam-github commented 4 years ago

@dominykas Perhaps you should take a look at the node options parser API, here's an example of its usage: https://github.com/nodejs/node/blob/master/src/node_options.cc#L274 If you think that an API that parses a C array of null terminated strings into c++ data types has some feature overlap with what's wanted here, you are seeing something that I'm not. But that's just me, running code is the best POC. You may want to examine @boneskull 's comparison matrix to see how node's features stack up to the others.

coreyfarrell commented 4 years ago

I'm not a fan of using C++ where it's not needed, this limits who can contribute as many people do not know C++ or are not comfortable making changes to it. Having the argument parsing in C++ also means that it cannot be copied to npm. IMO this is very important for adoption, it allows modules to use the core args parser API without dropping support for all existing versions of Node.js.

ruyadorno commented 2 years ago

This landed in https://github.com/nodejs/node/pull/42675

First released as an experimental API in Node.js 18.3.0 (Current) and on Node.js 16.17.0 (LTS).