feat: add parsed meta-data to returned properties

shadowspawn commented 2 years ago

Problem

We don't wish to offer extensive configuration knobs and dials for modifying parseArgs behaviour, but we don't currently return rich information for authors to implement features themselves.

See #84.

Solution

Return a property which contains an array of the parsed tokens detected in the args. This allows the author to inspect the meta-data to add additional checks to throw for repeated options (say), or reprocess the tokens entirely to add auto-arrays for repeated options (say).

The internal implementation uses the tokens to do the error checking and generate the values and positionals. "Eating our own dog food."

The extra details are opt-in and not returned by default.

Closes: #84

This includes some minor lint in the existing (upstream) documentation, including one which was independently fixed:

https://github.com/nodejs/node/pull/43332

Example

const { parseArgs } = require('@pkgjs/parseargs');
console.log(parseArgs(
    { tokens: true, strict: false, 
      options: { port: { short: 'p', type: 'string'} } }
));

% node index.js -p80 --foo run
{
  values: [Object: null prototype] { port: '80', foo: true },
  positionals: [ 'run' ],
  tokens: [
    {
      kind: 'option',
      name: 'port',
      rawName: '-p',
      index: 0,
      value: '80',
      inlineValue: true
    },
    {
      kind: 'option',
      name: 'foo',
      rawName: '--foo',
      index: 1,
      value: undefined,
      inlineValue: undefined
    },
    { kind: 'positional', index: 2, value: 'run' }
  ]
}

Examples Folder Experimental Only

I am not planning to commit the examples other than tokens.js in this PR. I'll submit them in separate PRs. I'll leave them in the PR until last minute as examples of using the tokens for custom processing.

bakkot commented 2 years ago

Nice! I like the approach of constructing the actual result by iterating over the elements. That actually maps to a thing a lot of parsers do, which is to separate tokenizing ("lexing") and parsing.

Only quibbles are with naming, in descending order of how strongly I feel:

I feel strongly that isShort should be named short, or some other thing that does not start with is.
I don't like parseElements because it's a misleading pun with parseArgs (where parseElements is using "parse" as a noun, but parseArgs is using it as a verb). elements doesn't have this problem. Other names are possible too, presumably.
I still think index makes more sense than argIndex, personally (I see that's what minargs went with too).

shadowspawn commented 2 years ago

Thanks. I'll revisit all three names.

shadowspawn commented 2 years ago

parseOptions alternatives: tokens, details, meta

shadowspawn commented 2 years ago

Returning tokens, with property index, and added example output to original comment.

bcoe commented 2 years ago

Only quibbles are with naming, in descending order of how strongly I feel:

I agree with @bakkot's quibbles.

Nice! I like the approach of constructing the actual result by iterating over the elements. That actually maps to a thing a lot of parsers do, which is to separate tokenizing ("lexing") and parsing.

I like this too, what I might do is break the process into two distinct methods, e.g., tokenize/parse. With tokenize having a pretty limited set of responsibility, mainly creating a list of tokens we know have meaning for our argument parsing.

shadowspawn commented 2 years ago

I like this too, what I might do is break the process into two distinct methods, e.g., tokenize/parse.

I split out the token generation which was satisfyingly a clean lift-and-shift. The parse is small enough, with called routines doing the heavy lifting, that I left it in place.

bcoe commented 2 years ago

I was wondering if we should require that a option be passed to enable parse details?

My thinking was that if a user just looks at the parse results, the information returned feels a little overwhelming, also it's an advanced feature that implementers can easily turn on.

shadowspawn commented 2 years ago

I agree. Not going to be needed or wanted by majority of users, and a lot of noise when console.log(parseArgs(...)) to see what it returns.

(It had crossed my mind, and I wish I had thought of it again before changing the unit tests last night. 😆 )

shadowspawn commented 2 years ago

I was wondering if we should require that a option be passed to enable parse details?

Added config.details. Further feedback welcome. (Are there other routines that behave like this?)

shadowspawn commented 2 years ago

I am playing with some of the challenges in #84 in examples, to exercise the feature as a library author.

NB: I am not planning to commit the examples in this PR! I'll submit them in separate PRs.

bakkot commented 2 years ago

Are there other routines that behave like this?

In Node? Not that I'm aware of offhand. But it's pretty common for parsers to have an opt-in for data like this. E.g. \@babel/parser takes a tokens option which attaches the underlying list of tokens to each parse node, acorn takes a locations option which attaches location information to each parse node, shift-parser [disclosure: I maintain this one] has a distinct parseWithLocation method which returns a locations side-table, etc.

aaronccasanova commented 2 years ago

+1 for tokens as the flag. To me, that feels very intuitive (especially after the recent rename of the return value from elements to tokens).

I know there has been a preference to avoid using the same names for inputs and outputs in the parseArgs API. So I should mention I'm also a fan of details as the flag 🙂

shadowspawn commented 2 years ago

Thanks for "tokens" examples @bakkot. I was interested to see alternative method parseWithLocation(), I had been wondering about another method. Feels heavier weight, but does have the advantage that the returned properties are stable for given method.

@aaronccasanova said:

I know there has been a preference to avoid using the same names for inputs and outputs

That is regular refrain from me! But I am leaning towards input configuration tokens over details. Edit: done.

bakkot commented 2 years ago

I had been wondering about another method. Feels heavier weight, but does have the advantage that the returned properties are stable for given method.

We mostly just did that because we didn't want the data to be attached to the nodes themselves (which isn't relevant here) but we already had parse method which returned the tree directly and so didn't have a place to return a new side-table. I think if we'd designed it from scratch we'd've just had a single method which returned either { tree } or { tree, location } depending on whether you passed location: true. I think that's what we should do here, rather than having a separate method; I'm not worried about returning extra properties based on the options.

bcoe commented 2 years ago

Are there other routines that behave like this?

The option withFileTypes comes to mind for fs. readdir:

https://nodejs.org/api/fs.html#fspromisesreaddirpath-options

Which returns a different shaped response for directory entries with additional details if requested.

shadowspawn commented 2 years ago

Edit: Default is stick with tokens as liked by @aaronccasanova , and @bakkot generally prefers shorter names.

Candidate names for the input configuration: tokens, withTokens, includeTokens?

const { values, positionals, tokens } = parseArgs({ strict:false, tokens: true });

const { values, positionals, tokens } = parseArgs({ strict:false, withTokens: true });

const { values, positionals, tokens } = parseArgs({ strict:false, includeTokens: true });

(I'm ok with any. My slight preference is includeTokens.)

(Edit: replaced returnTokens with later suggestion of includeTokens.)

See also https://github.com/pkgjs/parseargs/pull/129/files#r889698226 for later comments and prior art.

shadowspawn commented 2 years ago

Edit: default is stick with rawName.

@bakkot wrote:

I am still not in love with rawName as a name but it's the best idea I've got.

Likewise. Exploring...

-f comes from args like -f or -xfs or -fBAR --foo come from args like --foo or --foo=BAR

Alternatives to rawName: alias, used, shortOrLong, proxy, from, fromOption, source, sourceOption, id, identifier, usage, ref, reference, equivalent, dashed, dashedName

I wonder if -f | --foo is the "option" rather than the "name". So exploring that: option, optionUsed, fromOption, sourceOption, rawOption, bareOption

I think bareOption is interesting, as quite descriptive and different implication than raw.

shadowspawn commented 2 years ago

Edit: default is stick with no property for explicit short/long.

I'll also revisit a property that I removed (in part to avoid deciding name!), in case new insights.

@bakkot said:

I think there's a strong use case for knowing whether an option was short or not (so you can do custom logic for short options, and so you can give error messages which are customized based on whether the user passed an option as short or long)

With current token properties this is calculated by inspecting prefix of rawName. For example in limit-long-syntax.js:

token.rawName.startsWith('--')

Unfavoured approaches:

short (boolean): I dislike due to the type clash with short in options. (YMMV, @bakkot suggested they are different contexts.)
isShort (boolean): @bakkot rejected (as inconsistent with W3C guidelines)
short (char) or shortOption (char) only defined when used as short: making it conditional is subtle difference from short in options

New ideas of varying quality:

style: 'short' | 'long' or likewise usage: more symmetrical but less compact
shorted, shortened, shortform
prefix: '-' | '--'

shadowspawn commented 2 years ago

NB: I think shippable without changing any of the three things I just commented on, so feel free to not engage unless you loved a suggestion! I was just giving them one more explore.

token in input configuration: https://github.com/pkgjs/parseargs/pull/129#issuecomment-1146441974
rawName in token: https://github.com/pkgjs/parseargs/pull/129#issuecomment-1146551524
missing property for boolean short/long in token: https://github.com/pkgjs/parseargs/pull/129#issuecomment-1146560818

shadowspawn commented 2 years ago

First try at upstream documentation in README. A description of token properties, and an example call.

aaronccasanova commented 2 years ago

re: rawName in token: https://github.com/pkgjs/parseargs/pull/129#issuecomment-1146551524

For me, token.rawName feels concise and understandable sitting next to token.name. The one thing I think could be confusing is that an input of -abc would resolve to rawName: '-a' (for example). That being said, I have no concerns with that API decision as authors can easily access the raw arg with args[token.index] if needed.

In an effort to add some backing to the current rawName API, I referred back to Section 12.2 - Guideline 3 of the Utility Syntax Guidelines and the GNU Argument Syntax Conventions to see if long and short options had unique classifications. In both specs long and short options are referred to as option names to which rawName (resolving -f or --foo) seems well named and aligned with standards.

shadowspawn commented 2 years ago

This seems fleshed out enough to me to open an upstream PR on Node.js.

I suggest we keep this PR open and upstream the review from Node, what do you think?

A theoretical advantage in publishing here first is people trying it out. If any readers are lurking who would like @pkgjs/parseArgs published to consume and provide timely feedback, speak up now! 📣 😄

@shadowspawn are you comfortable opening the Node PR this time?

Urk. Ok.

Edit: it will be next week before I make a start on this.

bcoe commented 2 years ago

@shadowspawn may I merge this?

shadowspawn commented 2 years ago

Yes, it is up to date with upstream. (I'll get to it this weekend otherwise.)

pkgjs / parseargs