npm / node-semver

The semver parser for node (the one npm uses)
ISC License
5.1k stars 492 forks source link

[BUG] range.bnf incomplete regarding whitespace and "spermies" #392

Open gnattishness opened 3 years ago

gnattishness commented 3 years ago

What / Why

There are some discrepancies between what ranges are valid according to the BNF definition and the implementation. As far as I'm aware, this implementation (or the BNF) is the "canonical" definition for NPM version ranges.

Every range abiding by the BNF is valid according to the implementation, but the implementation also accepts ranges as valid that are not in the BNF.

Where

How

Current Behavior

Implementation allows for whitespace within a "simple" range expression at the start of a "partial", but this does not abide by the BNF grammar.

Steps to Reproduce

Using latest release:

const semver = require('semver')

console.log(semver.validRange('>= 1.2.3 < 1.6.0', {loose: false} )) // >=1.2.3 <1.6.0
console.log(semver.validRange('^ 1.2.3 || ^ 2.4.6', {loose: false})) // >=1.2.3 <1.6.0

And tests involving https://github.com/npm/node-semver/blob/e79ac3a450e8bb504e78b8159e3efc70895699b8/test/fixtures/range-parse.js#L28-L34 and https://github.com/npm/node-semver/blob/e79ac3a450e8bb504e78b8159e3efc70895699b8/test/fixtures/range-parse.js#L57-L60 pass, but are not valid according to the BNF.

In particular, there is no whitespace defined between comparison operators and the number: https://github.com/npm/node-semver/blob/e79ac3a450e8bb504e78b8159e3efc70895699b8/range.bnf#L6-L9 and the tilde definition does not allow for >: https://github.com/npm/node-semver/blob/e79ac3a450e8bb504e78b8159e3efc70895699b8/range.bnf#L10

Expected Behavior

Either:

  1. The BNF definition is expanded to allow whitespace between
  2. The implementation is restricted to match the BNF (e.g. only accept the ranges as valid when "loose"?)

    (this obviously has backwards compatibility issues)

  3. Document clearly that the implementation can parse and accept ranges outside of those defined in the BNF. (Though, perhaps will only ever return canonical ranges that comply with the BNF?)

If updating the grammar, the following change to be sufficient:

partial    ::= ( ' ' )* xr ( '.' xr ( '.' xr qualifier ? )? )?
...
tilde      ::= '~' ('>')? partial

Though the range definition could probably also be updated to reflect how multiple spaces are allowed:

range      ::= hyphen | simple ( ( ' ' )+ simple ) * | ''
gnattishness commented 3 years ago

How would you like to proceed? I'm happy to submit a PR to change the BNF if that's the way to go. I'm not particularly familiar with semver ranges, and am basing the issue on this repo alone. (I'm not aware of any other spec floating around that is the "true" definition for semver ranges)

gnattishness commented 2 years ago

Just bumping here.

This is relevant to other projects that make use of NPM-style version ranges or must parse package.json files. Currently, a parser implemented according to the range.bnf would fail to parse ranges that are accepted by node-semver (so it would error reading supposedly "valid" package.json files)

Shouldn't we want a parser that implements the range.bnf to be compatible with node-semver?

If one is to interpret the range.bnf as the specification for what node-semver sees as "valid" ranges, the specification is clearly incomplete/inaccurate. As it is, the bnf is only useful for those wanting to encode ranges such that node-semver can read them, not the other way around.

pombredanne commented 2 years ago

@gnattishness I guess one way to interpret this is that the grammar is not a spec but only a nice but incomplete documentation artifact and rather that only the actual implemented JS code here is the "spec"?

pombredanne commented 2 years ago

@isaacs you wrote this grammar, so what's your take on this?

gnattishness commented 2 years ago

@pombredanne Agreed, the code appears more a "reference implementation" that the BNF was made to reflect.

Though it could also be that (as in 3) the BNF defines a "canonical representation" that node-semver will output, rather than all the possibilities it would accept. In that case, I'd prefer that expected behavior be well defined and documented - perhaps with separate BNF grammars describing the values treated as valid, and their respective "canonical" forms.