In VOUnits syntax, should "1m" parse without error?

nxg commented 2 years ago

With the addition of the ‘dimensionless’ marker "1", the string 1m has become ambiguous, since the 1 is lexed as the dimensionless marker rather than a voufloat (which, recall, is either 0\.[0-9]+([eE][+-]?[0-9]+)? or [1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?).

Proposed change: either:

change the definition of the voufloat to be [0-9]+\.[0-9]+([eE][+-]?[0-9]+)? (ie, requiring a decimal point), or to this plus [02-9][0-9]*([eE][+-]?[0-9]+)? (ie, floats may omit a fractional part only if the integer part is not 1); or
regard this as a parsing bug and fix the lexer/grammar to handle this case, and insert rationale into the document explaining why the current pattern for voufloat is what it is.
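
For illustration, the effect of the first option can be checked with a small Python sketch. This is not taken from the VOUnits parser; the patterns are transcribed from the text above, and the names CURRENT, REQUIRE_POINT, POINT_OR_NOT_ONE and accepts are hypothetical.

```python
import re

# Current voufloat pattern, as quoted in this issue.
CURRENT = re.compile(
    r"0\.[0-9]+([eE][+-]?[0-9]+)?|[1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?")
# First variant: a decimal point is always required.
REQUIRE_POINT = re.compile(r"[0-9]+\.[0-9]+([eE][+-]?[0-9]+)?")
# Second variant: the above, plus point-free numbers not starting with 1.
POINT_OR_NOT_ONE = re.compile(
    r"[0-9]+\.[0-9]+([eE][+-]?[0-9]+)?|[02-9][0-9]*([eE][+-]?[0-9]+)?")


def accepts(pattern, text):
    """True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


# Print one row per sample: current, first variant, second variant.
for sample in ["1", "1e-10", "1.0e-10", "2", "0.5"]:
    print(sample,
          accepts(CURRENT, sample),
          accepts(REQUIRE_POINT, sample),
          accepts(POINT_OR_NOT_ONE, sample))
```

Both variants stop accepting 1e-10, which the current pattern accepts.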

msdemlei commented 2 years ago

On Sat, Dec 18, 2021 at 06:56:16AM -0800, Norman Gray wrote:

> With the addition of the ‘dimensionless’ marker "1", the string 1m has become ambiguous, since the 1 is lexed as the dimensionless marker rather than a voufloat (which, recall, is either 0\.[0-9]+([eE][+-]?[0-9]+)? or [1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?).
>
> Proposed change: either:
>
> change the definition of the voufloat to be [0-9]+\.[0-9]+([eE][+-]?[0-9]+)? (ie, requiring a decimal point), or to this plus [02-9][0-9]*([eE][+-]?[0-9]+)? (ie, floats may omit a fractional part only if the integer part is not 1); or

I don't like this. Writing 1e-10m is rather natural; I'm personally pushing out quite a few units like this already, and if we break things in a minor update (which is, of course, bending rules already), we need to have a very strong reason. Which I think we don't have here.

> regard this as a parsing bug and fix the lexer/grammar to handle this case, and insert rationale into the document explaining why the current pattern for voufloat is what it is.

I've not thought this alternative through, but my gut feeling isn't positive either.

But I'm having trouble understanding the problem in the first place. If you say:

empty_unit ::= 1

and say

input ::= empty_unit | complete_expression | scalefactor complete_expression

I'd say all is fine: 1 parses into empty_unit (and cannot be parsed in any other way), 1e-2 and its ilk won't parse at all, and 1m parses into scalefactor 1 followed by complete_expression m. What am I missing?

nxg commented 2 years ago

Re 1e-10: true. I hadn't meant to exclude that, but I think it doesn't matter, because...

Re the grammar: the problem here is/was not so much in the grammar, as in the lexer, in that 1 is lexed as the dimensionless marker rather than a float. But that can in fact be easily fixed in the same way that we handle 10**3m and 10m:

scalefactor: LIT10 power numeric_power 
        | LIT10   // ie, "10"
        | LIT1   // ie, "1" <-- this is new
        | VOUFLOAT 
        ;

I've just tried that, and it works fine, so I propose that as the fix.
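
A rough Python sketch of how the extended scalefactor rule behaves, for readers who want to experiment. It is not the actual VOUnits lexer or grammar: lex_number and parse_scalefactor are hypothetical helpers, numeric_power is simplified to an optionally signed integer, and the longest-match, earliest-rule-wins tie-break is an assumption about how the lexer resolves "1" and "10".

```python
import re

# Numeric token rules in priority order. The longest match wins; on a tie
# the earlier rule wins, so "10" lexes as LIT10 and "1" as LIT1.
NUMERIC_TOKENS = [
    ("LIT10", re.compile(r"10")),
    ("LIT1", re.compile(r"1")),
    ("VOUFLOAT", re.compile(
        r"0\.[0-9]+(?:[eE][+-]?[0-9]+)?"
        r"|[1-9][0-9]*(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")),
]


def lex_number(text):
    """Return (token_name, lexeme) for the numeric token at the start of text."""
    best = None
    for name, pattern in NUMERIC_TOKENS:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def parse_scalefactor(text):
    """Return (value, rest) if text starts with a scalefactor, else None."""
    token = lex_number(text)
    if token is None:
        return None
    name, lexeme = token
    rest = text[len(lexeme):]
    if name == "LIT10":
        # LIT10 power numeric_power (numeric_power simplified to an integer)
        m = re.match(r"\*\*([+-]?[0-9]+)", rest)
        if m:
            return 10.0 ** int(m.group(1)), rest[m.end():]
        return 10.0, rest           # LIT10
    if name == "LIT1":
        return 1.0, rest            # LIT1, the new alternative
    return float(lexeme), rest      # VOUFLOAT


print(parse_scalefactor("1m"))      # (1.0, 'm')
print(parse_scalefactor("10**3m"))  # (1000.0, 'm')
print(parse_scalefactor("1e-10m"))  # (1e-10, 'm')
```

A bare "1" with nothing after it would be handled by the empty-unit rule at a higher level, which this sketch does not model.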

Re the rationale for the VOUFLOAT regexp: looking at it, and thinking way back, it's designed to forbid the case, meaningless in context, of 0, and the nearly malformed case of 1.; so nothing profound, and a line in the text saying that would be straightforward.
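
As a quick check of that rationale, a minimal Python sketch (the names VOUFLOAT and is_voufloat are illustrative only, and the pattern is transcribed from this issue rather than from the parser source):

```python
import re

# voufloat pattern as quoted at the top of this issue.
VOUFLOAT = re.compile(
    r"0\.[0-9]+([eE][+-]?[0-9]+)?|[1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_voufloat(text):
    """True if the whole of text is a voufloat."""
    return VOUFLOAT.fullmatch(text) is not None


# A bare "0" and a trailing-dot "1." are both rejected.
for rejected in ["0", "1."]:
    print(rejected, is_voufloat(rejected))

# Ordinary forms are still accepted.
for accepted in ["1", "1.5", "0.5", "1e-10"]:
    print(accepted, is_voufloat(accepted))
```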

nxg commented 2 years ago

Closed in commit 7b967d7

ivoa-std / VOUnits

In VOUnits syntax, should "1m" parse without error? #8