microsoft / powerquery-parser

A parser for the Power Query / M formula language, written in TypeScript
MIT License

[BUG] Inconsistent and maybe buggy parsing of generalized-identifier compared to Power-BI #355

Open UliPlabst opened 1 year ago

UliPlabst commented 1 year ago

Hi, a user of powerqueryformatter.com filed this issue with me a couple of days ago.
He reports that he cannot name a column of a table type with just a digit, like this:

[...] type table [1 = text, 2 = text, 3 = text, 4 = text, 5 = text] 

The parser rejects it with the error "Errors: Expected to find a identifier, but a numeric literal was found instead", but in Power BI it works. The issue is not pressing, because such names can be escaped with a quoted-identifier (for example #"1" = text), but I thought I'd let you know.
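The quoted-identifier workaround can also be applied automatically on the formatter side. Below is a minimal TypeScript sketch of that idea. It is not code from powerquery-parser or powerqueryformatter.com. The helpers renderFieldName and renderTableType are hypothetical. The rule "emit a bare name only for a plain ASCII identifier, otherwise quote it" is a deliberately conservative simplification of the real grammar. The escaping follows our understanding of M text literals: a double quote is written as "", and a literal "#(" is written as "#(#)(".

```typescript
// Hypothetical helpers (not part of any real library): render M field names,
// falling back to a quoted-identifier (#"...") when a bare name might not parse.

// Conservative: only plain ASCII identifiers are emitted unquoted.
const PLAIN_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function renderFieldName(name: string): string {
  if (PLAIN_IDENTIFIER.test(name)) {
    return name;
  }
  // Assumed M text-literal escaping: " becomes "" and "#(" becomes "#(#)(".
  const escaped = name.replace(/"/g, '""').replace(/#\(/g, "#(#)(");
  return '#"' + escaped + '"';
}

// Builds the text of a table type whose columns all share one field type.
function renderTableType(columns: string[], fieldType: string): string {
  const fields = columns.map((column) => renderFieldName(column) + " = " + fieldType);
  return "type table [" + fields.join(", ") + "]";
}
```

For example, renderTableType(["1", "2", "Name"], "text") yields type table [#"1" = text, #"2" = text, Name = text]. Quoting more often than strictly necessary is always safe, so the sketch trades a little verbosity for not having to track which digit-led names a given parser accepts.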
I dug into the language specification, and the relevant rules are:

table-type:
      "table" row-type
row-type:
      "[" field-specification-list? "]"
field-specification-list:
      field-specification
      field-specification "," field-specification-list
field-specification:
      optional? field-name field-type-specification?
field-type-specification:       //this branch is not relevant
      "=" field-type
field-name:
      generalized-identifier
      quoted-identifier
generalized-identifier:
      generalized-identifier-part
      generalized-identifier separated only by blanks (U+0020) generalized-identifier-part
generalized-identifier-part:
      generalized-identifier-segment
      decimal-digit-character generalized-identifier-segment
generalized-identifier-segment:
      keyword-or-identifier
      keyword-or-identifier dot-character keyword-or-identifier
keyword-or-identifier:
      letter-character
      underscore-character
      identifier-start-character identifier-part-characters
letter-character:
      A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
identifier-start-character:
      letter-character
      underscore-character
decimal-digit-character:                  
      A Unicode character of the class Nd

To me it seems that single-digit identifiers do not conform to the spec, so either the spec or the Power BI parser is wrong. Also, when we look at generalized-identifier-part, it seems that according to the second branch in

generalized-identifier-part:
      generalized-identifier-segment
      decimal-digit-character generalized-identifier-segment

the identifier 1a should be valid, but it does not parse. If I understand the spec correctly, this is a bug.
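The literal reading of the quoted rules is easier to test when they are compiled into a single Unicode-aware regular expression. The following is a minimal TypeScript sketch of that approach. isGeneralizedIdentifierPerSpec is a hypothetical name, not a powerquery-parser API. The sketch treats "separated only by blanks" as one or more U+0020 spaces. The excerpt above does not define identifier-part-characters, so the sketch assumes they cover letters, Nd digits, underscore, and the Pc, Mn, Mc, and Cf classes, as in the full spec.

```typescript
// Hypothetical checker: a literal reading of the generalized-identifier rules
// quoted above, compiled into one anchored regex with the "u" (Unicode) flag.

// letter-character: classes Lu, Ll, Lt, Lm, Lo (together \p{L}) or Nl.
const LETTER = "\\p{L}\\p{Nl}";
// identifier-start-character: letter-character or underscore-character.
const START_CHAR = "[" + LETTER + "_]";
// ASSUMPTION: identifier-part-characters as in the full spec (not quoted above).
const PART_CHAR = "[" + LETTER + "\\p{Nd}_\\p{Pc}\\p{Mn}\\p{Mc}\\p{Cf}]";
// keyword-or-identifier: a start character followed by zero or more part characters.
const KEYWORD_OR_IDENTIFIER = START_CHAR + PART_CHAR + "*";
// generalized-identifier-segment: keyword-or-identifier, optionally "." keyword-or-identifier.
const SEGMENT = KEYWORD_OR_IDENTIFIER + "(?:\\." + KEYWORD_OR_IDENTIFIER + ")?";
// generalized-identifier-part: an optional single Nd digit, then a segment.
const GI_PART = "\\p{Nd}?(?:" + SEGMENT + ")";
// generalized-identifier: parts separated only by U+0020 blanks (read as one or more).
const GENERALIZED_IDENTIFIER = new RegExp("^" + GI_PART + "(?: +" + GI_PART + ")*$", "u");

function isGeneralizedIdentifierPerSpec(text: string): boolean {
  return GENERALIZED_IDENTIFIER.test(text);
}
```

Under this reading, isGeneralizedIdentifierPerSpec("1") is false, while "1a" and "1b" are true. That is the mismatch described above: Power BI accepts the first, and the parser rejects the others. "12a" is also rejected, because the grammar allows only a single leading decimal-digit-character before a segment, and a segment must start with a letter or underscore.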

Expected behavior: consistency between the language specification, microsoft/powerquery-parser, and Power BI, with generalized-identifier parsed according to the spec.

Actual behavior: the parser, Power BI, and the language specification are inconsistent. 1b does not parse as a generalized-identifier in a table type.


bgribaudo commented 1 year ago

Unfortunately, I believe the grammar is incorrect in its definition of generalized identifiers (as in, it does not align with how the parser used by Power BI/Excel works). See https://github.com/MicrosoftDocs/powerquery-docs/issues/30.

This project uses modified rules for generalized identifiers, but, apparently, they don't match exactly with how Power BI/Excel work either. :-(

It would be really neat if the official grammar's definition could be corrected. :-)