brianwilliams-candide closed 1 year ago
Sorry for responding so late. This project is not as actively maintained as others, and I am the only one working on it.
I don't know if I fully understand the issue, but for me it is about how to define the syntax. I believe you could simply write
"{" NAME ":" LABEL "}"
as
"{" NAME ":" (NAME | LABEL ) "}"
and it should just work.
Since the LABEL token is a superset of the NAME token, writing the syntax this way means you don't need to worry about whether the lexer produces the token stream "{", NAME, ":", NAME, "}" and makes the parser fail.
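To illustrate the idea, here is a minimal hand-written sketch rather than the library's own combinator API. The token kinds `LBrace`, `RBrace`, and `Colon`, the `Token` and `Field` shapes, and the function `parseField` are all hypothetical names chosen for this example. It only shows that accepting either kind in the label position makes both token streams parse.

```typescript
// Hypothetical token kinds; NAME and LABEL mirror the grammar above.
type Kind = "LBrace" | "RBrace" | "Colon" | "NAME" | "LABEL";

interface Token {
  kind: Kind;
  text: string;
}

interface Field {
  name: string;
  label: string;
}

// Parses: "{" NAME ":" (NAME | LABEL) "}"
function parseField(tokens: Token[]): Field {
  let pos = 0;
  // Consume the next token if its kind is one of `kinds`, otherwise throw.
  const expect = (...kinds: Kind[]): Token => {
    const tok = tokens[pos];
    if (tok === undefined || kinds.indexOf(tok.kind) === -1) {
      throw new Error(`expected ${kinds.join(" | ")} at token ${pos}`);
    }
    pos += 1;
    return tok;
  };
  expect("LBrace");
  const name = expect("NAME");
  expect("Colon");
  // Either kind is fine here: a lowercase label may be lexed as NAME.
  const label = expect("NAME", "LABEL");
  expect("RBrace");
  if (pos !== tokens.length) throw new Error("unexpected trailing tokens");
  return { name: name.text, label: label.text };
}

// A lowercase label that the lexer classified as NAME.
const lowercaseStream: Token[] = [
  { kind: "LBrace", text: "{" },
  { kind: "NAME", text: "name" },
  { kind: "Colon", text: ":" },
  { kind: "NAME", text: "label" },
  { kind: "RBrace", text: "}" },
];

// A label that only the LABEL rule can match.
const labelStream: Token[] = [
  { kind: "LBrace", text: "{" },
  { kind: "NAME", text: "name" },
  { kind: "Colon", text: ":" },
  { kind: "LABEL", text: "My Label" },
  { kind: "RBrace", text: "}" },
];

const fromLowercase = parseField(lowercaseStream);
const fromLabel = parseField(labelStream);
```

Both calls return a `Field`, so the parser no longer depends on which of the two overlapping rules the lexer happened to pick.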
The code below is untested, but before starting a PR I thought it best to start a discussion:
Foreword
In the code example below I have added a const to store the match of the current substring instead of just testing it for a match. This allows us to specify a capture group within the lexer's token definition.
Rationale
I am not sure if it's the responsibility of the lexer/tokeniser, but there seems to be an issue with collisions. Take...
[true, /^[a-z]+/g, TokenKind.FieldName],
[true, /^[a-zA-Z\s]+/g, TokenKind.FieldLabel],
... for the string "{name:label}".
We tried to implement a parser that had three tokens: one for the FieldName, one for the FieldLabel, and another for the colon (LabelSeparator).
Unless it's just naivety on our part, that didn't work: before it gets to the parsing stage, the lexer has already fallen over because of a regex collision. That is, unless I specifically make the label uppercase or include a space, it matches both FieldName and FieldLabel and picks the first one specified in the lexer.
That forced us to specify the FieldLabel with a colon prefix, i.e.
[true, /^:[a-zA-Z\s]+/g, TokenKind.FieldLabel],
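To make the collision and the workaround concrete, here is a small standalone check of the three regexes. The helper `matchAtStart` and the variable names are made up for this example and are not part of the lexer.

```typescript
// The two original rules, plus the workaround rule with the ":" prefix.
const fieldName = /^[a-z]+/g;
const fieldLabel = /^[a-zA-Z\s]+/g;
const prefixedLabel = /^:[a-zA-Z\s]+/g;

// Returns the text matched at the start of `input`, or null if nothing matches.
function matchAtStart(rule: RegExp, input: string): string | null {
  rule.lastIndex = 0; // a /g regex keeps state between exec calls, so reset it
  const m = rule.exec(input);
  return m === null ? null : m[0];
}

// Both original rules match the same lowercase text, hence the collision.
const nameHit = matchAtStart(fieldName, "label}"); // "label"
const labelHit = matchAtStart(fieldLabel, "label}"); // "label"

// The prefixed rule is unambiguous, but the match includes the ":".
const prefixedHit = matchAtStart(prefixedLabel, ":label}"); // ":label"
const nameOnColon = matchAtStart(fieldName, ":label}"); // null
```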
What that now means is that we have to manually strip the colon off at the parsing stage. I was wondering if adding support for capture group syntax would mitigate this problem.
[true, /^:([a-zA-Z\s]+)/g, TokenKind.FieldLabel],
Using this regex and (something similar to) the code below, the lexer would match on the whole regex but only capture the part we want (if specified).
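As a rough, self-contained illustration of that behaviour (not the library's actual lexer), the sketch below assumes a first-match-wins tokenizer. The `tokenize` function, the `Token` shape, and the `LBrace`/`RBrace` kinds are hypothetical; only the `[keep, regex, kind]` rule layout and the `FieldName`/`FieldLabel` rules come from the snippets above.

```typescript
// Stand-in for the TokenKind enum; LBrace and RBrace are assumed for the demo.
const TokenKind = {
  LBrace: "LBrace",
  RBrace: "RBrace",
  FieldName: "FieldName",
  FieldLabel: "FieldLabel",
} as const;
type TokenKind = (typeof TokenKind)[keyof typeof TokenKind];

// Same layout as the rules above: [keep, regex, kind].
type Rule = [boolean, RegExp, TokenKind];

interface Token {
  kind: TokenKind;
  text: string;
}

function tokenize(input: string, rules: Rule[]): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [keep, regex, kind] of rules) {
      regex.lastIndex = 0; // reset state left behind by the /g flag
      // Store the match instead of only testing for one.
      const m = regex.exec(rest);
      if (m === null || m[0].length === 0) continue;
      // Keep capture group 1 when the rule defines one, else the whole match.
      if (keep) tokens.push({ kind, text: m[1] ?? m[0] });
      // Always advance past the whole match, including the ":" prefix.
      rest = rest.slice(m[0].length);
      matched = true;
      break;
    }
    if (!matched) throw new Error(`no rule matches at "${rest}"`);
  }
  return tokens;
}

const rules: Rule[] = [
  [true, /^\{/g, TokenKind.LBrace],
  [true, /^\}/g, TokenKind.RBrace],
  [true, /^[a-z]+/g, TokenKind.FieldName],
  [true, /^:([a-zA-Z\s]+)/g, TokenKind.FieldLabel],
];

const tokens = tokenize("{name:label}", rules);
```

With these rules the label token's text is "label" rather than ":label", so nothing needs stripping at the parsing stage.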
If this is over-engineering of a problem that doesn't exist (which I have a suspicion it might be) by all means please let me know of the appropriate solution.
Code Example