rhaiscript / lsp

Language server for Rhai.
Apache License 2.0
43 stars 4 forks source link

Invalid parsing of range operator if it starts with a number #62

Closed schungx closed 2 years ago

schungx commented 2 years ago

Variables used in the range operators .. and ..= do not yield definition information, nor do they jump to definition when pressed F12.

For example, the following:

for n in 0..MAX { ... }

Hovering on MAX or pressing F12 does nothing.

I remember it used to work before...

schungx commented 2 years ago

Ah, the Rhai panel gives the answer: 0..MAX is parsed as 0. (float) . (period) MAX (identifier) instead of a range.

Maybe that is the reason why we're not getting definition.

Changing it to x..MAX, then both x and MAX have definitions.

schungx commented 2 years ago

A period . following a number is only parsed as a digit if it is not immediately followed by another period. Therefore, .. is always parsed as a range operator instead of a decimal point, with ..= again trumping that.

schungx commented 2 years ago

I believe the LIT_FLOAT regex needs to prevent taking a decimal point when it is ..

schungx commented 2 years ago

Just discovered one more issue with the float regex.

Based on user request, I allow a decimal without a following 0 for floating-point numbers. Meaning that 3. parses to 3.0.

However, a decimal point immediately followed by e is not allowed:

3.e2    // this parses as (3).e2, accessing the e2 property of integer 3
3.0e2    // this parses as 3.0 x 10^2

Right now, the regex still parses 3.e2 as 3.0e2 which is not proper.

Essentially, the decimal period must have a forbidden following of . or e.

schungx commented 2 years ago

This can be some form of:

[0-9][0-9_]*\.  // cannot be followed by another .
| [0-9][0-9_]*(\.[0-9][0-9_]*)?(e([+\-][0-9_]+|[0-9][0-9_]*))?

I don't think Rowan allows look-aheads, so I don't know how to prevent splitting ..

This may not be easily done via regex, so perhaps a custom parsing routine is needed.

tamasfe commented 2 years ago

Yeah looks like we have to hack around this manually, since logos has no or limited internal storage and does not support lookaheads, nor can we rewind it.

We need to store the tokens temporarily, and inspect each float token to see whether they end with a . and whether they are followed by another . and optionally a =, and reconstruct the tokens from there to resolve the ambiguity, sounds like a PITA.

Token priorities also won't help here since the 2. can be and has to be unambiguously parsed.

tamasfe commented 2 years ago

Actually nevermind, we can still do this at the tokenization level, by introducing something like a INTEGER_AND_RANGE that will always match the problematic part with something like [0-9][0-9_]*\.\.=? and then split it up manually.

tamasfe commented 2 years ago

Alright, these seem to have been fixed with 6e6568d2b276ce83d32c3e00e851a03d962b23a9 and a5134c02cc2552c8c268f8431c57d8f931c18e38.

I did what I wrote in the last comment, but unfortunately it was not possible in logos callbacks alone, so I introduced a proxy that optionally splits up these tokens somewhat transparently.

The range fix is fixed by matching [0-9][0-9_]*\.\.=? and splitting it up.

The float fix involves special-casing 123. in the float regex so that 123. is possible, and the rest can only appear valid if there are integers following the dot immediately.

And then another ambiguity appeared, 1.foo was parsed as float+integer. So I did the same as above and matched integer+dot+ident in a single token and split it up afterwards.

schungx commented 2 years ago

And then another ambiguity appeared, 1.foo was parsed as float+integer. So I did the same as above and matched integer+dot+ident in a single token and split it up afterwards.

I wonder why 1.foo can be parsed as a float + integer? Since foo is never an integer...

tamasfe commented 2 years ago

My bad, float+ident.