quantified-uncertainty / squiggle

An estimation language
https://squiggle-language.com
MIT License
151 stars 23 forks source link

Missing commas points to a strange location when failing to parse #1343

Open Hazelfire opened 1 year ago

Hazelfire commented 1 year ago

Description:

dict = {
    test: 5,
    test2: 6,
    test5: 7
    test9: 4
}

It creates a strange error, pointing to the first colon (on develop, this gives line 2): image https://www.squiggle-language.com/playground/#code=eNqrVirOyC8PLs3NTSyqVLIqKSpN1QELuaZkluQXwUQy8zJLMhNzggtLM9PTc1KDS4oy89KVrJRSMpNLFGwVqmPyFICgJLW4xErBVAfBM7JSMEPimlopmCN4llYKJjF5tUq1AB%2BVK0k%3D

Expected behavior:

Expect "Missing comma", or at the very least, pointing towards an identifier that's close to the missing comma.

Syntax Error: Expected "(", "->", "?", "|>", "}", assignment, operator, whitespace, or whitespace or newline but ":" found.

I've created an error messages tag. These issues should be fairly easy to fix once we've done the typescript migration.

berekuk commented 1 year ago

Yeah, our parser's error messages are pretty bad right now in general.

It's not related to Rescript/Typescript, btw, Squiggle's parser is implemented with Peggy, which is compiled to JS and then used from Rescript.

I don't really know much about parsers, but I've heard that parser generators are typically pretty bad at error messages, and maybe we should roll our own.

mlochbaum commented 1 year ago

I have no idea why Peggy would end up with that error location, but as long as the newline after 7 there couldn't be part of valid syntax, a basic recursive descent parser should just point to the end of line there.

umuro commented 1 year ago

A basic recursive descent parser is impossible for Squiggle syntax.

Having to use backtracking complicates the error messages. The unit syntax is a source of ambiguity.

In this case probably, the parser thinks test9 might be unit so "7 test9" is value, it is not aware that the record value should have ended. Because "7;" expectation has already failed and it has found the beginning of a longer grammar pattern.

The problem with backtracking is that previously tried but failed grammar options are already discarded. Therefore the parser error comes from the last tried grammar. If the grammar was simply recursive then error messages would be perfect, of course.

If possible, sorting the backtracked varieties so that the most possible grammar option is the last one solves this issue.

umuro commented 1 year ago

My previous interpretation was wrong.

This is because of "{" ambiguity. The record grammar has failed long ago, it has returned from so many grammar rules, and Peggy is trying to parse a correct block starting with "{". Thus, it reporting how to correct the block syntax.

But why does not it also report the record problem along with it?

The problem here is that "{" could be both a block start and a record start. And those 2 structures are not even in the same grammar rule (different priorities). Correcting this situation requires all '{' prefixed rules to have the same priority. Then they would be alternatives inside the same grammar rule and report correctly. However, the priorities of records and blocks are so different. They are so far away.

Thus, once a correct key value pair is identified, Peggy should not backtrack and try block grammar on further errors. "Hey, you have seen a key value pair, insist that this is a record, do not backtrack and do not try other alternatives". I guess there might be a way to block backtracking.

Whoever implementing the same grammar manually would be in the same situation because of "{" ambiguity. The idea that "writing recursive descent would not cause this problem" does not apply here. For an ambiguous grammar, one cannot even write a recursive descent implementation. "An unambiguous grammar would not cause such problems"