nodebox / seed

Procedural Content Generator
https://seed.emrg.be/
MIT License
22 stars 8 forks source link

Each error message should mention the right line number and position. #40

Open stebanos opened 6 years ago

stebanos commented 6 years ago

Right now only errors on the global structure of the source code communicate their line numbers. Expand this to the parsing and runtime level.

Also I noticed the current position numbers mentioned in errors are always off by at least 2 positions (due to the the first 2 characters of a line as the necessary indentation).

kunal-mohta commented 6 years ago

@stebanos I tried adding the line numbers in the phraseBook itself and now it looks something like this- screenshot 2018-02-27 13 39 53

lineNo corresponds the line number corresponding to the nodes root, giver, object and line is the line number of the further sub-nodes they have.

Can this be a start to the potential solution to this issue?

stebanos commented 6 years ago

Yes, it's a good start! But it's more consistent to call them either lineor lineNo.

kunal-mohta commented 6 years ago

Ohkay! I will continue working on these lines then.

kunal-mohta commented 6 years ago

@stebanos Well, I guess I didn't think this through properly. The classes Lexer and Parse do not yet have access to phraseBook. Instead they are required to make phraseBook. So storing the line numbers in phraseBook is a bad idea I guess.

kunal-mohta commented 6 years ago

Instead maybe I can modify parsePhrase() function so as to give the classes access to the lines?

stebanos commented 6 years ago

Not sure. The Parser might need access to the line number, the Lexer not really I think.

kunal-mohta commented 6 years ago

@stebanos You can have a look on #42. I provided Lexer with line numbers in it, though. If it seems unnecessary, I can pass it directly to Parser

kunal-mohta commented 6 years ago

@stebanos For fixing the additional two positions in error messages, can we use a parameter in the Lexer class to indicate that the line being parsed should start its position counter from 2 instead of 0 for lines starting with -?

stebanos commented 6 years ago

That won't be sufficient because a phrase can span multiple lines, and the Lexer isn't aware if it does.

kunal-mohta commented 6 years ago

I don't understand what you mean by a phrase spanning multiple lines. Could you please elaborate?

stebanos commented 6 years ago

Let me illustrate by an example:

root:
- <h1>Here's an example:</h1>
  <p>
    Dear {{ giver }p}, thank you for the {{ object }}.
  </p>
- Hey {{ giver }}, thank you for the {{ object }}.

root has two phrases. The first one spanning 4 lines from <h1> up until </p> but I made an error on the 4th line (which is the third line of our grouped phrase): {{ giver }p}. When I run the example I receive the following error:

Line 2: Invalid character } at position 46 in phrase '<h1>Here's an example:</h1><p> Dear {{ giver }p}, thank you for the {{ object }}.</p>'.

Not only is the reported line number wrong (it should be line 4, not line 2), but since it's all grouped the position gets wrong as well (it should be 19 and not 46).

kunal-mohta commented 6 years ago

Oh.. Okay. Now I get it. Considering this, may be we can add a key in the elements of the phrases array which stores the number of characters after which the lines break, as an array, which is then detected in the Lexer and the line numbers in the error messages are changed accordingly. For example, if we consider your illustration, then we can add this key to the object corresponding to root, stored by the phrases array

lineBreaks: [27, 3, 53, 4]; //number of characters after which the lines break in root

So the error message should be changed to have the line number as 46 - (27 + 3) = 16 because 46 > (27 + 3) && 46 < (27 + 3 + 53) Is this a viable solution?

stebanos commented 6 years ago

It could be, and it's one that crossed my mind before, but I wonder if there's a more elegant solution. I think it would be more beneficial if each parsed token becomes aware of it's line number and position within that line.

kunal-mohta commented 6 years ago

Yes, that would be a better solution as it would provide easy access to the line numbers, which could be helpful for future additional features. So if we are breaking down the phrase into parts depending on the line numbers, we would need a kind of a identifier that helps indicate that all of them belong to the same phrase.