Line/column information for productions

robrix / Madness

Recursive Descent Into Madness

MIT License


Line/column information for productions #61

Closed jspahrsummers closed 9 years ago

jspahrsummers commented 9 years ago

I have very little idea of what this means for the API, but it would be super handy to retain the line/column from which a given production was parsed.

This has applications for reporting parse errors, naturally, but I think this is even broader—for example, tools could use this information to pretty-print what they actually parsed, or instruct users on where a certain behavior is coming from.

robrix commented 9 years ago

Given #43, I think this’ll come down to returning the index in a CollectionType. That’ll give you enough to recover the line/column info (and we could provide conveniences for that), while still allowing other collections to be parsed naturally, e.g. trees would return a Zipper.
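To make the "recover the line/column info" step concrete, here is a minimal sketch in current Swift, assuming the input is a String. `SourcePosition` and `sourcePosition(of:in:)` are hypothetical names for illustration, not part of Madness, and a real convenience would presumably be generic over the collection being parsed.

```swift
// Hypothetical helper (not Madness API): a 1-based line/column pair.
struct SourcePosition: Equatable {
    let line: Int
    let column: Int
}

// Derive the line and column of `index` by walking the input up to it.
// This is O(n) per lookup; a real implementation might cache line starts.
func sourcePosition(of index: String.Index, in input: String) -> SourcePosition {
    var line = 1
    var column = 1
    for character in input[input.startIndex..<index] {
        if character.isNewline {
            // A newline ends the current line and resets the column.
            line += 1
            column = 1
        } else {
            column += 1
        }
    }
    return SourcePosition(line: line, column: column)
}

// Usage: locate the `y` on the second line of a small source string.
let source = "let x = 1\nlet y = 2\n"
let index = source.firstIndex(of: "y")!
let position = sourcePosition(of: index, in: source)
```

Because the parser would only hand back the index, the cost of computing line/column is paid only by callers that actually want it, such as error reporting.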

As to the API beyond error reporting, that’s a tricky one. I think we should try to write a couple of motivating examples.

robrix commented 9 years ago

cf #68, which is prerequisite to this.

robrix commented 9 years ago

With #70 in, I’ve been thinking about this a little further. I think it’s as simple as extending/augmenting the map operator (-->) such that the function receives the entire result of the parser:

(Input, Input.Index) -> (Tree, Input.Index)?
                         ^^^^^^^^^^^^^^^^^
                          THIS RIGHT HERE

Thus, the AST can be constructed parameterized by the source index.

--> is the sole means through which we extract anything of note beyond the structure of the input as interpreted by the grammar’s structure, so it is the natural place to surface this; it’s also the final place at which that information is still locally visible.
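A rough sketch of what a position-aware map might look like, assuming a String input and current Swift syntax. The `Parser` typealias is a simplified stand-in for the signature above, and `digits`, `mapWithRange`, and `NumberNode` are hypothetical names used only for illustration; none of this is Madness's actual API.

```swift
// Simplified stand-in for the parser shape discussed above:
// (Input, Input.Index) -> (Tree, Input.Index)?
typealias Parser<Tree> = (String, String.Index) -> (Tree, String.Index)?

// Hypothetical parser: consumes one or more ASCII digits.
func digits(_ input: String, _ start: String.Index) -> (Substring, String.Index)? {
    var end = start
    while end < input.endIndex, input[end].isASCII, input[end].isNumber {
        end = input.index(after: end)
    }
    // Fail (return nil) if no digit was consumed.
    return end > start ? (input[start..<end], end) : nil
}

// Hypothetical position-aware map: the transform receives the parsed tree
// together with the range of indices it was parsed from.
func mapWithRange<T, U>(
    _ parser: @escaping Parser<T>,
    _ transform: @escaping (T, Range<String.Index>) -> U
) -> Parser<U> {
    return { input, start in
        guard let result = parser(input, start) else { return nil }
        let end = result.1
        return (transform(result.0, start..<end), end)
    }
}

// An AST node parameterized by its source range.
struct NumberNode {
    let text: String
    let range: Range<String.Index>
}

let number = mapWithRange(digits) { text, range in
    NumberNode(text: String(text), range: range)
}

let input = "42 apples"
let result = number(input, input.startIndex)
```

The node keeps only the index range; line and column can be derived from it later by whoever needs them.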

robrix commented 9 years ago

If we change -->, we break all existing uses of it except e.g. x --> const(…).
If we overload -->, we break all x --> const(…), and we make the type inferencer’s job harder, and it’s already n² or even 2ⁿ in a few places AFAICS.
If we add another operator then there are two ways to map things, which is gross.