Closed mitchellwrosen closed 9 years ago
Thanks for your input. That would indeed be useful. It would be much easier to create nice (e.g. clang-style) error messages if that information was available.
The Earley library doesn't know enough about the input token type to tell where new lines start and end, so it's not able to construct a Loc
on its own as it is. For that to work it would have to always work on Char
or L tokentype
(which is what you're doing). Though both seem useful, it is needlessly restrictive to commit to using only one of those.
Can we can find a way of doing it that does not sacrifice generality?
For what it's worth, your current approach seems good. It allows you to put location information in subexpressions, which you couldn't do with the proposed change to the library. You may also be able to get the information that you want by looking at the location of the first token in the unconsumed
field of the Report
(though the report is about how far it was able to parse before failing and not where individual parses were obtained).
I realized this morning that my original proposal makes no sense, as you point out (the parser has no source information). I think all I need are some simple combinators, which might not even belong in Earley
itself:
locSymbol :: Eq t => t -> Prod r e (L t) (L t)
locSymbol = symbol . L NoLoc
locSatisfy :: (t -> Bool) -> Prod r e (L t) (L t)
locSatisfy p = satisfy (p . unLoc)
This works because the Eq
instance of L
ignores its Loc
That looks like a nice approach. I'll close this issue now, but if you do come up with a better way to do it, we can revisit it.
Would it be possible to tag parsed items with a
Data.Loc.L
rather than anInt
? Having both the column and row number is very useful.More specifically,
would become
Currently, I'm lexing with https://hackage.haskell.org/package/lexer-applicative, which returns a stream of
L
-tagged lexemes anyway, and then I'm awkwardly unpacking and re-packing the locations into the parsed items. Here's a small example:(apologies if the above doesn't typecheck, I just wrote it by hand)
If
Earley
annotated parsed items withL
then I could simply forget all the work that the lexer did by mappingunLoc
over the stream of tokens, so I'd have productions likeProd r String Token Expr
rather thanProd r String (L Token) (L Expr)
. I wonder if there's a better way to integrate these two libraries.