Custom provenance/metadata

mcdearman commented 2 months ago

I was wondering whether there is a way to attach custom provenance information or additional metadata to tokens and get easy access to it. The spans I use for provenance are source offsets, and I'm parsing a token stream produced by a separate lexing pass. I include these spans in the token stream, but I don't have a nice way to get them out without altering every combinator. Before, when I was parsing text directly, I had a basic wrapper using the built-in getOffset:

withSpan :: Parser a -> Parser (Spanned a)
withSpan p = do
  startPos <- getOffset
  result <- p
  Spanned result . SrcLoc startPos <$> getOffset

But now I can't do this, because offsets count tokens rather than source characters. I tried going through the parser state, but even though I can read token spans from the state, once I reach the end of input I'm not sure how to get the span info, because the stream is empty and there is no token left to look at.

mrkkrp commented 2 months ago

One solution could be to define a custom input stream and then define reachOffset and reachOffsetNoLine for it, so that after the last token has been parsed the source position is set to the end of that token. Once that is done, getSourcePos could be used similarly to getOffset, but of course it will return the line and column, not offsets in the original input stream.
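A minimal sketch of that suggestion, assuming megaparsec 9 or later (where TraversableStream is a separate class and defining reachOffsetNoLine alone is enough). The names Located, TokStream, and demoSpan are hypothetical and exist only for illustration; the token type is assumed to carry its start and end SourcePos from the lexer.

```haskell
{-# LANGUAGE TypeFamilies #-}

module Main (main) where

import Data.Proxy (Proxy (..))
import Data.Void (Void)
import Text.Megaparsec

-- Hypothetical token wrapper: a payload plus the source range it covers.
data Located a = Located
  { locStart :: SourcePos
  , locEnd   :: SourcePos
  , locValue :: a
  } deriving (Eq, Ord, Show)

-- Hypothetical stream type: just a list of located tokens.
newtype TokStream a = TokStream [Located a]

instance Ord a => Stream (TokStream a) where
  type Token  (TokStream a) = Located a
  type Tokens (TokStream a) = [Located a]
  tokenToChunk  Proxy t = [t]
  tokensToChunk Proxy   = id
  chunkToTokens Proxy   = id
  chunkLength   Proxy   = length
  chunkEmpty    Proxy   = null
  take1_ (TokStream [])       = Nothing
  take1_ (TokStream (t : ts)) = Just (t, TokStream ts)
  takeN_ n (TokStream ts)
    | n <= 0    = Just ([], TokStream ts)
    | null ts   = Nothing
    | otherwise = let (xs, rest) = splitAt n ts in Just (xs, TokStream rest)
  takeWhile_ f (TokStream ts) =
    let (xs, rest) = span f ts in (xs, TokStream rest)

instance Ord a => TraversableStream (TokStream a) where
  reachOffsetNoLine o pst =
    pst
      { pstateInput     = TokStream post
      , pstateOffset    = max (pstateOffset pst) o
      , pstateSourcePos = newPos
      }
    where
      TokStream ts = pstateInput pst
      (pre, post)  = splitAt (o - pstateOffset pst) ts
      newPos = case post of
        -- A token is still ahead: report where it starts.
        (t : _) -> locStart t
        -- End of input: report the end of the last token skipped over,
        -- or keep the current position if nothing was skipped.
        [] -> if null pre then pstateSourcePos pst else locEnd (last pre)

type Parser = Parsec Void (TokStream String)

-- Same shape as the text-based withSpan, but built on getSourcePos.
withSpan :: Parser a -> Parser (SourcePos, SourcePos, a)
withSpan p = do
  start <- getSourcePos
  x     <- p
  end   <- getSourcePos
  pure (start, end, x)

-- Two tokens, "let" at 1:1-1:4 and "x" at 1:5-1:6, consumed up to end of input.
demoSpan :: Maybe (String, String)
demoSpan =
  case parse (withSpan (count 2 anySingle) <* eof) "demo" (TokStream toks) of
    Left _          -> Nothing
    Right (s, e, _) -> Just (sourcePosPretty s, sourcePosPretty e)
  where
    at c = SourcePos "demo" (mkPos 1) (mkPos c)
    toks = [Located (at 1) (at 4) "let", Located (at 5) (at 6) "x"]

main :: IO ()
main = print demoSpan
```

Note one consequence of this particular reachOffsetNoLine: when more tokens follow, the end position reported by withSpan is the start of the next token rather than the end of the one just parsed, so a span may include trailing whitespace. Only at end of input does it fall back to the end of the last token.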

mcdearman commented 2 months ago

Would it be possible to parameterize the Parsec and ParsecT types over a metadata type that the user could specify, but which by default would include the ordinary SourcePos?

mrkkrp commented 2 months ago

To be honest, I am reluctant to index the ParsecT type with even more type variables.

mcdearman commented 2 months ago

Yeah, it is quite an important type to change. This might be something I could just do in a fork.

mrkkrp / megaparsec

Custom provenance/metadata #568