liluyue closed this 6 months ago.
lrlex is, intentionally, fairly simplistic. If you want to lex a whitespace-sensitive language (e.g. one that needs to match at the "start of line"), particularly one with unusual rules such as Markdown's, you'll want to write a hand-written lexer. The good news is that lrpar can happily work with a hand-written lexer: see https://softdevteam.github.io/grmtools/master/book/manuallexer.html.
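A minimal sketch of the piece lrlex can't express, assuming a hand-written lexer that walks the whole input and remembers whether it is at the start of a line. `Tok` and `lex` are hypothetical names for illustration only. They are not part of lrpar's API, and converting these tokens into lrpar lexemes (as described in the manual lexer chapter linked above) is left out because those details depend on the grmtools version.

```rust
/// Hypothetical token type for this sketch (not part of lrpar's API).
#[derive(Debug, PartialEq)]
enum Tok {
    /// A run of `#` characters at the start of a line; holds the run length.
    Heading(usize),
    /// Any other text up to (but not including) the next `\n`.
    Text(String),
    /// A line break.
    Newline,
}

/// Lexes `input`, emitting `Heading` only when `#` occurs at the start of a line.
fn lex(input: &str) -> Vec<Tok> {
    let bytes = input.as_bytes();
    let mut toks = Vec::new();
    let mut i = 0;
    // The lexer sees the whole input, so it can track line starts itself
    // instead of relying on a regex anchor.
    let mut at_line_start = true;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                toks.push(Tok::Newline);
                i += 1;
                at_line_start = true;
            }
            // `#` only starts a heading when it is the first byte of a line.
            b'#' if at_line_start => {
                let start = i;
                while i < bytes.len() && bytes[i] == b'#' {
                    i += 1;
                }
                toks.push(Tok::Heading(i - start));
                at_line_start = false;
            }
            // Everything else (including a `#` mid-line) is plain text up to
            // the next newline. Slicing is safe: we only stop on ASCII bytes.
            _ => {
                let start = i;
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
                toks.push(Tok::Text(input[start..i].to_string()));
                at_line_start = false;
            }
        }
    }
    toks
}
```

For example, `lex("# Title\nbody # not a heading")` yields `Heading(1)`, `Text(" Title")`, `Newline`, `Text("body # not a heading")`. This deliberately ignores real Markdown details (up to three leading spaces before `#`, the required space after it), and with CRLF input the `\r` would end up at the end of each `Text` token, so a real lexer would need to decide how to treat `\r\n`.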
I haven't tried to see what causes it not to work, but I'm a little bit surprised, because the default for RegexBuilder::multi_line within lrlex is true. But I wanted to note that perhaps other regex options affect the behavior.
It is worth noting that since 1.9.x, regex has added RegexBuilder::crlf, so I'm curious: are you perhaps testing with CRLF data? Perhaps we could add support for that option in CTLexerBuilder.
Anyhow, besides a manual lexer, perhaps there are options to RegexBuilder which could be changed to make this work?
because that String truncation causes loss of row information:
Ahh, indeed: given the `\A` in your screenshot, and now that I recall how the algorithm used in lrlex behaves, it makes sense why this doesn't just work already.
So indeed it seems like a manual lexer might be the only way to achieve this.
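The effect of that truncation can be simulated with plain string slices. This sketch assumes, based on the `\A` and "String truncation" remarks in this thread, that each rule is tried against the remaining slice of the input anchored at its start. It is not lrlex's code, and `matches_on_slice` and `matches_with_context` are hypothetical names.

```rust
/// Simulates a "start of line, then `#`" rule tried against the remaining
/// slice `&input[pos..]`. Because the slice always begins at `pos`, this view
/// cannot tell whether `pos` was really at the start of a line.
fn matches_on_slice(input: &str, pos: usize) -> bool {
    let rest = &input[pos..];
    // Position 0 of `rest` always looks like a line start, so only the `#` is checked.
    rest.starts_with('#')
}

/// The same rule with access to the full input: it can inspect the byte
/// before `pos` to decide whether `pos` is at the start of a line.
fn matches_with_context(input: &str, pos: usize) -> bool {
    let at_line_start = pos == 0 || input.as_bytes()[pos - 1] == b'\n';
    at_line_start && input[pos..].starts_with('#')
}
```

With `"a # b"` and `pos = 2`, the slice-based check accepts the `#` even though it sits mid-line, while the context-aware check rejects it. That is the "row information" a lexer needs to keep, and it is why a manual lexer, which sees the whole input, can do this.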
I am working on a Markdown parser. Many of its tags, such as `#`, only match at the beginning of a line. `^` does not work in lrlex, and `(?m)^` does not work either.