murmour / mparser

A simple monadic parser combinator library for OCaml

Literal newlines in strings and characters do not update position #3

Open dsheets opened 8 years ago

dsheets commented 8 years ago

If you use something like string ":\n" in a parser, the line position is not updated if it matches.

In many cases, the string or character to consume is literal and so newline search can be done before the monad application while the parser is constructed. In the cases where the match is not literal, a string search is a small price to pay (imho) for correct position calculation.
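To make the suggestion concrete, here is a minimal sketch of the "scan the literal once, at construction time" idea. It uses a toy parser type and is not MParser's actual API; the names `pos`, `delta`, `delta_of_literal`, `advance`, `parse` and `literal` are all hypothetical, and columns are assumed to be 1-based.

```ocaml
(* Toy types for illustration only; none of these names come from MParser. *)
type pos = { line : int; col : int }

(* How a literal moves the position: the number of newlines it contains and
   the number of characters after the last newline (or its whole length if
   it contains none). *)
type delta = { newlines : int; tail : int }

let delta_of_literal (s : string) : delta =
  let newlines = ref 0 and tail = ref 0 in
  String.iter
    (fun c -> if c = '\n' then (incr newlines; tail := 0) else incr tail)
    s;
  { newlines = !newlines; tail = !tail }

(* Apply a precomputed delta to a position (columns assumed 1-based). *)
let advance (p : pos) (d : delta) : pos =
  if d.newlines = 0 then { p with col = p.col + d.tail }
  else { line = p.line + d.newlines; col = 1 + d.tail }

(* A parser takes the input, an offset and a position, and on success
   returns the result together with the new offset and position. *)
type 'a parse = string -> int -> pos -> ('a * int * pos) option

(* The newline scan runs once, when [literal s] is evaluated, and not on
   every application of the resulting parser. *)
let literal (s : string) : string parse =
  let d = delta_of_literal s in
  let n = String.length s in
  fun input i p ->
    if i + n <= String.length input && String.sub input i n = s
    then Some (s, i + n, advance p d)
    else None
```

With this shape, `literal ":\n"` applied at line 1, column 1 leaves the position at line 2, column 1, and the per-match cost is unchanged apart from one record update.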

Finally, is there a test suite for the library?

murmour commented 8 years ago

Hello, David.

> If you use something like string ":\n" in a parser, the line position is not updated if it matches.
>
> In many cases, the string or character to consume is literal and so newline search can be done before the monad application while the parser is constructed. In the cases where the match is not literal, a string search is a small price to pay (imho) for correct position calculation.

Your observation is correct.

A simpler, more robust, and more performant solution to the problem is to make position info a part of the stream, which is what I did in the unreleased version of the library (it still needs to be cleaned up and documented before the public release). That version also brings comprehensive support for Unicode and contains a very fast built-in regular expression engine. I plan to make a proper release of this reworked version soon, perhaps as a differently named package (mparser2?), since it breaks backward compatibility in certain ways.
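The unreleased code is not shown in this thread, so the following is only a rough sketch of what "position info as part of the stream" could look like. The `stream` record and the functions `of_string`, `next` and `string_` are hypothetical names, not MParser's API, and columns are assumed to be 1-based.

```ocaml
(* Hypothetical stream type: the cursor and its line/column travel together. *)
type stream = { input : string; offset : int; line : int; col : int }

let of_string (input : string) : stream =
  { input; offset = 0; line = 1; col = 1 }

(* The only function that moves the cursor, hence the only place that needs
   to know about newlines. *)
let next (s : stream) : (char * stream) option =
  if s.offset >= String.length s.input then None
  else
    let c = s.input.[s.offset] in
    let s' =
      if c = '\n'
      then { s with offset = s.offset + 1; line = s.line + 1; col = 1 }
      else { s with offset = s.offset + 1; col = s.col + 1 }
    in
    Some (c, s')

(* A literal matcher built on [next]; it never touches line or col itself,
   so it cannot forget to update them. *)
let string_ (lit : string) (s : stream) : stream option =
  let rec go i s =
    if i = String.length lit then Some s
    else
      match next s with
      | Some (c, s') when c = lit.[i] -> go (i + 1) s'
      | _ -> None
  in
  go 0 s
```

Because every combinator consumes input through `next`, a match on ":\n" ends at line 2, column 1 without `string_` containing any newline logic of its own.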

> Finally, is there a test suite for the library?

At this point, the library is only tested as part of an integrated testing suite for several proprietary applications which use MParser-implemented parsers, including relatively large ones: for Delphi, Java, and C#. For legal reasons, I can't open-source any of it right now, but an example of a large and complicated parser would surely help with testing and demonstrating the possibilities of the library, so I might as well contribute one when I have time for it (a parser for OCaml would be a great fit).

One more thing:

I have become largely disillusioned with monadic combinator parsers, as they scale poorly with regard to both performance and grammar complexity, so the next version of the library is designed to be optionally usable as a backend for a compiled implementation, with parsers described as annotated PEGs.

murmour commented 7 years ago

The major update to MParser that I mentioned above still needs some work before it can be released.

Writing this to feel shame.

...

(oh, the shame hurts; I really should find time to finish this)