joewalker / barnardsstar

An experimental query parser for Datomish
Apache License 2.0
2 stars 0 forks

Consider using a more general parser toolkit (nom?) instead of pest #1

Open ncalexan opened 7 years ago

ncalexan commented 7 years ago

Following the discussion in https://github.com/mozilla/datomish/pull/139, I expect we'll be parsing EDN and then layering on additional parsers (query parser, transaction parser, perhaps others). It would be nice if we could use the same parser generation toolkit for parsing strings and EDN token streams.

From http://dragostis.github.io/pest/pest/trait.Input.html, I see that pest's intention is to parse strings (or byte streams). (e.g., http://dragostis.github.io/pest/pest/trait.Input.html#tymethod.slice accepts a general Input but requires it to effectively be a string.)

I see that nom is more general. (Doubtless there are other, more general, PEG generators as well.) From https://github.com/Geal/nom#parser-combinators:

A parser in nom is a function which, for an input type I, an output type O, and an optional error type E, will have the following signature: fn parser(input: I) -> IResult<I, O, E>;
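For reference, that shape can be sketched without the crate itself. The names below are illustrative, not nom's actual API (nom's real `IResult` is an enum with `Done`/`Error`/`Incomplete` variants); the point is just that a parser consumes a prefix of the input and returns the remainder alongside the parsed value:

```rust
// Hand-rolled sketch of the nom-style parser signature.
// PResult and digit are hypothetical, not part of nom.
type PResult<I, O, E> = Result<(I, O), E>;

// Example: parse a single leading ASCII digit from a &str input,
// returning the rest of the input and the digit's value.
fn digit(input: &str) -> PResult<&str, u32, String> {
    match input.chars().next() {
        Some(c) if c.is_ascii_digit() => {
            Ok((&input[1..], c.to_digit(10).unwrap()))
        }
        _ => Err(format!("expected digit at {:?}", input)),
    }
}

fn main() {
    assert_eq!(digit("7abc"), Ok(("abc", 7)));
    assert!(digit("abc").is_err());
}
```

Nothing in the signature forces `I` to be a string or byte stream, which is exactly the generality we want for layering a query parser over EDN tokens.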

ncalexan commented 7 years ago

@rnewman @grigoryk worth considering as we build our libraries.

ncalexan commented 7 years ago

So, it appears that nom doesn't do a great job with this: see https://github.com/Geal/nom/issues/266, which is harder than our problem, but similar. I think nom could be bent to our purposes, but it's clear that bytes are the first-class thing and arbitrary tokens second-class.

Perhaps combine is a better option: https://docs.rs/crate/combine
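To make the "tokens as first-class input" point concrete, here is the same parser shape instantiated over a slice of EDN-ish tokens instead of bytes. `Token`, `PResult`, and `keyword` are hypothetical stand-ins, not types from nom or combine:

```rust
// A tiny token type standing in for what an EDN tokenizer might emit.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    OpenBracket,
    CloseBracket,
    Keyword(String),
}

// Same parser shape as before, but the input is a token slice.
type PResult<'a, O> = Result<(&'a [Token], O), String>;

// Consume a single keyword token from the head of the stream.
fn keyword<'a>(input: &'a [Token]) -> PResult<'a, String> {
    match input.split_first() {
        Some((Token::Keyword(k), rest)) => Ok((rest, k.clone())),
        _ => Err("expected keyword".to_string()),
    }
}

fn main() {
    let toks = vec![
        Token::OpenBracket,
        Token::Keyword(":find".to_string()),
        Token::CloseBracket,
    ];
    assert!(keyword(&toks).is_err()); // head is '[', not a keyword
    let (rest, k) = keyword(&toks[1..]).unwrap();
    assert_eq!(k, ":find");
    assert_eq!(rest, &toks[2..]);
}
```

A toolkit whose input trait is genuinely generic should let us write parsers like this directly; one that assumes strings forces us to re-serialize tokens first.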

ncalexan commented 7 years ago

Perhaps combine is a better option: https://docs.rs/crate/combine

Indeed there's a JSON parser in the combine test suite: https://github.com/Marwes/combine/blob/master/benches/json.rs. It wouldn't be too hard to turn that into an EDN parser.

A few more notes after playing with combine a bit: generally, folks migrate from nom to combine to get better error reporting. I can't speak to the differences yet.

There are some kinks when defining higher-order parsers using combine; the obvious const p: ... = many(...) doesn't work. I think this is actually rooted in Rust's need to know the type and size of p statically, which doesn't play that nicely with combine in some cases. This may just be my Rust ignorance, though!
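The const issue can be reproduced without combine at all: a combinator that returns a closure produces a value whose type is anonymous and unnameable, so there is no type to write in a const binding. The `many` and `digit` below are hand-rolled stand-ins (they return a consumed-byte count rather than combine's actual result type); the usual workaround is to build the parser at runtime or wrap it in a function:

```rust
// Stand-in combinator: apply `p` zero or more times, returning how many
// bytes were consumed plus the collected values. Not combine's API.
fn many<O>(
    p: impl Fn(&str) -> Option<(usize, O)>,
) -> impl Fn(&str) -> Option<(usize, Vec<O>)> {
    move |input| {
        let (mut pos, mut out) = (0, Vec::new());
        while let Some((n, o)) = p(&input[pos..]) {
            pos += n;
            out.push(o);
        }
        Some((pos, out))
    }
}

// Parse one leading ASCII digit, consuming one byte.
fn digit(input: &str) -> Option<(usize, char)> {
    input
        .chars()
        .next()
        .filter(|c| c.is_ascii_digit())
        .map(|c| (c.len_utf8(), c))
}

fn main() {
    // This is the part that fails: the closure returned by `many` has an
    // anonymous type, so there is nothing to write after the colon.
    // const DIGITS: ??? = many(digit);   // does not compile

    // Workaround: construct the parser at runtime instead.
    let digits = many(digit);
    assert_eq!(digits("123abc"), Some((3, vec!['1', '2', '3'])));
}
```

So the limitation is a general property of Rust closures and const items rather than anything specific to combine, which matches the "type and size known statically" intuition above.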