Closed: indiv0 closed this pull request 7 years ago.
PR should now be ready for review.
The latest commits strip out the PEG parser and any redundant test cases, and also rearrange the layout of the parser module. The individual grammar rules are located in the parser::grammar module, which is intended to be a private module. The parser::grammar module currently contains two submodules, fetch and sequence.
These provide the grammar rules for parsing FETCH commands and the sequence-set rule from RFC 3501. These modules are kept separate to enforce a logical separation between different groups of grammar rules (e.g. the FETCH rules will never be needed by other grammar rules, so they can be isolated), keeping only the "base" grammar rules (e.g. string and number parsing) in the parser::grammar module itself.
The grammar rules live in the parser::grammar submodule so that the parser module itself can provide a stable API over the grammar rules for when the module gets extracted into a separate crate. This will allow us to modify rules internally (or even change parser frameworks) without having to modify the public API. Currently this consists of just a wrapper function for the FETCH parser, which does not yet forward error cases correctly. This will be fixed in later revisions with a proper error type, along with additional APIs for other rules as they are added.
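The wrapper-plus-error-type plan described above could be sketched roughly as follows. This is an illustrative assumption, not the crate's actual code: `ParseError` and the `NomResult` stand-in (modeling nom's `Done`/`Error`/`Incomplete` cases) are hypothetical names.

```rust
/// Hypothetical public error type for the stable `parser` API (assumed name).
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input did not match the grammar.
    Invalid,
    /// More input is required to finish parsing.
    Incomplete,
}

// Stand-in for nom's three-way parse result, used here so the sketch is
// self-contained without the nom crate itself.
enum NomResult<I, O> {
    Done(I, O),
    Error,
    Incomplete,
}

// Placeholder for the private grammar rule; the real one lives in
// parser::grammar and is far more involved.
fn grammar_fetch(input: &[u8]) -> NomResult<&[u8], Vec<String>> {
    if input.starts_with(b"FETCH") {
        NomResult::Done(&input[5..], vec!["FETCH".to_string()])
    } else if input.is_empty() {
        NomResult::Incomplete
    } else {
        NomResult::Error
    }
}

/// Public wrapper: forwards every case instead of handling only `Done`.
pub fn fetch(input: &[u8]) -> Result<Vec<String>, ParseError> {
    match grammar_fetch(input) {
        NomResult::Done(_, v) => Ok(v),
        NomResult::Error => Err(ParseError::Invalid),
        NomResult::Incomplete => Err(ParseError::Incomplete),
    }
}
```

The point of this shape is that the public `fetch` forwards all three cases rather than dropping errors, so the grammar internals (or the parser framework) can change without touching the public signature.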
For future reference, prior to deleting the PEG parser I compared the performance of the PEG FETCH command parser and its initial nom equivalent:
test parser::tests::bench_fetch_nom ... bench: 2,955 ns/iter (+/- 146)
test parser::tests::bench_fetch_peg ... bench: 10,549 ns/iter (+/- 181)
The nom parser completes in roughly 28% of the time taken by the PEG parser.
I'm not sure I understand all of the macro soup that is nom parsing, but most of it is straightforward enough, and the hairier parts look like they should be fine.
I'm thinking that we should resolve the TODOs before merging. I've noted them to make them easy for you to find and for me to track in Reviewable.
A couple of nits in command::fetch, since the code there might end up being reverted anyway.
Everything else looks good.
Reviewed 10 of 10 files at r2. Review status: all files reviewed at latest revision, 8 unresolved discussions.
src/command/fetch.rs, line 14 at r2 (raw file):
```rust
/// Take the rest of the arguments provided by the client and parse them into a
/// Command object with command::fetch.
pub fn fetch(args: Split<char>) -> Result<Command, ()> {
```
I take it () should be reverted back to parser::ParseError?
src/command/fetch.rs, line 21 at r2 (raw file):
```rust
}
parser::fetch(cmd.as_bytes())
```
Is there a reason to change the external API instead of just grabbing the bytestream inside of the fetch function?
src/parser/mod.rs, line 10 at r2 (raw file):
```rust
match self::grammar::fetch(input) {
    Done(_, v) => Ok(v),
    // TODO: handle the error and incomplete cases properly (possibly via a
```
Error handling TODO
src/parser/grammar/fetch.rs, line 180 at r2 (raw file):
```rust
);
// TODO: confirm this only needs to work for ASCII
```
According to the RFC, it MUST be ASCII. I think the way it is right now is fine?
src/parser/grammar/fetch.rs, line 192 at r2 (raw file):
```rust
// a non-matching byte. This is useful for streaming parsers which may be
// awaiting further input.
// TODO: decide if this should be streaming-compatible. If not, add a
```
Streaming(?) TODO
src/parser/grammar/mod.rs, line 13 at r2 (raw file):
```rust
fn is_astring_char(chr: u8) -> bool {
    // TODO: perhaps take `char` instead, avoiding this cast?
```
`char` / `u8` TODO
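As a sketch of the `u8` side of that trade-off: nom operates on `&[u8]`, so a byte-level predicate avoids any cast entirely. The body below is a simplified approximation of the RFC 3501 ASTRING-CHAR set (printable ASCII minus the atom-specials, with `]` allowed), not the crate's actual implementation:

```rust
// Simplified stand-in for the RFC 3501 ASTRING-CHAR predicate. Keeping the
// argument as `u8` matches nom's `&[u8]` input and needs no cast.
fn is_astring_char(chr: u8) -> bool {
    // Printable ASCII (excludes CTL and 8-bit bytes), minus the
    // atom-special bytes: ( ) { SP % * " \
    chr > 0x1F && chr < 0x7F && !b"(){ %*\"\\".contains(&chr)
}
```

Taking `char` instead would force a lossy or checked conversion at every call site in a byte-oriented parser, which is one argument for resolving the TODO in favor of `u8`.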
src/parser/grammar/sequence.rs, line 16 at r2 (raw file):
```rust
({
    let mut seq = vec![a];
    // TODO: implement this with iterator combinators instead.
```
I don't know what iterator combinators are, but you left a TODO here to resolve.
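For reference, "iterator combinators" are chained adapter methods such as `once`, `chain`, `map`, and `collect` that build a collection as a single expression instead of mutating a `Vec` with pushes. A hypothetical sketch of what the TODO is asking for (`build_seq` is an invented name, not code from this PR):

```rust
// Instead of:
//     let mut seq = vec![a];
//     for x in rest { seq.push(x); }
// the same sequence can be built with iterator combinators:
fn build_seq(a: u32, rest: Vec<u32>) -> Vec<u32> {
    std::iter::once(a).chain(rest.into_iter()).collect()
}
```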
src/parser/grammar/sequence.rs, line 69 at r2 (raw file):
```rust
assert_eq!(sequence_set(b"*"), Done(&b""[..], vec![Wildcard]));
assert_eq!(sequence_set(b"1"), Done(&b""[..], vec![Number(1)]));
// TODO: is this handled correctly by the parser?
```
Invalid sequences TODO
Comments from Reviewable
Review status: all files reviewed at latest revision, 8 unresolved discussions.
src/command/fetch.rs, line 21 at r2 (raw file):
> Is there a reason to change the *external* API instead of just grabbing the bytestream inside of the fetch function?
I might be misunderstanding you here, but the external API (i.e., parser::fetch) should accept a bytestream rather than a string, because we have no guarantee that the FETCH command being parsed is valid UTF-8.
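A minimal sketch of that rationale, using a hypothetical `fetch_bytes` stand-in (not this PR's actual function): a `&str`-based API would have to reject any command containing non-UTF-8 octets, while a `&[u8]` API accepts them.

```rust
// Hypothetical byte-slice API: the body is a placeholder that just reports
// the input length; real parsing would happen here.
fn fetch_bytes(input: &[u8]) -> Result<usize, ()> {
    Ok(input.len())
}
```

By contrast, reaching the parser through `&str` would require `std::str::from_utf8` first, which fails on arbitrary octets that a valid IMAP stream can still contain.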
src/parser/grammar/fetch.rs, line 180 at r2 (raw file):
> According to the RFC, it MUST be ASCII. I think the way it is right now is fine?
Alright. I'll remove this TODO.
src/parser/grammar/fetch.rs, line 192 at r2 (raw file):
> Streaming(?) TODO
Since we've decided not to provide a streaming-compatible API for now, I'll just remove these TODOs (and all other streaming-related ones).
Reviewed 6 of 6 files at r3. Review status: 10 of 11 files reviewed at latest revision, all discussions resolved.
Reviewed 1 of 1 files at r4. Review status: all files reviewed at latest revision, all discussions resolved.
This PR replaces the existing PEG parser grammar with a parser written using the nom parser combinator framework. The PR builds upon #17, and should not be merged prior to #17.
In terms of performance, for the limited portion of the parser ported so far, nom appears to be somewhat ~~faster~~ slower (3-5%). Note that the nom parsing code has not been optimized at all, and optimization may yield significant performance improvements.
NOTE: this PR is currently a work in progress, and only part of the parser has been ported. DO NOT MERGE.
EDIT: I had the benchmark function names mixed up. Updated.