tafia / quick-protobuf

A rust implementation of protobuf parser
MIT License
446 stars 82 forks source link

fix: multiple parser problems #227

Closed snproj closed 1 year ago

snproj commented 1 year ago

Summary

This MR merges changes from two PRs by public contributors, as well as some of my own fixes after them.

Original PRs:

Addressed by both PRs

These were merged manually.

  • fix single-line comment parsing
  • A modified version of FauxFaux's fix was chosen to detect both \n and \r, but not consume the line ending that terminates the comment
  • should error out if file_descriptor() leaves some text unparsed
  • FauxFaux's fix was chosen to avoid having file_descriptor(), a Result returning function, handle its error itself
  • fix message parsing (message())
  • nerdrew's fix was chosen since it takes advantage of the fact that message_event() already parses whitespace internally
  • add unit tests for parser.rs
  • both contributors' unit tests were included, with slight alterations to FauxFaux's to reflect the decision not to consume line endings while parsing single-line comments

Addressed by only FauxFaux's PR

Addressed by only nerdrew's PR:

My own changes

Click to expand ### Fix: Reject missing names and types
Click to expand #### Problem `pb-rs` was able to compile files that had missing tokens without error: ```protobuf message { // missing message name = 0; // missing enum field name and type oneof { // missing oneof name int32 = 0; // missing oneof field name (type does throw error if missing) } } enum { // missing enum name = 0; // missing enum field name and type } ``` It would signal success, and then generate a Rust file with those portions empty, causing syntax errors: ```rust // ... #[derive(Debug, PartialEq, Eq, Clone, Copy)] pub enum { = 0, } impl Default for { fn default() -> Self { :: } } // ... ``` #### Solution In `pb-rs/src/parser.rs`, the `word` parser is responsible for detecting these tokens. It uses `many0` to detect **0** or more alphanumeric characters as a word. However, it seems that in all situations where we use the `word` parser, we never want it to return empty-handed. Furthermore, if we do, I think we could use `opt(word)`. Thus, just change `many0` to `many1` to force it to receive at least 1 alphanumeric character. This seems to fix missing: - Message - name - field name - field type - Enum - name - field name - Oneof - name - field name Unit tests were written to check that these problematic situations now throw errors. #### Small incidental fix The preexisting unit test `test_nested_message` in the same file was actually missing a type (threw an error after the above fix), so I gave it one: ```diff @@ -577,7 +594,7 @@ mod test { message C { } message D {} - optional b = 1; + optional int32 b = 1; }"#; ```
### Fix: enforce variable naming rules more comprehensively
Click to expand #### Problems 1. Qualified names are allowed where they shouldn't be ``` message some.package.Msg {} // wrong ``` 2. Names with invalid starting characters or incorrectly placed dots ``` message 123Msg { // begins with number optional int32 two..dot.name = 1; // contains two dots } ``` #### Solutions 1. Made `word()` no longer parse qualified names and created a separate `qualifiable_name()` for those instances. 2. Tweaked `word()` and `qualifiable_name()` so as to follow [Rust variable naming rules](https://www.tutorialspoint.com/rust/rust_variables.htm#:~:text=on%20the%20variable.-,Rules%20for%20Naming%20a%20Variable,-In%20this%20section). > Rust naming rules were chosen since I couldn't find any explicit protobuf naming rules. We can't just leave it because at the moment, names that aren't Rust-compatible will simply cause faulty generated code. For the same reason, this fix isn't a breaking change. > > The alternative is to put in an explicit modifier to modify generated variable names to be Rust-compatible, but I see no reason to do that. Corresponding unit tests were written for both problems.
### Minor tests for comment parsing
Click to expand - added `test_comments.proto` for testing via `run_test.sh` - minor additions to existing unit tests in `pb-rs/src/parser.rs`