A modified version of FauxFaux's fix was chosen to detect both \n and \r, but not consume the line ending that terminates the comment
should error out if file_descriptor() leaves some text unparsed
FauxFaux's fix was chosen to avoid having file_descriptor(), a Result returning function, handle its error itself
fix message parsing (message())
nerdrew's fix was chosen since it takes advantage of the fact that message_event() already parses whitespace internally
add unit tests for parser.rs
both contributors' unit tests were included, with slight alterations to FauxFaux's to reflect the decision not to consume line endings while parsing single-line comments
Addressed by only FauxFaux's PR
enums with multiple lines of comments between items
enums with a trailing semicolon
Addressed by only nerdrew's PR:
whitespace parsing (br())
My own changes
Click to expand
### Fix: Reject missing names and types
Click to expand
#### Problem
`pb-rs` was able to compile files that had missing tokens without error:
```protobuf
message { // missing message name
= 0; // missing enum field name and type
oneof { // missing oneof name
int32 = 0; // missing oneof field name (type does throw error if missing)
}
}
enum { // missing enum name
= 0; // missing enum field name and type
}
```
It would signal success, and then generate a Rust file with those portions empty, causing syntax errors:
```rust
// ...
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum {
= 0,
}
impl Default for {
fn default() -> Self {
::
}
}
// ...
```
#### Solution
In `pb-rs/src/parser.rs`, the `word` parser is responsible for detecting these tokens. It uses `many0` to detect **0** or more alphanumeric characters as a word.
However, it seems that in all situations where we use the `word` parser, we never want it to return empty-handed. Furthermore, if we do, I think we could use `opt(word)`.
Thus, just change `many0` to `many1` to force it to receive at least 1 alphanumeric character. This seems to fix missing:
- Message
- name
- field name
- field type
- Enum
- name
- field name
- Oneof
- name
- field name
Unit tests were written to check that these problematic situations now throw errors.
#### Small incidental fix
The preexisting unit test `test_nested_message` in the same file was actually missing a type (threw an error after the above fix), so I gave it one:
```diff
@@ -577,7 +594,7 @@ mod test {
message C {
}
message D {}
- optional b = 1;
+ optional int32 b = 1;
}"#;
```
### Fix: enforce variable naming rules more comprehensively
Click to expand
#### Problems
1. Qualified names are allowed where they shouldn't be
```
message some.package.Msg {} // wrong
```
2. Names with invalid starting characters or incorrectly placed dots
```
message 123Msg { // begins with number
optional int32 two..dot.name = 1; // contains two dots
}
```
#### Solutions
1. Made `word()` no longer parse qualified names and created a separate `qualifiable_name()` for those instances.
2. Tweaked `word()` and `qualifiable_name()` so as to follow [Rust variable naming rules](https://www.tutorialspoint.com/rust/rust_variables.htm#:~:text=on%20the%20variable.-,Rules%20for%20Naming%20a%20Variable,-In%20this%20section).
> Rust naming rules were chosen since I couldn't find any explicit protobuf naming rules. We can't just leave it because at the moment, names that aren't Rust-compatible will simply cause faulty generated code. For the same reason, this fix isn't a breaking change.
>
> The alternative is to put in an explicit modifier to modify generated variable names to be Rust-compatible, but I see no reason to do that.
Corresponding unit tests were written for both problems.
### Minor tests for comment parsing
Click to expand
- added `test_comments.proto` for testing via `run_test.sh`
- minor additions to existing unit tests in `pb-rs/src/parser.rs`
Summary
This MR merges changes from two PRs by public contributors, as well as some of my own fixes after them.
Original PRs:
Addressed by both PRs
Addressed by only FauxFaux's PR
Addressed by only nerdrew's PR:
br()
)My own changes
Click to expand
### Fix: Reject missing names and typesClick to expand
#### Problem `pb-rs` was able to compile files that had missing tokens without error: ```protobuf message { // missing message name = 0; // missing enum field name and type oneof { // missing oneof name int32 = 0; // missing oneof field name (type does throw error if missing) } } enum { // missing enum name = 0; // missing enum field name and type } ``` It would signal success, and then generate a Rust file with those portions empty, causing syntax errors: ```rust // ... #[derive(Debug, PartialEq, Eq, Clone, Copy)] pub enum { = 0, } impl Default for { fn default() -> Self { :: } } // ... ``` #### Solution In `pb-rs/src/parser.rs`, the `word` parser is responsible for detecting these tokens. It uses `many0` to detect **0** or more alphanumeric characters as a word. However, it seems that in all situations where we use the `word` parser, we never want it to return empty-handed. Furthermore, if we do, I think we could use `opt(word)`. Thus, just change `many0` to `many1` to force it to receive at least 1 alphanumeric character. This seems to fix missing: - Message - name - field name - field type - Enum - name - field name - Oneof - name - field name Unit tests were written to check that these problematic situations now throw errors. #### Small incidental fix The preexisting unit test `test_nested_message` in the same file was actually missing a type (threw an error after the above fix), so I gave it one: ```diff @@ -577,7 +594,7 @@ mod test { message C { } message D {} - optional b = 1; + optional int32 b = 1; }"#; ```Click to expand
#### Problems 1. Qualified names are allowed where they shouldn't be ``` message some.package.Msg {} // wrong ``` 2. Names with invalid starting characters or incorrectly placed dots ``` message 123Msg { // begins with number optional int32 two..dot.name = 1; // contains two dots } ``` #### Solutions 1. Made `word()` no longer parse qualified names and created a separate `qualifiable_name()` for those instances. 2. Tweaked `word()` and `qualifiable_name()` so as to follow [Rust variable naming rules](https://www.tutorialspoint.com/rust/rust_variables.htm#:~:text=on%20the%20variable.-,Rules%20for%20Naming%20a%20Variable,-In%20this%20section). > Rust naming rules were chosen since I couldn't find any explicit protobuf naming rules. We can't just leave it because at the moment, names that aren't Rust-compatible will simply cause faulty generated code. For the same reason, this fix isn't a breaking change. > > The alternative is to put in an explicit modifier to modify generated variable names to be Rust-compatible, but I see no reason to do that. Corresponding unit tests were written for both problems.Click to expand
- added `test_comments.proto` for testing via `run_test.sh` - minor additions to existing unit tests in `pb-rs/src/parser.rs`