shnewto / bnf

Parse BNF grammar definitions
MIT License
256 stars 22 forks source link

add callbacks to string generation #87

Closed jdonszelmann closed 2 years ago

jdonszelmann commented 2 years ago

I was using bnf as a fuzzer for a parser I made. I experienced a problem that was hard to solve without a change to the library. I'm using my own fork right now, but I thought it would be nice to propose these changes for upstream as well. The problem I had is that I have a grammar that allows variable names in the form [a-zAZ_][a-zA-Z0-9_]*. However, not certain names (such as 'test', which is a keyword). I could just reject all strings with the word 'test' in, however, that would never generate strings with this keyword.

My solution is to add a set of alternatives to Grammar::generate() and Grammar::generate_seeded() which take a callback (though I'm not yet entirely sure about the current naming). For example Grammar::generate_callback().

Grammar::generate() is a thin wrapper around Grammar::generate_callback(|_, _| true), making the callback essentially a no-op. However, if a user callback is provided it can function as a filter. For example:

<01> ::= "0" | "1"
<09> ::= <01>| "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<number> ::= <09> <number> | <09>
g.generate_callback(|ident, value| match ident {
    "number" => value.parse::<u32>().is_ok()
    _ => true
});

would only generate integers which fit into a u32.

If you don't like these changes, feel free to reject. Then I'll keep using my own fork.

shnewto commented 2 years ago

@jonay2000 Oh that's cool! I'm happy you decided to make a PR πŸ˜„

I've enabled CI for this PR and it looks like there's at least some formatting issues. If you're okay updating things so the CI passes that'd be great, otherwise there are no other blockers for me.

jdonszelmann commented 2 years ago

Will update later totday, glad you like it :)

jdonszelmann commented 2 years ago

I think I fixed it, but I thought I had formatted it previously as well :P. I think you need to approve the pipeline again

codecov[bot] commented 2 years ago

Codecov Report

Merging #87 (b61e75d) into main (b3ed7c3) will increase coverage by 1.07%. The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   90.48%   91.56%   +1.07%     
==========================================
  Files          10       10              
  Lines         788      794       +6     
==========================================
+ Hits          713      727      +14     
+ Misses         75       67       -8     
Impacted Files Coverage Ξ”
src/grammar.rs 90.09% <80.00%> (+2.17%) :arrow_up:
tests/grammar.rs 75.00% <0.00%> (-6.25%) :arrow_down:
src/production.rs 88.54% <0.00%> (+0.76%) :arrow_up:
src/expression.rs 95.41% <0.00%> (+0.91%) :arrow_up:
src/error.rs 74.57% <0.00%> (+1.69%) :arrow_up:
src/term.rs 91.78% <0.00%> (+2.89%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data Powered by Codecov. Last update b3ed7c3...b61e75d. Read the comment docs.

jdonszelmann commented 2 years ago

I guess testing went down a bit on the patch due to the new functions, even though they're quite trivial. If you really mind I can add a test for them to just pass through them, but there's not much new code to test.

shnewto commented 2 years ago

Ah, I think the threshold for what causes that check to fail could use some tuning. A drop of less than 2% seems fine to me here. I'm good to merge but it'll be a few hours until I'm able to push a new version to crates.io. Thanks again!

jdonszelmann commented 2 years ago

Thanks for your quick response to the PR! As I said, glad you like it and happy I could contribute to an awesome project. If you don't mind I might improve on the system a bit more in the future to make the produces strings more uniform over the grammar. I read some papers about that recently. BNF makes for a great fuzzing library, I've used it for two projects already.

shnewto commented 2 years ago

Hah I didn't catch that the overall coverage actually went up because of some of what was removed πŸ˜„ Funny, yeah that patch coverage check might need to go πŸ€” 🀷

shnewto commented 2 years ago

@jonay2000 I'm really glad to hear it's working for you to fuzz πŸ™Œ πŸ˜„ and I'd definitely welcome more contributions from you πŸ’―

shnewto commented 2 years ago

@jonay2000 ah also, bnf = "0.3.4" with your changes is published to crates.

jdonszelmann commented 2 years ago

perfect!