naomijub / edn-rs

[DEPRECATED]: Crate to parse and emit EDN
https://crates.io/crates/edn-rs
MIT License
81 stars 14 forks source link

EdnError rework (new parse impl) #150

Closed Grinkers closed 3 months ago

Grinkers commented 7 months ago

Resolves https://github.com/edn-rs/edn-rs/issues/119

Performance

Note this is using the benchmark in main branch. The performance increase should be much better on larger strings, as there is now no clone() in parse. The only allocations are for constructing the EDN struct and for parsing numbers. Note that the walker is now all pointer based. Peeking was a big reason previously for the clones, you can see the new performance characteristics (utf-8 complexity but no allocations) https://godbolt.org/z/zK9dEzrYn

Before

     Running benches/parse.rs (target/release/deps/parse-5de3e714a9fe6fe3)
parse                   time:   [23.316 µs 24.270 µs 25.380 µs]
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
     Running benches/tagged_parse.rs (target/release/deps/tagged_parse-041b63e457dffa70)
tagged_parse            time:   [4.4875 µs 4.5609 µs 4.6395 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

After

     Running benches/parse.rs (target/release/deps/parse-5de3e714a9fe6fe3)
parse                   time:   [4.6219 µs 4.6649 µs 4.7142 µs]
                        change: [-81.813% -80.967% -80.093%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe
       Running benches/tagged_parse.rs (target/release/deps/tagged_parse-041b63e457dffa70)
tagged_parse            time:   [1.6429 µs 1.6504 µs 1.6588 µs]
                        change: [-65.166% -64.416% -63.697%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

Debug

given

let edn = "#_ ,, #{hello, this will be discarded} #_{so will this} #{this is invalid";
let e = Edn::from_str(edn);
println!("{e:?}");

Before

Err(ParseEdn("None could not be parsed at char count 58"))

After

Err(EdnError { code: UnexpectedEOF, line: Some(1), column: Some(74), index: Some(73) }) Examples/tests can be seen in https://github.com/Grinkers/edn-rs/blob/parse/tests/error_messages.rs

Errors

In this rework, the following strings were also previously being parsed as valid EDN. Fuzzing invalid EDN is still a TODO (separate issue/PR).

":"
"5011227E71367421E12" ; https://github.com/edn-format/edn#symbols
"{ :[0x42] 42 }"
"\\cats"
"{ :foo 42 :foo 43 }"
Grinkers commented 6 months ago

https://github.com/edn-rs/edn-rs/pull/150/commits/3965bc3221d79b883a7059d7ce72c4c5c2eee366 I ended up adding a CustomError fallback for crates like edn-derive, std-only.

@evaporei Matching PR for compatibility over at (this is a breaking change). Please tell me if you have any better ideas. I believe derive will always rely on std, so it's easy enough to add std stuff without messing with the base layer https://github.com/edn-rs/edn-derive/pull/44

Grinkers commented 6 months ago

Added a new Error for duplicate in sets, matching clojure.

naomijub commented 6 months ago

Sorry for the delay, I'm moving to another city and have had limited time. I can take a look as soon as the branch is updated

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 77.03927% with 76 lines in your changes are missing coverage. Please review.

Project coverage is 71.37%. Comparing base (c60d05b) to head (f658a21).

Files Patch % Lines
src/deserialize/parse.rs 81.59% 53 Missing :warning:
src/deserialize/mod.rs 40.00% 21 Missing :warning:
src/edn/error.rs 75.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #150 +/- ## ========================================== - Coverage 73.43% 71.37% -2.06% ========================================== Files 8 9 +1 Lines 862 877 +15 ========================================== - Hits 633 626 -7 - Misses 229 251 +22 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Grinkers commented 6 months ago

Sorry for the delay, I'm moving to another city and have had limited time. I can take a look as soon as the branch is updated

No worries, I don't think it's too urgent. It would be nice to be merged in so I can tackle the other issues though. Rebased, which is what I assume you meant. No other changes. Tell me if you want this broken up/squashed in a different way.

I think https://github.com/edn-rs/edn-rs/issues/138 should be a higher priority (github release action -> crates.io publishing). I think we should do a final 0.17 before this PR gets merged too.

Grinkers commented 3 months ago

Code coverage bot seems really bad? It's marking tons of things as untested.

To compare, here's the source based coverage generated by https://github.com/Grinkers/edn-rs/blob/5ad0427304dc9a05276ef2de36013c82e5ab22e8/bb.edn#L38

parse.rs.html.zip