Closed zbraniecki closed 3 years ago
This change would give us ~20% performance win:
Benchmarking parse/"simple": Collecting 100 samples in estimated 5.0453 s (2 parse/"simple" time: [17.874 us 18.020 us 18.193 us]
change: [-20.946% -19.166% -17.259%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking parse/"menubar": Collecting 100 samples in estimated 5.6131 s ( parse/"menubar" time: [155.88 us 157.36 us 159.22 us]
change: [-21.585% -19.878% -18.093%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Gnuplot not found, disabling plotting
Using latest Rust nightly and the PR that I just put against master, the perf impact is as following:
parse/"simple": -15% parse/"preferences": -12.8% parse/"menubar": -15.3%
@cyplo - on twitter you indicated you might be able to help us with fuzzing [0]. I may file a separate issue for the parser fuzzing prior to 1.0, but this is a particular aspect of it that I'd like to tackle now.
We have a parser that in theory should only cut at known character bounds. In result, we should be able to slice for free using unchecked, and we know that it brings sizable perf wins, but I don't know to verify the risk here.
Is that something you may know how to help us decide on?
[0] https://twitter.com/zbraniecki/status/1096383281929560064
Hi, sure thing @zbraniecki ! Would it be possible to jump on a call to discuss this ? - you can DM me on Twitter
get_slice
should be unsafe. The start or end might not be at the start or end of a character, which makes the str invalid. While the parser probably doesn't currently violate that invariant, it could in the future. Making get_slice
unsafe makes it clearer to its callers that it has an invariant which must not be violated. This will reduce the likelihood for future changes to violate the invariant.
I honestly struggle to see any possibility of the future development of Fluent Syntax which would lead to us allowing for AST nodes to start/end on anything but start/end of characters.
Moving to 0.8, since it doesn't affect the API
This has been fixed with Slice
in AST.
@zbraniecki let me know if you're still interested in looking into fuzzing this together :) github@cyplo.net is a good starting point or @cyplo on pretty much anywhere
Thanks for removing the
unsafe
block here, at least for now. Would you mind filing a follow-up for discussing re-adding it, please? I'd like to look into it a bit more, given that we're likely protected from invalid UTF8 input by the sheer fact of acceptingString
intoFluentResource::try_new
.Originally posted by @stasm in https://github.com/projectfluent/fluent-rs/pull/76