projectfluent / fluent-rs

Rust implementation of Project Fluent
https://projectfluent.org
Apache License 2.0

Switch the AST over to Cow<str> #183

Closed Michael-F-Bryan closed 3 years ago

Michael-F-Bryan commented 4 years ago

Fixes #104

Michael-F-Bryan commented 4 years ago

Benchmark comparing 25dde75 against master, c9e4565 (i.e. replace &'ast str with Cow<'ast, str>):

$ cargo bench --bench parser -- --baseline base
   Compiling fluent-syntax v0.9.3 (C:\Projects\fluent-rs\fluent-syntax)
    Finished bench [optimized] target(s) in 9.46s
     Running C:\Projects\fluent-rs\target\release\deps\parser-99e6937ba38adfc9.exe
Gnuplot not found, using plotters backend
parse/"simple"          time:   [43.395 us 45.529 us 48.144 us]
                        change: [+25.007% +29.004% +33.622%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
parse/"preferences"     time:   [387.97 us 390.36 us 393.46 us]
                        change: [+3.4989% +6.0201% +8.2245%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
parse/"menubar"         time:   [104.65 us 108.02 us 112.16 us]
                        change: [+6.1007% +8.5359% +11.820%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

unicode                 time:   [7.7112 us 8.0368 us 8.4848 us]
                        change: [+1.5234% +4.8896% +8.6534%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

parse_ctx/"browser"     time:   [399.25 us 405.31 us 412.26 us]
                        change: [+5.0663% +6.1496% +7.3562%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking parse_ctx/"preferences": Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s or reduce sample count to 70.
parse_ctx/"preferences" time:   [989.68 us 1.0161 ms 1.0505 ms]
                        change: [+8.0732% +11.960% +16.002%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Other than the "simple" benchmark, we're looking at a 6-11% performance regression.

zbraniecki commented 4 years ago

I'll measure against Fx, but 6-11% would likely translate to ~1.5-2ms on startup on a very fast machine - https://bugzilla.mozilla.org/show_bug.cgi?id=1560038#c57

On a slower user machine this may be 5-10ms. I'd prefer to maintain a read-only AST over paying that price for convenience.

Michael-F-Bryan commented 4 years ago

Benchmark comparing 59fb6ad against master (c9e4565) after updating fluent-bundle to use Cow<str>.

$ cargo bench --bench resolver -- --baseline base
   Compiling fluent-syntax v0.9.3 (C:\Projects\fluent-rs\fluent-syntax)
   Compiling fluent-bundle v0.11.0 (C:\Projects\fluent-rs\fluent-bundle)
    Finished bench [optimized] target(s) in 11.11s
     Running C:\Projects\fluent-rs\target\release\deps\resolver-d65a2ee18bbb4c2c.exe
Gnuplot not found, using plotters backend
resolve/"simple"        time:   [9.5628 us 9.6640 us 9.7876 us]
                        change: [-14.508% -10.849% -7.4316%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
resolve/"preferences"   time:   [140.55 us 142.14 us 144.46 us]
                        change: [+2.1204% +5.5609% +8.9761%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
resolve/"menubar"       time:   [31.260 us 31.382 us 31.524 us]
                        change: [-2.0665% -0.6816% +0.5605%] (p = 0.33 > 0.05)
                        No change in performance detected.
resolve/"unescape"      time:   [1.7500 us 1.7625 us 1.7773 us]
                        change: [-2.0425% -0.9417% -0.0567%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

It looks like the "preferences" benchmark took a 5% hit, so switching to Cow<str> also has a negative effect when formatting a message.

I'd prefer to maintain a read-only AST over paying that price for convenience.

@zbraniecki, I agree. I was thinking we could use Yandros's trick of parameterising the AST nodes over some type S: AsRef<str> and giving it a default value of S = &'ast str. That way the base case will be identical to the current AST, but if you need to do mutations you can transform it to using Cow<str> when folding over the tree... Thoughts?
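A minimal sketch of that trick, using a hypothetical node name (not the real fluent-syntax AST, and with the default-type-parameter part omitted for brevity): the node is generic over its string type, so the parser can keep returning zero-copy &str slices while mutating consumers instantiate the same node with String.

```rust
// Hypothetical AST node, generic over its string type S. The parser would
// return Identifier<&str> (zero-copy slices into the source), while code that
// needs to mutate the tree can instantiate Identifier<String> instead.
#[derive(Debug, Clone, PartialEq)]
pub struct Identifier<S> {
    pub name: S,
}

impl<S: AsRef<str>> Identifier<S> {
    // Read-only access works identically for both instantiations.
    pub fn name(&self) -> &str {
        self.name.as_ref()
    }
}

fn main() {
    let source = String::from("hello = Hello, world!");

    // Borrowed form: what the parser would hand back.
    let borrowed: Identifier<&str> = Identifier { name: &source[..5] };

    // Owned form: what you'd fold the tree into before mutating it.
    let owned: Identifier<String> = Identifier { name: borrowed.name.to_string() };

    assert_eq!(borrowed.name(), owned.name());
    println!("{}", owned.name()); // prints "hello"
}
```

Because the generic is resolved at compile time, the `Identifier<&str>` instantiation should be layout-identical to the current non-generic AST.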

zbraniecki commented 4 years ago

I'm open to it, yeah!

Michael-F-Bryan commented 4 years ago

Soo.... I'm seeing something strange here. In 5e26a42 I've made the entire AST generic over its string type (i.e. &'ast str -> S) and the parser returns a concrete ast::Resource<&'ast str>. In theory this is exactly identical to the previous AST at the binary level (we replaced &'ast str with a generic parameter, then instantiated that generic with &'ast str).

However, now parse/simple regressed by 6%, parse/menubar improved by 4.9%, and parse/preferences improved by 7.7%. The parse_ctx benchmarks aren't showing any significant change.

$ cargo bench --bench parser -- --baseline base
   Compiling fluent-syntax v0.9.3 (C:\Projects\fluent-rs\fluent-syntax)
    Finished bench [optimized] target(s) in 10.47s
     Running C:\Projects\fluent-rs\target\release\deps\parser-99e6937ba38adfc9.exe
Gnuplot not found, using plotters backend
parse/"simple"          time:   [35.461 us 35.716 us 36.088 us]
                        change: [+4.4435% +5.8972% +7.6256%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
parse/"preferences"     time:   [337.48 us 342.14 us 348.18 us]
                        change: [-9.8723% -7.7005% -5.7328%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
parse/"menubar"         time:   [91.829 us 92.043 us 92.279 us]
                        change: [-5.8071% -4.8987% -4.0698%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

unicode                 time:   [7.8541 us 8.1120 us 8.4140 us]
                        change: [+0.2610% +4.5582% +11.198%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

parse_ctx/"browser"     time:   [386.82 us 394.27 us 403.12 us]
                        change: [+0.1738% +1.7793% +3.6869%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
parse_ctx/"preferences" time:   [886.97 us 899.27 us 913.82 us]
                        change: [-4.2761% -1.4430% +0.9149%] (p = 0.31 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

I've got two hypotheses for these results:

  1. By making our AST generic it now gets instantiated at a different time and LLVM optimises the parsing code differently, or
  2. The PC I'm running these benchmarks on is inherently noisy, and most of the performance differences we're seeing aren't due to code changes

I'm leaning towards the latter because this is just a commodity PC. The easiest way to check is to try on my laptop and see whether the benchmark results change at all.


@zbraniecki, I've reverted the Cow<str> commits instead of force pushing them out of existence so we have something to reference when comparing benchmark results. Once we're satisfied that this PR doesn't regress too much I'll be able to squash the lot into a single change.

Michael-F-Bryan commented 4 years ago

I'm doing the benchmarks again on my personal laptop.

Looking at those results, it seems like Cow<str> (25dde75) adds about 4% on average and the generic version (5e26a42) improves performance by about 20-25%... I'm no expert on performance tuning, but it feels almost too good to be true that the generic version is 20% faster than the original.

$ rustc --version --verbose
rustc 1.46.0-nightly (a8cf39911 2020-06-21)
binary: rustc
commit-hash: a8cf3991177f30694200002cd9479ffbbe6d9a1a
commit-date: 2020-06-21
host: x86_64-unknown-linux-gnu
release: 1.46.0-nightly
LLVM version: 10.0
$  uname -a
Linux archiso 5.7.4-arch1-1 #1 SMP PREEMPT Thu, 18 Jun 2020 16:01:07 +0000 x86_64 GNU/Linux
$ nprocs
8

# Using master as the baseline

$ git checkout c9e4565
$ cargo bench --bench parser -- --save-baseline original                                                                 
    Finished bench [optimized] target(s) in 0.05s
     Running /home/michael/Documents/forks/fluent-rs/target/release/deps/parser-4a42273044a13d94
parse/"simple"          time:   [12.725 us 12.779 us 12.846 us]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
parse/"preferences"     time:   [208.46 us 234.08 us 269.13 us]
parse/"menubar"         time:   [57.577 us 57.784 us 58.006 us]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild

unicode                 time:   [4.4069 us 4.4364 us 4.4801 us]
Found 27 outliers among 100 measurements (27.00%)
  22 (22.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe

parse_ctx/"browser"     time:   [235.04 us 236.73 us 238.86 us]
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe
parse_ctx/"preferences" time:   [581.11 us 582.82 us 585.39 us]
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) low severe
  1 (1.00%) high mild
  5 (5.00%) high severe

cargo bench --bench parser -- --save-baseline original  106.76s user 0.74s system 163% cpu 1:05.85 total

# Compare Cow<str> to original

$ git checkout 25dde75
$ cargo bench --bench parser -- --baseline original                                                                     
    Finished bench [optimized] target(s) in 0.03s
     Running /home/michael/Documents/forks/fluent-rs/target/release/deps/parser-4a42273044a13d94
parse/"simple"          time:   [15.871 us 16.578 us 17.225 us]
                        change: [+13.777% +16.698% +19.557%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) high mild
  18 (18.00%) high severe
parse/"preferences"     time:   [237.04 us 238.11 us 239.19 us]
                        change: [-73.657% -67.879% -59.132%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
parse/"menubar"         time:   [58.760 us 59.268 us 59.942 us]
                        change: [+2.0590% +2.6362% +3.3442%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
  9 (9.00%) low severe
  4 (4.00%) high mild
  7 (7.00%) high severe

unicode                 time:   [4.5704 us 4.6122 us 4.6706 us]
                        change: [+2.9406% +3.8877% +5.0021%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  5 (5.00%) high mild
  16 (16.00%) high severe

parse_ctx/"browser"     time:   [244.86 us 245.86 us 246.88 us]
                        change: [+3.6058% +4.3485% +4.9392%] (p = 0.00 < 0.05)
                        Performance has regressed.
parse_ctx/"preferences" time:   [607.01 us 609.46 us 611.88 us]
                        change: [+3.8941% +4.3636% +4.7830%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cargo bench --bench parser -- --baseline original  152.89s user 1.35s system 218% cpu 1:10.75 total

# Compare generic to original

$ git checkout 5e26a42
$ cargo bench --bench parser -- --baseline original                                                                 
   Compiling fluent-syntax v0.9.3 (/home/michael/Documents/forks/fluent-rs/fluent-syntax)
    Finished bench [optimized] target(s) in 2.50s
     Running /home/michael/Documents/forks/fluent-rs/target/release/deps/parser-4a42273044a13d94
parse/"simple"          time:   [12.790 us 12.872 us 12.979 us]
                        change: [-1.8642% -0.6152% +0.9808%] (p = 0.38 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
parse/"preferences"     time:   [174.77 us 175.35 us 176.09 us]
                        change: [-80.612% -76.301% -69.792%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
parse/"menubar"         time:   [43.262 us 43.362 us 43.469 us]
                        change: [-24.915% -24.614% -24.307%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

unicode                 time:   [3.8113 us 3.9722 us 4.1926 us]
                        change: [+92.864% +162.38% +248.04%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
  1 (1.00%) high mild
  23 (23.00%) high severe

parse_ctx/"browser"     time:   [183.62 us 184.07 us 184.56 us]
                        change: [-22.442% -21.875% -21.433%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking parse_ctx/"preferences": Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s or reduce sample count to 50.
parse_ctx/"preferences" time:   [455.44 us 457.74 us 461.16 us]
                        change: [-21.046% -20.457% -19.777%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

cargo bench --bench parser -- --baseline original  125.23s user 1.00s system 220% cpu 57.349 total
zbraniecki commented 4 years ago

but it feels almost too good to be true that the generic version is 20% faster than the original.

Hah, yeah, it seems a bit like a random optimization that may be overexpressed in a microbenchmark. I'll test that against real world code, but the approach is very promising.

This way we could have AST<&str>, and if you need to edit it, you can copy it into AST<String>, work with it, and then serialize.
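That edit flow can be sketched with a hypothetical single-field node (names are illustrative, not real fluent-rs API): parse into the borrowed form, fold into the owned form to edit, then serialize (here just printed).

```rust
// Hypothetical node; the real AST has many node types, each of which would
// need an equivalent borrowed-to-owned fold.
#[derive(Debug)]
struct Message<S> {
    id: S,
    value: S,
}

impl<'a> Message<&'a str> {
    // Copy the borrowed tree into an owned one so it can be edited freely,
    // no longer tied to the source buffer's lifetime.
    fn to_owned_ast(&self) -> Message<String> {
        Message {
            id: self.id.to_string(),
            value: self.value.to_string(),
        }
    }
}

fn main() {
    // AST<&str>: zero-copy view produced by the parser.
    let borrowed = Message { id: "greeting", value: "Hello" };

    // AST<String>: editable copy.
    let mut owned = borrowed.to_owned_ast();
    owned.value.push_str(", World!");

    // "Serialize" the edited tree.
    println!("{} = {}", owned.id, owned.value); // greeting = Hello, World!
}
```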

zbraniecki commented 4 years ago

This looks good!

I like the use of generics here over macros.

What I'd like to get to is handling of FluentResource more like Cow where it is either the rental::FluentResource or ast::Resource<String>.

Maybe something like this:

#[derive(Debug)]
pub enum FluentResource {
    Owned(ast::Resource<String>),
    String(rentals::FluentResource)
}

impl FluentResource {
    pub fn try_new_slice(source: String) -> Result<Self, (Self, Vec<ParserError>)> {
        let mut errors = None;
        let res = rentals::FluentResource::new(source, |s| match parse(s) {
            Ok(ast) => ast,
            Err((ast, err)) => {
                errors = Some(err);
                ast
            }
        });

        let res = FluentResource::String(res);

        if let Some(errors) = errors {
            Err((res, errors))
        } else {
            Ok(res)
        }
    }

    pub fn ast<S>(&self) -> &ast::Resource<S>
        where S: Borrow<str> {
        match self {
            Self::Owned(res) => res,
            Self::String(res) => res.all().ast
        }
    }
}

and then we could make the resolver handle either because it only needs the read view into either variant.

What do you think?

Michael-F-Bryan commented 4 years ago

What do you think?

I like the idea, but I'm not sure if it'll pan out... By having resources use either owned or borrowed ASTs you can't let the user access them via a single method because they are two different types (even using something like impl Trait, because impl Trait is syntactic sugar for returning a single unnamed type).

You could maybe add an extra level of indirection via the visitor pattern. The idea being users need to pass in something which satisfies Visitor<S> where S: AsRef<str> then we write a function which recursively walks the tree invoking visitor methods for either the owned or borrowed trees.

trait Visitor<S> {
  fn visit_resource(&mut self, res: &ast::Resource<S>) { /* default recursion */ }
  fn visit_inline_expression(&mut self, expr: &ast::InlineExpression<S>) { /* default recursion */ }
  // ...
}

impl FluentResource {
  fn visit_ast<V>(&self, visitor: &mut V) 
    where V: Visitor<String> + for<'ast> Visitor<&'ast str>,
  {
    match self {
      FluentResource::Owned(owned) => visitor.visit_resource(owned),
      FluentResource::Rental(rental) => visitor.visit_resource(rental),
    }
  }
}

You lose the ability to use nice things like pattern matching though, so it'd work for evaluating a message but could make other operations a pain. Performance wouldn't change because we're using compile-time generics and you'd do this recursive traversal when evaluating messages anyway.

The alternative would be to use Resource<Cow<str>> and rental all the time, but then we're back to the performance issues from before with the added problem that you need to traverse the AST twice (once when parsing to Resource<&'ast str> and another to convert &'ast str to Cow<'ast, str>).

zbraniecki commented 4 years ago

You lose the ability to use nice things like pattern matching though, so it'd work for evaluating a message but could make other operations a pain.

What other operations do you think of? I'm getting warmed up to the idea that maybe a Visitor model would work?

The resolver is a pretty black-box solution, and if I'd end up with it being a generic that can handle Resource<String> and Resource<&str> then I'm totally fine with that.

I'm also looking again at the StrView idea.

Seems like both approaches on paper should give us no performance hit.

Do you have an opinion on which one would lead to more manageable code?

Michael-F-Bryan commented 4 years ago

What other operations do you think of? I'm getting warmed up to the idea that maybe a Visitor model would work?

I'm not sure. I guess most tools (my primary area of interest) will be using the AST layer directly and not touching the FluentResource type, so it won't affect them. Other than that the only real consumer of a FluentResource is the resolver and as long as it can format a message, nobody cares about its internals.

I'm also looking again at the StrView idea.

Are you referring to a type which will take slices into a reference-counted string on the heap? I put together a proof-of-concept version which uses the bytes crate for all the heavy lifting.

Proof of concept ArcStr type:

```rust
use bytes::Bytes;
use std::fmt::{self, Display, Formatter};
use std::ops::{Bound, Deref, Index, RangeBounds};
use std::slice::SliceIndex;
use std::str::Utf8Error;

/// A slice into a reference-counted string.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Ord, PartialOrd)]
pub struct ArcStr(Bytes);

impl ArcStr {
    pub const fn new() -> Self {
        ArcStr(Bytes::new())
    }

    pub const fn from_static(text: &'static str) -> Self {
        ArcStr(Bytes::from_static(text.as_bytes()))
    }

    pub fn from_utf8(bytes: &[u8]) -> Result<Self, Utf8Error> {
        // use the standard library to make sure it's valid UTF-8
        let _ = std::str::from_utf8(bytes)?;

        // Safety: We just did the validity check.
        unsafe { Ok(ArcStr::from_utf8_unchecked(bytes)) }
    }

    /// Create a new [`ArcStr`] from raw bytes *without* checking that the
    /// string contains valid UTF-8.
    pub unsafe fn from_utf8_unchecked(bytes: &[u8]) -> Self {
        ArcStr(Bytes::copy_from_slice(bytes))
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    pub fn copy_from_slice(data: &str) -> Self {
        ArcStr(Bytes::copy_from_slice(data.as_bytes()))
    }

    /// Return a slice of `self` for the provided range.
    ///
    /// This will increment the reference count for the underlying memory and
    /// return a new [`ArcStr`] handle pointing at the slice.
    pub fn slice<R>(&self, range: R) -> Option<Self>
    where
        R: RangeBounds<usize>,
    {
        let len = self.len();
        let begin = match range.start_bound() {
            Bound::Included(&n) => n,
            Bound::Excluded(&n) => n + 1,
            Bound::Unbounded => 0,
        };
        let end = match range.end_bound() {
            Bound::Included(&n) => n + 1,
            Bound::Excluded(&n) => n,
            Bound::Unbounded => len,
        };

        if !self.as_str().is_char_boundary(begin) || !self.as_str().is_char_boundary(end) {
            return None;
        }

        // Safety: We've just done our bounds checks.
        Some(ArcStr(self.0.slice(range)))
    }

    pub fn as_str(&self) -> &str {
        &*self
    }
}

impl Deref for ArcStr {
    type Target = str;

    fn deref(&self) -> &str {
        // Safety: You can only create an ArcStr from either a valid UTF-8
        // string or via the ArcStr::get() method which does the correct bounds
        // checks.
        unsafe { std::str::from_utf8_unchecked(&self.0) }
    }
}

impl AsRef<str> for ArcStr {
    fn as_ref(&self) -> &str {
        &*self
    }
}

impl<S> PartialEq<S> for ArcStr
where
    str: PartialEq<S>,
{
    fn eq(&self, other: &S) -> bool {
        self.as_ref().eq(other)
    }
}

impl<'a> PartialEq<ArcStr> for &'a str {
    fn eq(&self, other: &ArcStr) -> bool {
        self.as_bytes() == other.as_bytes()
    }
}

impl PartialEq<str> for ArcStr {
    fn eq(&self, other: &str) -> bool {
        self.as_bytes() == other.as_bytes()
    }
}

impl<I> Index<I> for ArcStr
where
    I: SliceIndex<str>,
{
    type Output = I::Output;

    fn index(&self, index: I) -> &Self::Output {
        match self.as_ref().get(index) {
            Some(s) => s,
            None => panic!("Attempted to slice out of bounds"),
        }
    }
}

impl Display for ArcStr {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        Display::fmt(self.as_str(), f)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn deref_impl_works() {
        let s = ArcStr::from_static("Hello, World!");
        let got: &str = &*s;
        assert_eq!(got, "Hello, World!");
    }

    #[test]
    fn index_into_the_str_slice() {
        let s = ArcStr::from_static("Hello, World!");
        let got = &s[..5];
        assert_eq!(got, "Hello");
    }

    #[test]
    fn create_a_sub_slice_into_the_string() {
        let text = ArcStr::from_static("💩🦀");
        let length = '💩'.len_utf8();

        // try indices immediately before/after a boundary
        assert_eq!(text.slice(..length).unwrap().as_str(), "💩");
        assert!(text.slice(..=length).is_none());
        assert!(text.slice(..1).is_none());

        // Note the deliberate out-of-bounds
        let max_index = text.as_str().len() + 1;

        // try every possible index and every possible range combination
        for start in 0..=max_index {
            for end in start..=max_index {
                assert_range_equal(&text, text.as_str(), start..);
                assert_range_equal(&text, text.as_str(), start..end);
                assert_range_equal(&text, text.as_str(), start..=end);
                assert_range_equal(&text, text.as_str(), ..=end);
                assert_range_equal(&text, text.as_str(), ..end);
            }
        }
    }

    fn assert_range_equal<R>(left: &ArcStr, right: &str, range: R)
    where
        R: RangeBounds<usize> + SliceIndex<str, Output = str> + Clone,
    {
        let got = left.slice(range.clone());
        let should_be = right.get(range);

        assert_eq!(got.as_deref(), should_be);

        // make sure this is valid UTF-8
        if let Some(ArcStr(bytes)) = got {
            assert!(std::str::from_utf8(&bytes).is_ok());
        }
    }
}
```

I'm going to play around with it in another branch and see how it affects performance. The only part that'll change performance is bumping the reference count every time we slice into the string.
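That ref-count cost can be illustrated with the standard library's Arc<str> alone; the following is a simplified stand-in for the Bytes-based PoC (it skips the char-boundary checks, and the type is hypothetical). Every sub-slice handle clones the Arc, which is exactly one atomic increment.

```rust
use std::sync::Arc;

// Simplified stand-in for the ArcStr idea: an Arc<str> plus a byte range.
// Unlike the real PoC this does no char-boundary checking.
#[derive(Clone)]
struct MiniArcStr {
    buf: Arc<str>,
    start: usize,
    end: usize,
}

impl MiniArcStr {
    fn new(s: &str) -> Self {
        MiniArcStr { buf: Arc::from(s), start: 0, end: s.len() }
    }

    // The only runtime cost over plain &str slicing: an atomic ref-count
    // bump from Arc::clone.
    fn slice(&self, start: usize, end: usize) -> MiniArcStr {
        assert!(start <= end && self.start + end <= self.end);
        MiniArcStr {
            buf: Arc::clone(&self.buf),
            start: self.start + start,
            end: self.start + end,
        }
    }

    fn as_str(&self) -> &str {
        &self.buf[self.start..self.end]
    }
}

fn main() {
    let text = MiniArcStr::new("hello = Hello, World!");
    let id = text.slice(0, 5);
    assert_eq!(id.as_str(), "hello");
    // Two handles now share the buffer: the original plus the slice.
    assert_eq!(Arc::strong_count(&text.buf), 2);
}
```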

Michael-F-Bryan commented 4 years ago

Here are the benchmark results for #187 (using an ArcStr instead of &str or Cow<str>) and you can see performance regressed by a non-trivial amount.

I thought it might have been due to the way the benchmarks are constructed (it stores the text as Strings, so every run of every benchmark ended up copying the contents to a new ArcStr), but switching everything to ArcStr in Michael-F-Bryan/fluent-rs@15d0d8527a404c419aa5c096ae5428ee424eff41 to skip the initial copy didn't help too much...

# Baseline (master)

$ git checkout master && git rev-parse HEAD
c9e456515f4c084eb9650f3910722a31707a6b32
$ cargo bench --bench parser
parse/"simple"          time:   [12.823 us 12.903 us 13.024 us]
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe
parse/"preferences"     time:   [236.38 us 238.11 us 240.29 us]
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe
parse/"menubar"         time:   [58.148 us 58.343 us 58.602 us]
Found 22 outliers among 100 measurements (22.00%)
  4 (4.00%) low severe
  1 (1.00%) high mild
  17 (17.00%) high severe

unicode                 time:   [4.6397 us 4.6947 us 4.7749 us]
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) high mild
  14 (14.00%) high severe

parse_ctx/"browser"     time:   [245.25 us 247.41 us 250.85 us]
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe
parse_ctx/"preferences" time:   [604.51 us 606.86 us 609.26 us]

# With ArcStr

$ git checkout rc-str-ast && git rev-parse HEAD
e807455276524572f69d9f275dbff005d6928d31
$ cargo bench --bench parser
parse/"simple"          time:   [23.790 us 24.533 us 25.135 us]
                        change: [+65.766% +70.315% +75.086%] (p = 0.00 < 0.05)
                        Performance has regressed.
parse/"preferences"     time:   [257.25 us 258.34 us 259.64 us]
                        change: [+7.8525% +8.9906% +10.152%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 29 outliers among 100 measurements (29.00%)
  9 (9.00%) low severe
  4 (4.00%) high mild
  16 (16.00%) high severe
parse/"menubar"         time:   [72.983 us 73.360 us 73.747 us]
                        change: [+24.247% +25.545% +26.511%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 37 outliers among 100 measurements (37.00%)
  3 (3.00%) low severe
  16 (16.00%) low mild
  10 (10.00%) high mild
  8 (8.00%) high severe

unicode                 time:   [4.6590 us 4.6968 us 4.7400 us]
                        change: [-1.2845% +0.2354% +1.3430%] (p = 0.78 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

parse_ctx/"browser"     time:   [277.00 us 280.18 us 283.69 us]
                        change: [+10.566% +11.847% +13.102%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
parse_ctx/"preferences" time:   [655.87 us 662.29 us 670.78 us]
                        change: [+7.6288% +8.5293% +9.5709%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

# Tweaked the benchmarks to not copy the source text every time

$ git rev-parse HEAD
15d0d8527a404c419aa5c096ae5428ee424eff41
$ cargo bench --bench parser
parse/"simple"          time:   [19.246 us 19.453 us 19.764 us]
                        change: [-14.504% -11.978% -9.3610%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
parse/"preferences"     time:   [252.80 us 257.30 us 262.51 us]
                        change: [-2.6368% -1.7249% -0.7337%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 38 outliers among 100 measurements (38.00%)
  21 (21.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  14 (14.00%) high severe
parse/"menubar"         time:   [71.020 us 71.441 us 71.934 us]
                        change: [-3.5266% -2.9758% -2.3899%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

unicode                 time:   [4.7729 us 4.8176 us 4.8938 us]
                        change: [+1.9107% +3.4854% +5.6462%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe

parse_ctx/"browser"     time:   [270.87 us 272.46 us 274.06 us]
                        change: [-2.6351% -1.7316% -0.8859%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
parse_ctx/"preferences" time:   [641.53 us 646.67 us 652.89 us]
                        change: [-3.7101% -2.6661% -1.8317%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

If it were just a case of nice code I'd prefer the ArcStr AST over any of the other solutions in this PR, simply because not having to worry about lifetime annotations or use rental to manage the string's memory makes the code more ergonomic... But considering Fluent is used by Firefox to localise the UI and parsing directly impacts Firefox's startup time, I'm inclined to go with the generic AST + Visitor<S> approach.

Michael-F-Bryan commented 4 years ago

Would someone be able to review this PR? Other than the JSON code it should be good to go.

As an aside, is there any reason why we mirror the entire AST when serializing JSON instead of adding annotations to the original AST?

zbraniecki commented 4 years ago

Are you referring to a type which will take slices into a reference-counted string on the heap? I put together a proof-of-concept version which uses the bytes crate for all the heavy lifting.

No, I meant Yandros https://users.rust-lang.org/t/fluent-l10n-system-parser-ast-serializer-decision/27507/2

Would someone be able to review this PR? Other than the JSON code it should be good to go.

I'd prefer to see the Visitor PR first.

As an aside, is there any reason why we mirror the entire AST when serializing JSON instead of adding annotations to the original AST?

Yeah, the main reason is that there's enough custom serialization that I didn't want to pollute the AST. But with Serde maturing and less code being required, I'd be open to pulling serde into the main ast.rs behind cfg's and then having some serde.rs with custom deserializers for Option and Vec.

Michael-F-Bryan commented 4 years ago

No, I meant Yandros https://users.rust-lang.org/t/fluent-l10n-system-parser-ast-serializer-decision/27507/2

Ah fair enough. I was thinking of implementing most Visitors using impl<S: AsRef<str>> Visitor<S> for ...
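The blanket-impl idea can be sketched as follows (the trait and names are a hypothetical mirror of the Visitor sketched earlier in the thread, not real fluent-rs API): a single impl<S: AsRef<str>> satisfies both halves of the Visitor<String> + for<'ast> Visitor<&'ast str> bound.

```rust
// Hypothetical single-method visitor trait standing in for the full AST one.
trait Visitor<S> {
    fn visit_str(&mut self, s: &S);
}

struct WordCounter {
    words: usize,
}

// One generic impl covers every S: AsRef<str>, including both String and
// &'a str for all lifetimes 'a.
impl<S: AsRef<str>> Visitor<S> for WordCounter {
    fn visit_str(&mut self, s: &S) {
        self.words += s.as_ref().split_whitespace().count();
    }
}

fn accepts_both<V>(visitor: &mut V)
where
    V: Visitor<String> + for<'a> Visitor<&'a str>,
{
    // Fully qualified syntax picks which instantiation each call uses.
    <V as Visitor<String>>::visit_str(visitor, &String::from("owned text"));
    <V as Visitor<&str>>::visit_str(visitor, &"borrowed text");
}

fn main() {
    let mut counter = WordCounter { words: 0 };
    accepts_both(&mut counter);
    println!("{}", counter.words); // prints "4"
}
```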

I'd prefer to see the Visitor PR first.

I've started playing around with the visitor pattern and I've got a feeling it could be a pain (see #188). The problem is we've currently got an "external" API where bits of code will ask for a reference to a particular AST node, and we want to convert it to an "internal" API where information is passed around via closures and the references are only valid for the duration of the method call.

In particular I feel like I'd need to rewrite large swathes of the resolver code to move away from returning a FluentValue, and I don't think it'll be as simple as storing a std::io::Write inside a visitor struct and emitting write!() calls as we traverse the tree...
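
To make the distinction concrete, here's a minimal sketch with made-up types (not the actual fluent-bundle API): the "external" style hands back a long-lived reference, while the "internal" style only lends the data to a closure for the duration of the call.

```rust
struct Bundle {
    messages: Vec<String>,
}

impl Bundle {
    // "External" style: the caller gets a reference back and can hold it.
    fn get_message(&self, index: usize) -> Option<&String> {
        self.messages.get(index)
    }

    // "Internal" style: the reference is only valid inside the closure,
    // which is the shape a visitor-based API forces on callers.
    fn with_message<R>(&self, index: usize, f: impl FnOnce(&str) -> R) -> Option<R> {
        self.messages.get(index).map(|m| f(m))
    }
}

fn main() {
    let bundle = Bundle { messages: vec!["hello = Hello".to_string()] };

    // External: the borrow can outlive the call that produced it...
    let held = bundle.get_message(0).unwrap();
    assert_eq!(held, "hello = Hello");

    // Internal: the borrow cannot escape the closure; only the
    // computed result comes out.
    let len = bundle.with_message(0, |m| m.len()).unwrap();
    assert_eq!(len, 13);
}
```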

Pike commented 4 years ago

Thanks for giving this a shot, Michael.

We've talked about the visitor pattern in fluent-kotlin, too. I wasn't a fan of the amount of duct tape one needs to write. I was also concerned about readability of the documentation that comes from that pattern. The amount of code would be less of a concern if the code was generated, but the documentation piece is still there, I guess.

The other thing about visitors is that they're purely consumers. In my experience, they fall flat when you try to build creators and modifiers. The Python Transformer is strictly untyped and can do stuff, but the TypeScript one is typed in a way where I don't see how to actually transform a tree.

I pondered using generics, but I think that means that you're creating something that has a specialization (?) for each visitor function. That didn't sound like fun.

Michael-F-Bryan commented 4 years ago

It's possible to build a new tree by using a pushdown automaton which builds up a new AST on each transition, but you'll likely run into issues with the borrow checker... When you recurse and add a new state to the stack, you'd be forced to "find" where you want to insert new nodes instead of just holding a reference to the parent in the automaton (multiple nested states means multiple mutable pointers into the same tree, which is no bueno).

I'm sure you could work around this with enough dynamic dispatch and indirection, but to be honest I'd just write the traversal/modification by hand at that point. That way you're free to do whatever you want and don't need to conform to a particular interface.
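
A minimal sketch of that "find, don't hold" workaround (the node type is made up for illustration): record a path of child indices and re-walk the tree for each insertion, so there's never more than one live mutable borrow into it.

```rust
#[derive(Debug, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

// Re-walk from the root each time instead of caching &mut references
// to parents across recursive calls.
fn node_at_path<'a>(root: &'a mut Node, path: &[usize]) -> &'a mut Node {
    let mut current = root;
    for &index in path {
        current = &mut current.children[index];
    }
    current
}

fn main() {
    let mut tree = Node {
        name: "resource".into(),
        children: vec![Node { name: "message".into(), children: vec![] }],
    };

    // Insert under children[0] without ever holding two live
    // mutable pointers into the tree at once.
    node_at_path(&mut tree, &[0]).children.push(Node {
        name: "pattern".into(),
        children: vec![],
    });

    assert_eq!(tree.children[0].children[0].name, "pattern");
}
```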

Minor tangent exploring ways to keep our visitor method

I was thinking we might be able to take the functional approach and accept some sort of `map` function which transforms a `&Resource<S>` where `S: AsRef<str>` into some new type, `R`...

```rust
enum FluentResource {
    Owned(Resource<String>),
    Rental(rentals::FluentResource),
}

impl FluentResource {
    fn map<F, R, S>(&self, mapper: F) -> R
    where
        F: FnOnce(&Resource<S>) -> R,
        S: AsRef<str>,
    {
        match self {
            FluentResource::Owned(owned) => mapper(owned),
            FluentResource::Rental(rental) => mapper(rental.0.all().ast),
        }
    }
}
```

... But unfortunately this is just our `Visitor<&'a str> + Visitor<String>` dilemma in disguise, and what we really want to express here is `F: for<'a> FnOnce(&Resource<&'a str>) -> R`... which would require higher-kinded types.

It seems like the easiest thing would be to not make FluentResource a COW type and just let it keep using rental.

That means you can only really create/manipulate an AST when it's not in a bundle, but for most tools that shouldn't be an issue because they'll be working with the parsed AST directly (like I did with my fluent-machine-translations project)... I think?

zbraniecki commented 3 years ago

Quick update. I've started working on a remodel of the AST to be generic in https://github.com/zbraniecki/fluent-rs/tree/generic

It's definitely an early WIP, but I got the parser to work and the performance seems to be on par with master.

It handles ast::Resource<String> and ast::Resource<&str> quite well, and in theory should also handle ast::Resource<Cow<str>> although I didn't get there yet.
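
For illustration, the generic shape being described looks roughly like this (a heavily simplified stand-in, not the actual ast::Resource definition):

```rust
use std::borrow::Cow;

// Simplified stand-in: one AST type parameterised over its string slot,
// usable with &str, String, or Cow<str>.
#[derive(Debug, PartialEq)]
struct Resource<S> {
    body: Vec<S>,
}

// Consumers that only read can stay generic over S: AsRef<str>.
fn first_entry<S: AsRef<str>>(res: &Resource<S>) -> Option<&str> {
    res.body.first().map(|s| s.as_ref())
}

fn main() {
    let borrowed: Resource<&str> = Resource { body: vec!["key = Value"] };
    let owned: Resource<String> = Resource { body: vec!["key = Value".to_string()] };
    let cow: Resource<Cow<'static, str>> = Resource { body: vec![Cow::Borrowed("key = Value")] };

    assert_eq!(first_entry(&borrowed), Some("key = Value"));
    assert_eq!(first_entry(&owned), Some("key = Value"));
    assert_eq!(first_entry(&cow), Some("key = Value"));
}
```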

I'll try to finish the fluent-syntax next, and hopefully implement ToOwned for ast::Resource<&str>.

lmk what you think, if you have a chance to skim through the code

zbraniecki commented 3 years ago

@Michael-F-Bryan does https://github.com/projectfluent/fluent-rs/pull/198 address your issue?

Michael-F-Bryan commented 3 years ago

It should work. For my purposes I just needed a way to change the strings used by the AST. At the time I worked around it by creating a string pool that the modified AST nodes borrow from; the pool outlives the original AST, so variance made things work. But having a generic AST is a better solution.

Would it be a good idea to add some sort of `map()` function which lets you convert an `AST<P>` to an `AST<Q>` with some function `P -> Q`? That way you could do things like `my_ast.map(String::from)` to convert a borrowed AST to an owned AST, or do quirky things like interning to convert an `AST<String>` to an `AST<&str>` to reduce memory usage (probably not a realistic example, but you get the gist).
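
For what it's worth, on a simplified node such a map could look like this (hypothetical sketch; fluent-syntax doesn't provide this):

```rust
// Hypothetical, simplified AST node standing in for the real ones.
#[derive(Debug, PartialEq)]
struct Message<S> {
    id: S,
    value: S,
}

impl<P> Message<P> {
    // Convert Message<P> into Message<Q> by mapping every string slot.
    fn map<Q>(self, f: impl Fn(P) -> Q) -> Message<Q> {
        Message { id: f(self.id), value: f(self.value) }
    }
}

fn main() {
    let borrowed: Message<&str> = Message { id: "hello", value: "Hello, world!" };
    // Borrowed -> owned, as in the `my_ast.map(String::from)` idea.
    let owned: Message<String> = borrowed.map(String::from);
    assert_eq!(owned.id, "hello");
    assert_eq!(owned.value, "Hello, world!");
}
```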

zbraniecki commented 3 years ago

Yeah, my thinking is to add `Borrow<ast::Resource<AsRef<str>>>` for `ast::Resource<S>` and `ToOwned` for `ast::Resource<Into<String>>`. I'm waiting for @Manishearth to verify that this approach is reasonable.

I'll do it as a separate issue/PR on top of that work. This is just groundwork to enable an owned AST to exist.