Michael-F-Bryan closed this 3 years ago
Benchmark comparing 25dde75 against master, c9e4565 (i.e. replacing &'ast str with Cow<'ast, str>):
$ cargo bench --bench parser -- --baseline base
Compiling fluent-syntax v0.9.3 (C:\Projects\fluent-rs\fluent-syntax)
Finished bench [optimized] target(s) in 9.46s
Running C:\Projects\fluent-rs\target\release\deps\parser-99e6937ba38adfc9.exe
Gnuplot not found, using plotters backend
parse/"simple" time: [43.395 us 45.529 us 48.144 us]
change: [+25.007% +29.004% +33.622%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
parse/"preferences" time: [387.97 us 390.36 us 393.46 us]
change: [+3.4989% +6.0201% +8.2245%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
parse/"menubar" time: [104.65 us 108.02 us 112.16 us]
change: [+6.1007% +8.5359% +11.820%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
unicode time: [7.7112 us 8.0368 us 8.4848 us]
change: [+1.5234% +4.8896% +8.6534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
parse_ctx/"browser" time: [399.25 us 405.31 us 412.26 us]
change: [+5.0663% +6.1496% +7.3562%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking parse_ctx/"preferences": Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s or reduce sample count to 70.
parse_ctx/"preferences" time: [989.68 us 1.0161 ms 1.0505 ms]
change: [+8.0732% +11.960% +16.002%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
Other than the "simple" benchmark, we're looking at a 6-11% performance regression.
I'll measure against Fx, but 6-11% would likely translate to ~1.5-2ms on startup on a very fast machine - https://bugzilla.mozilla.org/show_bug.cgi?id=1560038#c57
On a slower user machine this may be 5-10ms. I'd prefer to maintain a read-only AST over paying that price for convenience.
Benchmark comparing 59fb6ad against master (c9e4565) after updating fluent-bundle to use Cow<str>.
$ cargo bench --bench resolver -- --baseline base
Compiling fluent-syntax v0.9.3 (C:\Projects\fluent-rs\fluent-syntax)
Compiling fluent-bundle v0.11.0 (C:\Projects\fluent-rs\fluent-bundle)
Finished bench [optimized] target(s) in 11.11s
Running C:\Projects\fluent-rs\target\release\deps\resolver-d65a2ee18bbb4c2c.exe
Gnuplot not found, using plotters backend
resolve/"simple" time: [9.5628 us 9.6640 us 9.7876 us]
change: [-14.508% -10.849% -7.4316%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
resolve/"preferences" time: [140.55 us 142.14 us 144.46 us]
change: [+2.1204% +5.5609% +8.9761%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
resolve/"menubar" time: [31.260 us 31.382 us 31.524 us]
change: [-2.0665% -0.6816% +0.5605%] (p = 0.33 > 0.05)
No change in performance detected.
resolve/"unescape" time: [1.7500 us 1.7625 us 1.7773 us]
change: [-2.0425% -0.9417% -0.0567%] (p = 0.06 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
It looks like the "preferences" benchmark took a 5% hit, so switching to Cow<str> also has a negative effect when formatting a message.
I'd prefer to maintain a read-only AST over paying that price for convenience.
@zbraniecki, I agree. I was thinking we could use Yandros's trick of parameterising the AST nodes over some type S: AsRef<str> and giving it a default value of S = &'ast str. That way the base case will be identical to the current AST, but if you need to do mutations you can transform it to use Cow<str> when folding over the tree... Thoughts?
I'm open to it, yeah!
Soo.... I'm seeing something strange here. In 5e26a42 I've made the entire AST generic over its string type (i.e. &'ast str -> S) and the parser returns a concrete ast::Resource<&'ast str>. In theory this is exactly identical to the previous AST at the binary level (we replaced &'ast str with a generic parameter, then instantiated that generic with &'ast str).
However, now parse/simple regressed by 6%, parse/menubar improved by 4.9%, and parse/preferences improved by 7.7%. The parse_ctx benchmarks aren't showing any significant change.
$ cargo bench --bench parser -- --baseline base
Compiling fluent-syntax v0.9.3 (C:\Projects\fluent-rs\fluent-syntax)
Finished bench [optimized] target(s) in 10.47s
Running C:\Projects\fluent-rs\target\release\deps\parser-99e6937ba38adfc9.exe
Gnuplot not found, using plotters backend
parse/"simple" time: [35.461 us 35.716 us 36.088 us]
change: [+4.4435% +5.8972% +7.6256%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
parse/"preferences" time: [337.48 us 342.14 us 348.18 us]
change: [-9.8723% -7.7005% -5.7328%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
parse/"menubar" time: [91.829 us 92.043 us 92.279 us]
change: [-5.8071% -4.8987% -4.0698%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe
unicode time: [7.8541 us 8.1120 us 8.4140 us]
change: [+0.2610% +4.5582% +11.198%] (p = 0.08 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
parse_ctx/"browser" time: [386.82 us 394.27 us 403.12 us]
change: [+0.1738% +1.7793% +3.6869%] (p = 0.05 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
parse_ctx/"preferences" time: [886.97 us 899.27 us 913.82 us]
change: [-4.2761% -1.4430% +0.9149%] (p = 0.31 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
I've got two hypotheses for these results:
I'm leaning towards the latter because this is just a commodity PC. The easiest way to check is to try on my laptop and see whether the benchmark results change at all.
@zbraniecki, I've reverted the Cow<str> commits instead of force pushing them out of existence so we have something to reference when comparing benchmark results. Once we're satisfied that this PR doesn't regress too much I'll be able to squash the lot into a single change.
I'm doing the benchmarks again on my personal laptop.
Looking at those results, it seems like Cow<str> (25dde75) adds about 4% on average and the generic version (5e26a42) improves performance by about 20-25%... I'm no expert on performance tuning, but it feels almost too good to be true that the generic version is 20% faster than the original.
$ rustc --version --verbose
rustc 1.46.0-nightly (a8cf39911 2020-06-21)
binary: rustc
commit-hash: a8cf3991177f30694200002cd9479ffbbe6d9a1a
commit-date: 2020-06-21
host: x86_64-unknown-linux-gnu
release: 1.46.0-nightly
LLVM version: 10.0
$ uname -a
Linux archiso 5.7.4-arch1-1 #1 SMP PREEMPT Thu, 18 Jun 2020 16:01:07 +0000 x86_64 GNU/Linux
$ nprocs
8
# Using master as the baseline
$ git checkout c9e4565
$ cargo bench --bench parser -- --save-baseline original
Finished bench [optimized] target(s) in 0.05s
Running /home/michael/Documents/forks/fluent-rs/target/release/deps/parser-4a42273044a13d94
parse/"simple" time: [12.725 us 12.779 us 12.846 us]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
parse/"preferences" time: [208.46 us 234.08 us 269.13 us]
parse/"menubar" time: [57.577 us 57.784 us 58.006 us]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
6 (6.00%) high mild
unicode time: [4.4069 us 4.4364 us 4.4801 us]
Found 27 outliers among 100 measurements (27.00%)
22 (22.00%) low severe
3 (3.00%) high mild
2 (2.00%) high severe
parse_ctx/"browser" time: [235.04 us 236.73 us 238.86 us]
Found 20 outliers among 100 measurements (20.00%)
1 (1.00%) high mild
19 (19.00%) high severe
parse_ctx/"preferences" time: [581.11 us 582.82 us 585.39 us]
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) low severe
1 (1.00%) high mild
5 (5.00%) high severe
cargo bench --bench parser -- --save-baseline original 106.76s user 0.74s system 163% cpu 1:05.85 total
# Compare Cow<str> to original
$ git checkout 25dde75
$ cargo bench --bench parser -- --baseline original
Finished bench [optimized] target(s) in 0.03s
Running /home/michael/Documents/forks/fluent-rs/target/release/deps/parser-4a42273044a13d94
parse/"simple" time: [15.871 us 16.578 us 17.225 us]
change: [+13.777% +16.698% +19.557%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
2 (2.00%) high mild
18 (18.00%) high severe
parse/"preferences" time: [237.04 us 238.11 us 239.19 us]
change: [-73.657% -67.879% -59.132%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
parse/"menubar" time: [58.760 us 59.268 us 59.942 us]
change: [+2.0590% +2.6362% +3.3442%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
9 (9.00%) low severe
4 (4.00%) high mild
7 (7.00%) high severe
unicode time: [4.5704 us 4.6122 us 4.6706 us]
change: [+2.9406% +3.8877% +5.0021%] (p = 0.00 < 0.05)
Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
5 (5.00%) high mild
16 (16.00%) high severe
parse_ctx/"browser" time: [244.86 us 245.86 us 246.88 us]
change: [+3.6058% +4.3485% +4.9392%] (p = 0.00 < 0.05)
Performance has regressed.
parse_ctx/"preferences" time: [607.01 us 609.46 us 611.88 us]
change: [+3.8941% +4.3636% +4.7830%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
cargo bench --bench parser -- --baseline original 152.89s user 1.35s system 218% cpu 1:10.75 total
# Compare generic to original
$ git checkout 5e26a42
$ cargo bench --bench parser -- --baseline original
Compiling fluent-syntax v0.9.3 (/home/michael/Documents/forks/fluent-rs/fluent-syntax)
Finished bench [optimized] target(s) in 2.50s
Running /home/michael/Documents/forks/fluent-rs/target/release/deps/parser-4a42273044a13d94
parse/"simple" time: [12.790 us 12.872 us 12.979 us]
change: [-1.8642% -0.6152% +0.9808%] (p = 0.38 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
parse/"preferences" time: [174.77 us 175.35 us 176.09 us]
change: [-80.612% -76.301% -69.792%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
parse/"menubar" time: [43.262 us 43.362 us 43.469 us]
change: [-24.915% -24.614% -24.307%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
unicode time: [3.8113 us 3.9722 us 4.1926 us]
change: [+92.864% +162.38% +248.04%] (p = 0.00 < 0.05)
Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
1 (1.00%) high mild
23 (23.00%) high severe
parse_ctx/"browser" time: [183.62 us 184.07 us 184.56 us]
change: [-22.442% -21.875% -21.433%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking parse_ctx/"preferences": Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s or reduce sample count to 50.
parse_ctx/"preferences" time: [455.44 us 457.74 us 461.16 us]
change: [-21.046% -20.457% -19.777%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
cargo bench --bench parser -- --baseline original 125.23s user 1.00s system 220% cpu 57.349 total
but it feels almost too good to be true that the generic version is 20% faster than the original.
Hah, yeah, it seems a bit like a random optimization that may be overexpressed in a microbenchmark. I'll test that against real world code, but the approach is very promising.
This way we could have AST<&str>, and if you need to edit it, you can copy it into AST<String>, work with it, and then serialize.
This looks good!
I like the use of generics here over macros.
What I'd like to get to is handling of FluentResource more like Cow, where it is either the rental::FluentResource or ast::Resource<String>.
Maybe something like this:
#[derive(Debug)]
pub enum FluentResource {
    Owned(ast::Resource<String>),
    String(rentals::FluentResource),
}

impl FluentResource {
    pub fn try_new_slice(source: String) -> Result<Self, (Self, Vec<ParserError>)> {
        let mut errors = None;
        let res = rentals::FluentResource::new(source, |s| match parse(s) {
            Ok(ast) => ast,
            Err((ast, err)) => {
                errors = Some(err);
                ast
            }
        });
        let res = FluentResource::String(res);
        if let Some(errors) = errors {
            Err((res, errors))
        } else {
            Ok(res)
        }
    }

    pub fn ast<S>(&self) -> &ast::Resource<S>
    where
        S: Borrow<str>,
    {
        match self {
            Self::Owned(res) => res,
            Self::String(res) => res.all().ast,
        }
    }
}
and then we could make the resolver handle either, because it only needs the read view into either variant.
What do you think?
What do you think?
I like the idea, but I'm not sure if it'll pan out... By having resources use either owned or borrowed ASTs you can't let the user access them via a single method because they are two different types (even using something like impl Trait, because impl Trait is syntactic sugar for returning a single unnamed type).
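A minimal, compilable illustration of that point, with invented types (not the real FluentResource):

```rust
// Made-up types for the example: `impl Trait` in return position is sugar for
// one concrete (if anonymous) type, so both match arms must agree on it.
enum Res {
    Owned(String),
    Borrowed(&'static str),
}

impl Res {
    // Compiles: both arms are unified into a single concrete type, &str.
    fn as_str(&self) -> impl AsRef<str> + '_ {
        match self {
            Res::Owned(s) => s.as_str(),
            Res::Borrowed(s) => *s,
        }
    }

    // Would NOT compile: one arm returns String, the other &str, and
    // `impl AsRef<str>` cannot stand for "either of two types".
    //
    // fn ast(&self) -> impl AsRef<str> {
    //     match self {
    //         Res::Owned(s) => s.clone(),
    //         Res::Borrowed(s) => *s,
    //     }
    // }
}

fn main() {
    assert_eq!(Res::Owned("hi".into()).as_str().as_ref(), "hi");
    assert_eq!(Res::Borrowed("yo").as_str().as_ref(), "yo");
}
```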
You could maybe add an extra level of indirection via the visitor pattern. The idea being users need to pass in something which satisfies Visitor<S> where S: AsRef<str>, then we write a function which recursively walks the tree, invoking visitor methods for either the owned or borrowed trees.
trait Visitor<S> {
    fn visit_resource(&mut self, res: &ast::Resource<S>) { /* default recursion */ }
    fn visit_inline_expression(&mut self, expr: &ast::InlineExpression<S>) { /* default recursion */ }
    ...
}

impl FluentResource {
    fn visit_ast<V>(&self, visitor: &mut V)
    where
        V: Visitor<String> + for<'ast> Visitor<&'ast str>,
    {
        match self {
            FluentResource::Owned(owned) => visitor.visit_resource(owned),
            FluentResource::Rental(rental) => visitor.visit_resource(rental),
        }
    }
}
You lose the ability to use nice things like pattern matching though, so it'd work for evaluating a message but could make other operations a pain. Performance wouldn't change because we're using compile-time generics and you'd do this recursive traversal when evaluating messages anyway.
The alternative would be to use Resource<Cow<str>> and rental all the time, but then we're back to the performance issues from before, with the added problem that you need to traverse the AST twice (once when parsing to Resource<&'ast str> and another to convert &'ast str to Cow<'ast, str>).
You lose the ability to use nice things like pattern matching though, so it'd work for evaluating a message but could make other operations a pain.
What other operations do you think of? I'm getting warmed up to the idea that maybe a Visitor model would work?
The resolver is a pretty black-box solution, and if I'd end up with it being a generic that can handle Resource<String> and Resource<&str> then I'm totally fine with that.
I'm also looking again at the StrView idea. Seems like both approaches on paper should give us no performance hit. Do you have an opinion on which one would lead to more manageable code?
What other operations do you think of? I'm getting warmed up to the idea that maybe a Visitor model would work?
I'm not sure. I guess most tools (my primary area of interest) will be using the AST layer directly and not touching the FluentResource type, so it won't affect them. Other than that, the only real consumer of a FluentResource is the resolver, and as long as it can format a message, nobody cares about its internals.
I'm also looking again at the StrView idea.
Are you referring to a type which will take slices into a reference-counted string on the heap? I put together a proof-of-concept version which uses the bytes crate for all the heavy lifting.
The gist of it is an ArcStr type. I'm going to play around with it in another branch and see how it affects performance. The only part that'll change performance is bumping the reference count every time we slice into the string.
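A toy version of the idea, using the standard library's Arc<str> instead of the bytes crate (names and API are illustrative, not the proof-of-concept's actual code):

```rust
use std::{ops::Range, sync::Arc};

// Sketch only: a slice into a reference-counted string, so AST nodes carry no
// lifetime parameter. The real PoC built this on the `bytes` crate.
#[derive(Debug, Clone)]
struct ArcStr {
    buf: Arc<str>,
    range: Range<usize>,
}

impl ArcStr {
    fn new(s: &str) -> Self {
        ArcStr { buf: Arc::from(s), range: 0..s.len() }
    }

    // Slicing clones the Arc: one atomic refcount bump, no text copy. This is
    // the per-slice cost the benchmarks below end up measuring.
    fn slice(&self, range: Range<usize>) -> Self {
        let start = self.range.start;
        ArcStr {
            buf: Arc::clone(&self.buf),
            range: start + range.start..start + range.end,
        }
    }

    fn as_str(&self) -> &str {
        &self.buf[self.range.clone()]
    }
}

fn main() {
    let src = ArcStr::new("hello = Hello, world!");
    let id = src.slice(0..5);
    assert_eq!(id.as_str(), "hello");
}
```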
Here are the benchmark results for #187 (using an ArcStr instead of &str or Cow<str>), and you can see performance regressed by a non-trivial amount.
I thought it might have been due to the way the benchmarks are constructed (it stores the text as Strings, so every run of every benchmark ended up copying the contents to a new ArcStr), but switching everything to ArcStr in Michael-F-Bryan/fluent-rs@15d0d8527a404c419aa5c096ae5428ee424eff41 to skip the initial copy didn't help too much...
# Baseline (master)
$ git checkout master && git rev-parse HEAD
c9e456515f4c084eb9650f3910722a31707a6b32
$ cargo bench --bench parser
parse/"simple" time: [12.823 us 12.903 us 13.024 us]
Found 14 outliers among 100 measurements (14.00%)
9 (9.00%) high mild
5 (5.00%) high severe
parse/"preferences" time: [236.38 us 238.11 us 240.29 us]
Found 15 outliers among 100 measurements (15.00%)
5 (5.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
parse/"menubar" time: [58.148 us 58.343 us 58.602 us]
Found 22 outliers among 100 measurements (22.00%)
4 (4.00%) low severe
1 (1.00%) high mild
17 (17.00%) high severe
unicode time: [4.6397 us 4.6947 us 4.7749 us]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
parse_ctx/"browser" time: [245.25 us 247.41 us 250.85 us]
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe
parse_ctx/"preferences" time: [604.51 us 606.86 us 609.26 us]
# With ArcStr
$ git checkout rc-str-ast && git rev-parse HEAD
e807455276524572f69d9f275dbff005d6928d31
$ cargo bench --bench parser
parse/"simple" time: [23.790 us 24.533 us 25.135 us]
change: [+65.766% +70.315% +75.086%] (p = 0.00 < 0.05)
Performance has regressed.
parse/"preferences" time: [257.25 us 258.34 us 259.64 us]
change: [+7.8525% +8.9906% +10.152%] (p = 0.00 < 0.05)
Performance has regressed.
Found 29 outliers among 100 measurements (29.00%)
9 (9.00%) low severe
4 (4.00%) high mild
16 (16.00%) high severe
parse/"menubar" time: [72.983 us 73.360 us 73.747 us]
change: [+24.247% +25.545% +26.511%] (p = 0.00 < 0.05)
Performance has regressed.
Found 37 outliers among 100 measurements (37.00%)
3 (3.00%) low severe
16 (16.00%) low mild
10 (10.00%) high mild
8 (8.00%) high severe
unicode time: [4.6590 us 4.6968 us 4.7400 us]
change: [-1.2845% +0.2354% +1.3430%] (p = 0.78 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
parse_ctx/"browser" time: [277.00 us 280.18 us 283.69 us]
change: [+10.566% +11.847% +13.102%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
parse_ctx/"preferences" time: [655.87 us 662.29 us 670.78 us]
change: [+7.6288% +8.5293% +9.5709%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
# Tweaked the benchmarks to not copy the source text every time
$ git rev-parse HEAD
15d0d8527a404c419aa5c096ae5428ee424eff41
$ cargo bench --bench parser
parse/"simple" time: [19.246 us 19.453 us 19.764 us]
change: [-14.504% -11.978% -9.3610%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
parse/"preferences" time: [252.80 us 257.30 us 262.51 us]
change: [-2.6368% -1.7249% -0.7337%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 38 outliers among 100 measurements (38.00%)
21 (21.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
14 (14.00%) high severe
parse/"menubar" time: [71.020 us 71.441 us 71.934 us]
change: [-3.5266% -2.9758% -2.3899%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
unicode time: [4.7729 us 4.8176 us 4.8938 us]
change: [+1.9107% +3.4854% +5.6462%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low mild
2 (2.00%) high mild
12 (12.00%) high severe
parse_ctx/"browser" time: [270.87 us 272.46 us 274.06 us]
change: [-2.6351% -1.7316% -0.8859%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
parse_ctx/"preferences" time: [641.53 us 646.67 us 652.89 us]
change: [-3.7101% -2.6661% -1.8317%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
If it were just a case of nice code I'd prefer the ArcStr AST over any of the other solutions in this PR, simply because not having to worry about lifetime annotations or use rental to manage the string's memory makes the code more ergonomic... But considering Fluent is used by Firefox to localise the UI and parsing directly impacts Firefox's startup time, I'm inclined to go with the generic AST + Visitor<S> approach.
Would someone be able to review this PR? Other than the JSON code it should be good to go.
As an aside, is there any reason why we mirror the entire AST when serializing JSON instead of adding annotations to the original AST?
Are you referring to a type which will take slices into a reference-counted string on the heap? I put together a proof-of-concept version which uses the bytes crate for all the heavy lifting.
No, I meant Yandros https://users.rust-lang.org/t/fluent-l10n-system-parser-ast-serializer-decision/27507/2
Would someone be able to review this PR? Other than the JSON code it should be good to go.
I'd prefer to see the Visitor PR first.
As an aside, is there any reason why we mirror the entire AST when serializing JSON instead of adding annotations to the original AST?
Yeah, the main reason is that there's enough custom serialization that I didn't want to pollute the AST. But with Serde maturing and less of the code required, I'd be open to pulling serde into the main ast.rs behind cfgs, and then having some serde.rs with custom deserializers for Option and Vec.
No, I meant Yandros https://users.rust-lang.org/t/fluent-l10n-system-parser-ast-serializer-decision/27507/2
Ah, fair enough. I was thinking of implementing most Visitors using impl<S: AsRef<str>> Visitor<S> for ...
I'd prefer to see the Visitor PR first.
I've started playing around with the visitor pattern and I've got a feeling it could be a pain (see #188). The problem is we've currently got an "external" API where bits of code will ask for a reference to a particular AST node, and we want to convert it to an "internal" API where information is passed around via closures and the references are only valid for the duration of the method call.
In particular I feel like I'd need to rewrite large swathes of the resolver code to move away from returning a FluentValue, and I don't think it'll be as simple as storing a std::io::Write inside a visitor struct and emitting write!() calls as we traverse the tree...
Thanks for giving this a shot, Michael.
We've talked about the visitor pattern in fluent-kotlin, too. I wasn't a fan of the amount of duct tape one needs to write. I was also concerned about readability of the documentation that comes from that pattern. The amount of code would be less of a concern if the code was generated, but the documentation piece is still there, I guess.
The other thing about visitors is that they're purely consumers. In my experience, they fall flat when you try to build creators and modifiers. The Python Transformer is strictly untyped and can do stuff, but the TypeScript one is typed in a way where I don't see how to actually transform a tree.
I pondered using generics, but I think that means that you're creating something that has a specialization (?) for each visitor function. That didn't sound like fun.
It's possible to build a new tree by using a pushdown automaton which will build up a new AST on each transition, but you'll likely run into issues with the borrow checker.... When you recurse and add a new state to the stack, you'd be forced to "find" where you want to insert new nodes instead of just holding a reference to the parent in the automaton (multiple nested states means multiple mutable pointers into the same tree, which is no bueno).
I'm sure you could work around this with enough dynamic dispatch and indirection, but to be honest I'd just write the traversal/modification by hand at that point. That way you're free to do whatever you want and don't need to conform to a particular interface.
It seems like the easiest thing would be to not make FluentResource a COW type and just let it keep using rental. That means you can only really create/manipulate an AST when it's not in a bundle, but for most tools that shouldn't be an issue because they'll be working with the parsed AST directly (like I did with my fluent-machine-translations project)... I think?
Quick update. I've started working on a remodel of the AST to be generic in https://github.com/zbraniecki/fluent-rs/tree/generic
It's definitely an early WIP, but I got the parser to work and the performance seems to be on par with master.
It handles ast::Resource<String> and ast::Resource<&str> quite well, and in theory should also handle ast::Resource<Cow<str>>, although I didn't get there yet.
I'll try to finish fluent-syntax next, and hopefully implement ToOwned for ast::Resource<&str>.
Let me know what you think if you have a chance to skim through the code.
@Michael-F-Bryan does https://github.com/projectfluent/fluent-rs/pull/198 address your issue?
It should work. For my purposes I just needed a way to change the strings used by the AST. At the time I worked around it by creating a string pool that modified AST nodes borrow from and which outlives the original AST so variance made things work, but having a generic AST is a better solution.
Would it be a good idea to add some sort of map() function which lets you convert an AST<P> to an AST<Q> with some function P -> Q? That way you could do things like my_ast.map(String::from) to convert a borrowed AST to an owned AST, or do quirky things like interning to convert an AST<String> to an AST<&str> to reduce memory usage (probably not a realistic example, but you get the gist).
Yeah, my thinking is to add Borrow<ast::Resource<AsRef<str>>> for ast::Resource<S> and ToOwned for ast::Resource<Into<String>>. I'm waiting for @Manishearth to verify that this approach is reasonable.
I'll do it as a separate issue/PR on top of that work. This is just groundwork to enable an owned AST to exist.
Fixes #104