projectfluent / fluent-rs

Rust implementation of Project Fluent
https://projectfluent.org
Apache License 2.0
1.07k stars 96 forks source link

Align FluentBundle with the new JS FluentBundle API #119

Closed zbraniecki closed 5 years ago

zbraniecki commented 5 years ago

This is basically porting https://github.com/projectfluent/fluent.js/pull/380 to fluent-bundle-rs.

zbraniecki commented 5 years ago

The initial performance is worrying.

Gnuplot not found, disabling plotting
resolve/"simple"        time:   [11.263 us 11.300 us 11.339 us]                              
                        change: [+19.138% +19.863% +20.589%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
resolve/"preferences"   time:   [137.05 us 137.99 us 139.37 us]                                  
                        change: [+28.618% +30.002% +31.232%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
resolve/"menubar"       time:   [44.473 us 44.652 us 44.834 us]                               
                        change: [+63.365% +64.480% +65.556%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
resolve/"unescape"      time:   [1.7313 us 1.7390 us 1.7468 us]                                
                        change: [-0.9290% -0.2554% +0.4500%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Gnuplot not found, disabling plotting

I haven't investigate the source of the regression yet. My initial guess is that creating two hashmaps (one to return from get_message and then the other to format) would likely be the culprit, but I'll investigate further.

zbraniecki commented 5 years ago

Yep! Updating the benchmark to not use the compound wrapper, but rather resolve value/attributes just like the JS does yields a nice perf win compared to master:

Gnuplot not found, disabling plotting
resolve/"simple"        time:   [6.1629 us 6.1843 us 6.2070 us]                              
                        change: [-35.251% -34.824% -34.389%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  9 (9.00%) high mild
  1 (1.00%) high severe
resolve/"preferences"   time:   [91.970 us 92.911 us 94.163 us]                                  
                        change: [-19.398% -18.834% -18.125%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
resolve/"menubar"       time:   [22.784 us 22.872 us 22.966 us]                               
                        change: [-18.105% -17.580% -17.061%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
resolve/"unescape"      time:   [1.3784 us 1.3841 us 1.3901 us]                                
                        change: [-25.637% -25.110% -24.523%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  9 (9.00%) high mild

Gnuplot not found, disabling plotting

That likely means that hashmap allocation is costly, but that also allows users not to allocate if they don't need to.

zbraniecki commented 5 years ago

I think I found a problem that may be hard to overcome.

In the current design, FluentBundle is a standalone struct. FluentResource as well. The relationship between them can be that FluentBundle owns Rc<FluentResource>.

Since all WebIDL types have to be Rc, that works great.

But, in the new design, I need to either expose an ast::Pattern or some wrapper which would be struct FluentPattern(&ast::Pattern).

In both cases, I need to hold a reference to a field inside the FluentResource and I need to Rc it. But I can't guarantee lifetimes between FluentResource and FluentPattern. On the JS side, there's nothing preventing someone from doing:

let res = new FluentResource("key = Value");
let bundle = new FluentBundle("en");
bundle.addResource(res);
let msg = bundle.get_message("key");
bundle = null;
res = null;

console.log(msg.value);

I don't see an easy way to get there, unless I change the AST to store every pattern as an Rc<ast::Pattern>. If we do that, I'm afraid that our AST will feel inconsistent. Should we move all our ast nodes to be Rced? That feels like an unnecessary overhead.

zbraniecki commented 5 years ago

Hmm, one option that miight work is that I could try to make the Pattern be a micro-resource, with its own source: String and a view into it as ast::Pattern. The other is to try to move to Cow for the whole AST and then create ast::Pattern as a copy of the one from FluentResource with owned values.

Quite frankly, I don't find either approach really appealing especially since its not supposed to be an edge case, but the core of how we want to handle message formatting. It seems like the design of the API is skewed toward GC'ed languages.

stasm commented 5 years ago

Now that entries are stored in bundles as indices, does the lifetime of FluentBundle play a role in this discussion? Or is the issue exclusively about the lifetimes of FluentResource and Pattern?

Re. hashmap instantiation: In #120, it looks like a new Message { value, attributes } is created in every call to get_message2. What if instead you created it in add_resource?

Alternatively, would it make more sense to return a vector of attributes rather than a hashmap? This would be a divergence from the JS API, but perhaps well justified?

stasm commented 5 years ago

Re. Pattern ownership. IIUC, if we want to allow the possibility to expose patterns via WebIDL (as part of the Message shape), there's no way to avoid the Rc. We could Rc eagerly inside of the bundle, or lazily in the extern "C" fn fluent_bundle_get_message. Is my understanding correct?

I don't see an easy way to get there, unless I change the AST to store every pattern as an Rc. If we do that, I'm afraid that our AST will feel inconsistent. Should we move all our ast nodes to be Rced? That feels like an unnecessary overhead.

Can we only Rc patterns that are values of messages and messages attributes, but not Rc patterns in terms, term attributes, nor variants? This would make it clearer which AST nodes are allowed to be lent via the FFI.

zbraniecki commented 5 years ago

Or is the issue exclusively about the lifetimes of FluentResource and Pattern?

Just this.

We could Rc eagerly inside of the bundle, or lazily in the extern "C" fn fluent_bundle_get_message. Is my understanding correct?

I don't think we can do that lazily. Rc takes ownership, so we need to decide what's the status of FluentResource (we currently allow it to be owned, borrowed or Rc/Arc) and then try to do something about the `Pattern.

Can we only Rc patterns that are values of messages and messages attributes, but not Rc patterns in terms, term attributes, nor variants?

Sure, it's just that such AST will look quirky. I'm trying to decide if this issue is a signal that the API design is flawed.

stasm commented 5 years ago

Sure, it's just that such AST will look quirky. I'm trying to decide if this issue is a signal that the API design is flawed.

I see this as a way to encode in the AST the fact that we want to be able to lend out these patterns to the calling code.

zbraniecki commented 5 years ago

I see this as a way to encode in the AST the fact that we want to be able to lend out these patterns to the calling code.

Which is not what AST is about. It leaks the runtime API expectations onto a view of the source. :(

Having to RC Patterns in source so that we can expose them in public via WebIDL seems weird. I'm trying to think of other ways to achieve that - where the public "Pattern" would hold a weak reference to the bundle and become unusable if the bundle gets out of the scope.

What's your expectation for JS? If I delete the bundle, and the res, should the pattern keep them both alive? Is that a risk of leaking?

stasm commented 5 years ago

Which is not what AST is about. It leaks the runtime API expectations onto a view of the source. :(

But we already do it by using the AST as the data storage for messages in bundles, and by making the resolver work with it, too. An alternative would be to create separate data structures in add_resource, perhaps pre-evaluate some nodes, and resolve that.

zbraniecki commented 5 years ago

But we already do it by using the AST as the data storage for messages in bundles, and by making the resolver work with it, too

Currently, this is an implementation detail. We can switch it anyway we like without changes to the public API.

When I say public I mean - from the perspective of fluent-syntax crate, AST is public. You can load it and operate on it and if we change it, we'll release a new breaking version of the crate. But from the perspective of fluent-bundle crate, AST is private. Changes to it do not affect the API.

So, when someone is working with fluent-syntax API (AST), the fact that some part of the AST is Rced makes no sense.

An alternative would be to create separate data structures in add_resource, perhaps pre-evaluate some nodes, and resolve that.

I may try that instead.

Pike commented 5 years ago

What's your expectation for JS? If I delete the bundle, and the res, should the pattern keep them both alive? Is that a risk of leaking?

My expectation is that that is allowed to throw. Evaluating a message against a different bundle isn't cool.

Which sounds like a bug, but I view it as a compromise between messages being abstract around their values and attributes, and being able to inspect messages w/out evaluating them, while having a small API surface.

zbraniecki commented 5 years ago

Evaluating a message against a different bundle isn't cool.

So, I'm not sure if I agree tbh. I can see this being a perfectly sane way to use Fluent:

let res = new FluentResource("hello = Hello from { -brand-name }");
let res-brand1 = new FluentResource("-brand-name = Firefox");
let res-brand2 = new FluentResource("-brand-name = Nightly");

let bundle1 = new FluentBundle("x-testing");
bundle1.addResource(res);
bundle1.addResource(res-brand1);

let bundle2 = new FluentBundle("x-testing");
bundle2.addResource(res);
bundle2.addResource(res-brand2);

let msg = bundle1.getMessage("hello");
bundle1.formatPattern(msg.value); // Hello from Firefox
bundle2.formatPattern(msg.value); // Hello from Nightly

Which sounds like a bug, but I view it as a compromise between messages being abstract around their values and attributes, and being able to inspect messages w/out evaluating them, while having a small API surface.

I see. I'm wondering if there's a better way to handle that.

zbraniecki commented 5 years ago

There's another benefit of the switch to format_pattern that works really well in the non-Gc world:

In #94 we discussed the possibility of having an API that takes a Writer and feeds it with data. This would allow us to remove another batch of allocations with format_pattern taking a Writer and feeding it with string slices from the source.

I'm not sure how can we get the Rc'ed Pattern into this picture. One idea would be to do:

struct PatternPtr {
  id: String,
  attribute: Option<String>
};

and return it in Message so that when the PatternPtr is passed to the bundle.format_pattern it looks for the message with the given id and optionally also for an attribute with the given name and resolves that pattern.

This feels weird, but will be fast and light.

zbraniecki commented 5 years ago

This has landed in https://github.com/projectfluent/fluent-rs/commit/2a4b92f61285cdf175d1050791a25c5202bcabd0