projectfluent / fluent-rs

Rust implementation of Project Fluent
https://projectfluent.org
Apache License 2.0
1.07k stars 96 forks source link

Investigate bundle size #139

Closed zbraniecki closed 4 years ago

zbraniecki commented 5 years ago

We should try to make fluent-bundle release size as small as possible.

@cmyr pointed out in https://github.com/zbraniecki/unic-locale/issues/10 that at the moment the size increase is ~300kb.

I'd like to look for ways to make it smaller.

According to the analysis the bulk of the size increase goes into rental and syn (used for langid macro).

I think that first low hanging fruit would be to stop using the macro in fluent-bundle, and investigate the rental.

zbraniecki commented 5 years ago
Zibis-MacBook-Pro:fluent-bundle (master)$ cargo bloat --release --crates --example simple-app
Compiling ...
Analyzing target/release/examples/simple-app

 File  .text     Size Crate
32.0%  57.8% 192.7KiB std
 7.5%  13.6%  45.3KiB fluent_syntax
 5.3%   9.6%  31.8KiB [Unknown]
 2.8%   5.1%  16.9KiB fluent_bundle
 2.6%   4.6%  15.4KiB unic_langid_impl
 2.4%   4.3%  14.4KiB intl_pluralrules
 1.4%   2.5%   8.3KiB simple_app
 1.3%   2.4%   8.0KiB fluent_locale
 0.1%   0.1%     352B smallvec
55.4% 100.0% 333.1KiB .text section size, the file size is 601.7KiB

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
Zibis-MacBook-Pro:fluent-bundle (master)$ cargo bloat --release -n 20 --example simple-app
Compiling ...
Analyzing target/release/examples/simple-app

 File  .text     Size            Crate Name
 1.7%   3.2%  10.5KiB        [Unknown] _read_line_info
 1.5%   2.6%   8.8KiB    fluent_syntax fluent_syntax::parser::parse
 1.4%   2.5%   8.3KiB       simple_app simple_app::main
 1.2%   2.2%   7.4KiB              std ___rdos_backtrace_dwarf_add
 0.8%   1.5%   4.9KiB    fluent_locale fluent_locale::negotiate::negotiate_languages
 0.8%   1.5%   4.9KiB              std rustc_demangle::demangle
 0.8%   1.5%   4.9KiB              std <rustc_demangle::legacy::Demangle as core::fmt::Display>::fmt
 0.8%   1.4%   4.8KiB    fluent_syntax fluent_syntax::parser::get_placeable
 0.8%   1.4%   4.8KiB        [Unknown] _read_attribute
 0.7%   1.3%   4.4KiB   fluent_syntax? <fluent_syntax::ast::InlineExpression as fluent_bundle::resolve::ResolveValue>::resolve
 0.6%   1.1%   3.6KiB              std std::sys_common::backtrace::Printer::output
 0.6%   1.1%   3.6KiB    fluent_syntax fluent_syntax::parser::get_call_arguments
 0.6%   1.0%   3.4KiB        [Unknown] __mh_execute_header
 0.5%   0.9%   3.0KiB unic_langid_impl alloc::slice::merge_sort
 0.5%   0.9%   2.9KiB              std rustc_demangle::v0::Printer::print_type
 0.5%   0.8%   2.8KiB    fluent_bundle fluent_bundle::bundle::FluentBundle<R>::add_resource
 0.5%   0.8%   2.8KiB    fluent_syntax fluent_syntax::parser::get_inline_expression
 0.5%   0.8%   2.7KiB        [Unknown] _read_function_entry
 0.4%   0.8%   2.5KiB              std hashbrown::raw::RawTable<T>::reserve_rehash
 0.4%   0.8%   2.5KiB              std hashbrown::raw::RawTable<T>::reserve_rehash
39.8%  71.9% 239.4KiB                  And 1055 smaller methods. Use -n N to show more.
55.4% 100.0% 333.1KiB                  .text section size, the file size is 601.7KiB

Naive investigation using cargo-bloat doesn't show too much or I fail to read it properly.

@raphlinus - any thoughts on this? I know you brought up rental, but I don't see it here. How do I fine tune the params to show it in the cargo bloat data?

raphlinus commented 5 years ago

My sense (not backed up by hard data) is that rental affects mostly compile time. I expect it not to bloat the binary much, as I think it's mostly about proving the validity of lifetimes of borrowed resources. I believe this will largely be true for proc-macros as well. My sense is that both compile time and binary size are worth optimizing; in some cases (monomorphization) the same root cause affects both, for expensive proc macros and stuff like phf the story will be a bit different.

Thanks for looking into this, I appreciate it!

cmyr commented 5 years ago

My current understanding:

removing the langid! macro from fluent-bundle saves about 30kb on a debug build, and very little on a release build.

syn doesn't impact code size, only compile time.

zbraniecki commented 5 years ago

Interesting. I'm also invested in reducing the build times, but in this issue I wanted to focus on reducing the resulting bundle size, as I believe to be the more important of the two for the foundational crate.

Am I correct to interpret from your comments @cmyr and @ralphinus that there doesn't seem to be any low hanging fruit for reducing the impact of fluent-bundle on the lib size?

cmyr commented 5 years ago

here's the output of cargo bloat --example=hello --release --crates in druid:

23.9%  43.0% 197.2KiB std
10.3%  18.6%  85.1KiB druid
 5.3%   9.6%  44.0KiB fluent_syntax
 4.1%   7.3%  33.6KiB [Unknown]
 2.9%   5.3%  24.3KiB druid_shell
 2.0%   3.6%  16.4KiB backtrace_sys
 1.9%   3.5%  16.1KiB unic_langid_impl
 1.7%   3.1%  14.0KiB intl_pluralrules
 0.9%   1.6%   7.4KiB objc
 0.6%   1.1%   5.1KiB fluent_bundle
 0.4%   0.8%   3.6KiB cairo
 0.4%   0.7%   3.2KiB piet_cairo
 0.4%   0.7%   3.1KiB fluent_locale
 0.3%   0.6%   2.6KiB hello
 0.1%   0.2%     944B piet
 0.1%   0.2%     800B kurbo
 0.1%   0.1%     640B proc_macro2
 0.0%   0.0%      16B malloc_buf
zbraniecki commented 5 years ago

From fluent land I see fluent_syntax which is mainly a parser. I'm not sure if there's a good way to reduce its size. unic_langid_impl is surprisingly large - larger than intl_pluralrules and pluralrules store pluralization rules for all locales, while unic_langid stores no data. Is it because of macro?

fluent-bundle at 5kb seems reasonable I hope, and fluent_locale at 3.1kb also seems fairly reasonable.

Does it seems similar to you?

cmyr commented 5 years ago

yes, I agree with that. This has gotten me a bit curious so I'm going to dig in a bit after dinner and do some experimenting, will let you know what I find.

cmyr commented 5 years ago

The output of two runs of cargo bloat --example=calc --release -n 40 in druid, before and after the localization PR:

 File  .text     Size       Crate Name
 1.9%   3.9%  10.5KiB   [Unknown] _read_line_info
 1.9%   3.8%  10.2KiB        calc calc::main
 1.3%   2.8%   7.4KiB         std ___rdos_backtrace_dwarf_add
 1.1%   2.2%   6.0KiB   [Unknown] __mh_execute_header
 1.0%   2.0%   5.2KiB         std std::sys_common::backtrace::_print
 0.9%   1.8%   4.8KiB   [Unknown] _read_attribute
 0.8%   1.7%   4.7KiB         std rustc_demangle::demangle
 0.8%   1.7%   4.6KiB         std core::num::flt2dec::strategy::dragon::format_shortest
 0.8%   1.7%   4.5KiB         std <rustc_demangle::Demangle as core::fmt::Display>::fmt
 0.7%   1.4%   3.6KiB         std core::num::flt2dec::strategy::dragon::format_exact
 0.5%   1.0%   2.7KiB   [Unknown] _read_function_entry
 0.4%   0.9%   2.3KiB        calc calc::CalcState::op
 0.4%   0.8%   2.3KiB         std core::num::flt2dec::strategy::grisu::format_shortest_opt
 0.4%   0.8%   2.2KiB         std core::str::pattern::StrSearcher::new
 0.4%   0.8%   2.2KiB         std ___rust_probestack
 0.4%   0.8%   2.0KiB   [Unknown] _add_function_ranges
 0.4%   0.7%   2.0KiB   [Unknown] _add_unit_ranges
 0.3%   0.7%   1.9KiB         std ___rdos_backtrace_qsort
 0.3%   0.7%   1.8KiB        calc calc::flex_row
 0.3%   0.7%   1.8KiB   [Unknown] _find_address_ranges
 0.3%   0.7%   1.8KiB         std _macho_try_dwarf
 0.3%   0.6%   1.7KiB druid_shell <druid_shell::keyboard::KeyCode as core::convert::From<u16>>:...
 0.3%   0.6%   1.7KiB       druid hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.6%   1.6KiB   [Unknown] _dwarf_lookup_pc
 0.3%   0.6%   1.5KiB         std core::num::dec2flt::dec2flt
 0.3%   0.6%   1.5KiB       druid druid::WidgetPod<T,W>::event
 0.3%   0.5%   1.4KiB        objc objc::declare::method_type_encoding
 0.3%   0.5%   1.4KiB       druid std::collections::hash::map::HashMap<K,V,S>::insert
 0.3%   0.5%   1.4KiB         std <str as core::fmt::Debug>::fmt
 0.3%   0.5%   1.4KiB         std core::num::flt2dec::strategy::grisu::format_exact_opt
 0.3%   0.5%   1.4KiB         std core::num::dec2flt::algorithm::algorithm_m
 0.2%   0.5%   1.3KiB druid_shell druid_shell::mac::WindowBuilder::build
 0.2%   0.5%   1.3KiB druid_shell druid_shell::mac::menu::strip_access_key
 0.2%   0.5%   1.2KiB         std core::fmt::float::float_to_decimal_common_exact
 0.2%   0.5%   1.2KiB         std std::sync::once::Once::call_once::{{closure}}
 0.2%   0.5%   1.2KiB         std core::fmt::float::float_to_decimal_common_exact
 0.2%   0.4%   1.2KiB         std _macho_add_symtab
 0.2%   0.4%   1.2KiB         std <std::path::Components as core::iter::traits::iterator::Itera...
 0.2%   0.4%   1.2KiB       druid <druid::widget::flex::Flex<T> as druid::Widget<T>>::layout
 0.2%   0.4%   1.1KiB         std core::fmt::float::float_to_decimal_common_shortest
28.7%  58.8% 157.6KiB             And 982 smaller methods. Use -n N to show more.
48.8% 100.0% 267.9KiB             .text section size, the file size is 548.6KiB
 File  .text     Size            Crate Name
 1.3%   2.2%  10.6KiB             calc calc::main
 1.2%   2.2%  10.5KiB        [Unknown] _read_line_info
 1.0%   1.9%   8.8KiB    fluent_syntax fluent_syntax::parser::parse
 1.0%   1.8%   8.4KiB            druid druid::localization::L10nManager::new
 0.9%   1.6%   7.4KiB    backtrace_sys ___rdos_backtrace_dwarf_add
 0.6%   1.1%   5.2KiB              std std::sys_common::backtrace::_print
 0.6%   1.0%   4.8KiB    fluent_syntax fluent_syntax::parser::get_placeable
 0.6%   1.0%   4.8KiB        [Unknown] _read_attribute
 0.6%   1.0%   4.7KiB              std rustc_demangle::demangle
 0.5%   1.0%   4.6KiB              std core::num::flt2dec::strategy::dragon::format_shortest
 0.5%   0.9%   4.5KiB              std <rustc_demangle::Demangle as core::fmt::Display>::fmt
 0.5%   0.9%   4.5KiB   fluent_syntax? <fluent_syntax::ast::InlineExpression as fluent_bundle:...
 0.4%   0.8%   3.6KiB              std core::num::flt2dec::strategy::dragon::format_exact
 0.4%   0.8%   3.6KiB    fluent_syntax fluent_syntax::parser::get_call_arguments
 0.4%   0.7%   3.1KiB        [Unknown] __mh_execute_header
 0.4%   0.6%   3.0KiB unic_langid_impl alloc::slice::merge_sort
 0.4%   0.6%   3.0KiB            druid druid::localization::LocalizedString<T>::resolve
 0.3%   0.6%   2.9KiB              std ___rust_probestack
 0.3%   0.6%   2.8KiB            druid fluent_locale::negotiate::negotiate_languages
 0.3%   0.6%   2.8KiB            druid fluent_locale::negotiate::negotiate_languages
 0.3%   0.6%   2.8KiB    fluent_syntax fluent_syntax::parser::get_inline_expression
 0.3%   0.6%   2.7KiB        [Unknown] _read_function_entry
 0.3%   0.6%   2.7KiB            druid fluent_bundle::bundle::FluentBundle<R>::add_resource
 0.3%   0.5%   2.6KiB            druid hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.5%   2.5KiB            druid hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.5%   2.5KiB            druid hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.5%   2.4KiB    fluent_syntax fluent_syntax::parser::get_pattern
 0.3%   0.5%   2.4KiB            druid hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.5%   2.4KiB            druid fluent_bundle::resolve::get_arguments
 0.3%   0.5%   2.3KiB              std alloc::str::<impl str>::replace
 0.3%   0.5%   2.3KiB             calc calc::CalcState::op
 0.3%   0.5%   2.3KiB              std core::num::flt2dec::strategy::grisu::format_shortest_opt
 0.3%   0.5%   2.3KiB    fluent_locale fluent_locale::negotiate::likely_subtags::add
 0.3%   0.5%   2.3KiB              std hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.5%   2.3KiB              std hashbrown::raw::RawTable<T>::reserve_rehash
 0.3%   0.5%   2.2KiB              std core::str::pattern::StrSearcher::new
 0.2%   0.4%   2.1KiB            druid hashbrown::raw::RawTable<T>::reserve_rehash
 0.2%   0.4%   2.1KiB unic_langid_impl unic_langid_impl::parser::parse_language_identifier
 0.2%   0.4%   2.0KiB        [Unknown] _add_function_ranges
 0.2%   0.4%   2.0KiB        [Unknown] _add_unit_ranges
38.2%  68.2% 322.8KiB                  And 1589 smaller methods. Use -n N to show more.
55.9% 100.0% 473.4KiB                  .text section size, the file size is 846.2KiB

A few things stand out:

raphlinus commented 5 years ago

The reserve_rehash calls are from FluentBundle<R>::add_resource and negotiate_languages (at least for the handful I looked at). The methodology I use is to add --full-fn to the bloat cmdline, then otool -tvV to look at the disassembly.

Taking a quick look at the code, I agree that making negotiate_languages less polymorphic is worth exploring.

lambda-fairy commented 5 years ago

Can I take on making negotiate_languages less polymorphic?

I also had a quick look at the implementation and have an idea for how to reduce the bloat.

zbraniecki commented 5 years ago

Can I take on making negotiate_languages less polymorphic?

I'm open to it. My main design goal was to make it applicable over Locale and LanguageIdentifier. My hope was that people will be able to construct their own ways of representing LanguageIdentifier that may contain more data, but as long as they implement LanguageIdentifier (I thought of making it a trait!) they can negotiate between lists of them.

I'm open to better way to express it!

lambda-fairy commented 5 years ago

Great!

I've pushed up two branches:

@raphlinus or @cmyr would you like to give these branches a go? I'm on Linux so I can't build Druid myself.

I'm open to better way to express it!

I actually wasn't planning on changing the API :smile:

But now you mention it, I wonder if this flexibility is worth the effort. I searched GitHub for uses of negotiate_languages and all of them seemed to instantiate the function at LanguageIdentifier. So this polymorphism appears to go unused.

In particular, with the wrapper branch, I felt like much of the wrapper code could be eliminated if the function took two Vec<LanguageIdentifier> instead.

lambda-fairy commented 5 years ago

From fluent land I see fluent_syntax which is mainly a parser. I'm not sure if there's a good way to reduce its size.

Have you considered putting the error type in a Box? I don't know how much that would affect code size, but it would help reduce stack copying by making the Result smaller. Serde boxes its error type for this reason.

cmyr commented 5 years ago

@lfairy playing around with these:

master: 850.5KiB remove_hashmap: 841.7KiB wrapper: 842.6KiB

Something that is maybe interesting is that the .text section decreases in size with wrapper, from 471.1 to 466.2, but the overall binary size is increasing.

@zbraniecki good to hear your rationale for these function signatures; as anecdata I definitely found them a little tricky sometimes, like when I have a Vec<LanguageIdentifier> and I end up allocating a new Vec<&LanguageIdentifier> just for negotiation.

zbraniecki commented 5 years ago

But now you mention it, I wonder if this flexibility is worth the effort. I searched GitHub for uses of negotiate_languages and all of them seemed to instantiate the function at LanguageIdentifier. So this polymorphism appears to go unused.

Yes, but that's because LanguageIdentifier is the only thing mature enough to use it, and even that is very young. I expect more projects to start switching to Locale over time.

@zbraniecki good to hear your rationale for these function signatures; as anecdata I definitely found them a little tricky sometimes, like when I have a Vec and I end up allocating a new Vec<&LanguageIdentifier> just for negotiation.

Yeah, I also did hit that. When thinking about it more, I realized that "awkward" happened mostly in tests, where langid strings come from hardcoded places, but not really in production where all locale lists come either from users (requested) or from some environment data (available).

But as I said, I'd love to find a better solution :)

lambda-fairy commented 5 years ago

Thanks @cmyr. I'll post a PR for the hash map change then.

I'd still love to see how much of a difference changing the parser error type makes.

zbraniecki commented 4 years ago

@cmyr - I landed a change in fluent-locale-rs which switches away from HashMap to use Vec.

The perf impact is really nice:

negotiate               time:   [2.1180 us 2.1443 us 2.1764 us]
                        change: [-49.479% -46.126% -43.364%] (p = 0.00 < 0.05)
                        Performance has improved.

Can you review the impact on build size?

cmyr commented 4 years ago

@zbraniecki is this published to crates.io?

zbraniecki commented 4 years ago

no, you'd have to point cargo to the master of https://github.com/projectfluent/fluent-locale-rs - I'll release it in a couple days after testing that it works well with FFI for Gecko use case.

cmyr commented 4 years ago

Unfortunately I can't get that to build; my code also uses fluent-bundle and it requires an incompatible version of unic-langid. :(

zbraniecki commented 4 years ago

@cmyr - can you try again? All crates are pushed to crates.io and should be compatible.

cmyr commented 4 years ago

This must have happened a while ago? In any case moving to vec shows a modest decrease in binary size, although this is very unscientific, a lot of other stuff has also changed in druid in the meantime; on the hello example fluent_syntax is down from 44k to 41.5k.

zbraniecki commented 4 years ago

Thank you!

Is this further actionable as we know right now? I'd like to close that issue if not, while we continue working on keeping the crate light.

cmyr commented 4 years ago

I don't have any concrete actionable suggestions, so I think you're fine to close this. Thanks!