Closed zbraniecki closed 4 years ago
Zibis-MacBook-Pro:fluent-bundle (master)$ cargo bloat --release --crates --example simple-app
Compiling ...
Analyzing target/release/examples/simple-app
File .text Size Crate
32.0% 57.8% 192.7KiB std
7.5% 13.6% 45.3KiB fluent_syntax
5.3% 9.6% 31.8KiB [Unknown]
2.8% 5.1% 16.9KiB fluent_bundle
2.6% 4.6% 15.4KiB unic_langid_impl
2.4% 4.3% 14.4KiB intl_pluralrules
1.4% 2.5% 8.3KiB simple_app
1.3% 2.4% 8.0KiB fluent_locale
0.1% 0.1% 352B smallvec
55.4% 100.0% 333.1KiB .text section size, the file size is 601.7KiB
Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
Zibis-MacBook-Pro:fluent-bundle (master)$ cargo bloat --release -n 20 --example simple-app
Compiling ...
Analyzing target/release/examples/simple-app
File .text Size Crate Name
1.7% 3.2% 10.5KiB [Unknown] _read_line_info
1.5% 2.6% 8.8KiB fluent_syntax fluent_syntax::parser::parse
1.4% 2.5% 8.3KiB simple_app simple_app::main
1.2% 2.2% 7.4KiB std ___rdos_backtrace_dwarf_add
0.8% 1.5% 4.9KiB fluent_locale fluent_locale::negotiate::negotiate_languages
0.8% 1.5% 4.9KiB std rustc_demangle::demangle
0.8% 1.5% 4.9KiB std <rustc_demangle::legacy::Demangle as core::fmt::Display>::fmt
0.8% 1.4% 4.8KiB fluent_syntax fluent_syntax::parser::get_placeable
0.8% 1.4% 4.8KiB [Unknown] _read_attribute
0.7% 1.3% 4.4KiB fluent_syntax? <fluent_syntax::ast::InlineExpression as fluent_bundle::resolve::ResolveValue>::resolve
0.6% 1.1% 3.6KiB std std::sys_common::backtrace::Printer::output
0.6% 1.1% 3.6KiB fluent_syntax fluent_syntax::parser::get_call_arguments
0.6% 1.0% 3.4KiB [Unknown] __mh_execute_header
0.5% 0.9% 3.0KiB unic_langid_impl alloc::slice::merge_sort
0.5% 0.9% 2.9KiB std rustc_demangle::v0::Printer::print_type
0.5% 0.8% 2.8KiB fluent_bundle fluent_bundle::bundle::FluentBundle<R>::add_resource
0.5% 0.8% 2.8KiB fluent_syntax fluent_syntax::parser::get_inline_expression
0.5% 0.8% 2.7KiB [Unknown] _read_function_entry
0.4% 0.8% 2.5KiB std hashbrown::raw::RawTable<T>::reserve_rehash
0.4% 0.8% 2.5KiB std hashbrown::raw::RawTable<T>::reserve_rehash
39.8% 71.9% 239.4KiB And 1055 smaller methods. Use -n N to show more.
55.4% 100.0% 333.1KiB .text section size, the file size is 601.7KiB
Naive investigation using cargo-bloat
doesn't show too much or I fail to read it properly.
@raphlinus - any thoughts on this? I know you brought up rental
, but I don't see it here. How do I fine tune the params to show it in the cargo bloat data?
My sense (not backed up by hard data) is that rental affects mostly compile time. I expect it not to bloat the binary much, as I think it's mostly about proving the validity of lifetimes of borrowed resources. I believe this will largely be true for proc-macros as well. My sense is that both compile time and binary size are worth optimizing; in some cases (monomorphization) the same root cause affects both, for expensive proc macros and stuff like phf the story will be a bit different.
Thanks for looking into this, I appreciate it!
My current understanding:
removing the langid!
macro from fluent-bundle
saves about 30kb on a debug build, and very little on a release build.
syn
doesn't impact code size, only compile time.
Interesting. I'm also invested in reducing the build times, but in this issue I wanted to focus on reducing the resulting bundle size, as I believe to be the more important of the two for the foundational crate.
Am I correct to interpret from your comments @cmyr and @ralphinus that there doesn't seem to be any low hanging fruit for reducing the impact of fluent-bundle
on the lib size?
here's the output of cargo bloat --example=hello --release --crates
in druid:
23.9% 43.0% 197.2KiB std
10.3% 18.6% 85.1KiB druid
5.3% 9.6% 44.0KiB fluent_syntax
4.1% 7.3% 33.6KiB [Unknown]
2.9% 5.3% 24.3KiB druid_shell
2.0% 3.6% 16.4KiB backtrace_sys
1.9% 3.5% 16.1KiB unic_langid_impl
1.7% 3.1% 14.0KiB intl_pluralrules
0.9% 1.6% 7.4KiB objc
0.6% 1.1% 5.1KiB fluent_bundle
0.4% 0.8% 3.6KiB cairo
0.4% 0.7% 3.2KiB piet_cairo
0.4% 0.7% 3.1KiB fluent_locale
0.3% 0.6% 2.6KiB hello
0.1% 0.2% 944B piet
0.1% 0.2% 800B kurbo
0.1% 0.1% 640B proc_macro2
0.0% 0.0% 16B malloc_buf
From fluent land I see fluent_syntax
which is mainly a parser. I'm not sure if there's a good way to reduce its size.
unic_langid_impl
is surprisingly large - larger than intl_pluralrules
and pluralrules store pluralization rules for all locales, while unic_langid stores no data. Is it because of macro?
fluent-bundle at 5kb seems reasonable I hope, and fluent_locale at 3.1kb also seems fairly reasonable.
Does it seems similar to you?
yes, I agree with that. This has gotten me a bit curious so I'm going to dig in a bit after dinner and do some experimenting, will let you know what I find.
The output of two runs of cargo bloat --example=calc --release -n 40
in druid, before and after the localization PR:
File .text Size Crate Name
1.9% 3.9% 10.5KiB [Unknown] _read_line_info
1.9% 3.8% 10.2KiB calc calc::main
1.3% 2.8% 7.4KiB std ___rdos_backtrace_dwarf_add
1.1% 2.2% 6.0KiB [Unknown] __mh_execute_header
1.0% 2.0% 5.2KiB std std::sys_common::backtrace::_print
0.9% 1.8% 4.8KiB [Unknown] _read_attribute
0.8% 1.7% 4.7KiB std rustc_demangle::demangle
0.8% 1.7% 4.6KiB std core::num::flt2dec::strategy::dragon::format_shortest
0.8% 1.7% 4.5KiB std <rustc_demangle::Demangle as core::fmt::Display>::fmt
0.7% 1.4% 3.6KiB std core::num::flt2dec::strategy::dragon::format_exact
0.5% 1.0% 2.7KiB [Unknown] _read_function_entry
0.4% 0.9% 2.3KiB calc calc::CalcState::op
0.4% 0.8% 2.3KiB std core::num::flt2dec::strategy::grisu::format_shortest_opt
0.4% 0.8% 2.2KiB std core::str::pattern::StrSearcher::new
0.4% 0.8% 2.2KiB std ___rust_probestack
0.4% 0.8% 2.0KiB [Unknown] _add_function_ranges
0.4% 0.7% 2.0KiB [Unknown] _add_unit_ranges
0.3% 0.7% 1.9KiB std ___rdos_backtrace_qsort
0.3% 0.7% 1.8KiB calc calc::flex_row
0.3% 0.7% 1.8KiB [Unknown] _find_address_ranges
0.3% 0.7% 1.8KiB std _macho_try_dwarf
0.3% 0.6% 1.7KiB druid_shell <druid_shell::keyboard::KeyCode as core::convert::From<u16>>:...
0.3% 0.6% 1.7KiB druid hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.6% 1.6KiB [Unknown] _dwarf_lookup_pc
0.3% 0.6% 1.5KiB std core::num::dec2flt::dec2flt
0.3% 0.6% 1.5KiB druid druid::WidgetPod<T,W>::event
0.3% 0.5% 1.4KiB objc objc::declare::method_type_encoding
0.3% 0.5% 1.4KiB druid std::collections::hash::map::HashMap<K,V,S>::insert
0.3% 0.5% 1.4KiB std <str as core::fmt::Debug>::fmt
0.3% 0.5% 1.4KiB std core::num::flt2dec::strategy::grisu::format_exact_opt
0.3% 0.5% 1.4KiB std core::num::dec2flt::algorithm::algorithm_m
0.2% 0.5% 1.3KiB druid_shell druid_shell::mac::WindowBuilder::build
0.2% 0.5% 1.3KiB druid_shell druid_shell::mac::menu::strip_access_key
0.2% 0.5% 1.2KiB std core::fmt::float::float_to_decimal_common_exact
0.2% 0.5% 1.2KiB std std::sync::once::Once::call_once::{{closure}}
0.2% 0.5% 1.2KiB std core::fmt::float::float_to_decimal_common_exact
0.2% 0.4% 1.2KiB std _macho_add_symtab
0.2% 0.4% 1.2KiB std <std::path::Components as core::iter::traits::iterator::Itera...
0.2% 0.4% 1.2KiB druid <druid::widget::flex::Flex<T> as druid::Widget<T>>::layout
0.2% 0.4% 1.1KiB std core::fmt::float::float_to_decimal_common_shortest
28.7% 58.8% 157.6KiB And 982 smaller methods. Use -n N to show more.
48.8% 100.0% 267.9KiB .text section size, the file size is 548.6KiB
File .text Size Crate Name
1.3% 2.2% 10.6KiB calc calc::main
1.2% 2.2% 10.5KiB [Unknown] _read_line_info
1.0% 1.9% 8.8KiB fluent_syntax fluent_syntax::parser::parse
1.0% 1.8% 8.4KiB druid druid::localization::L10nManager::new
0.9% 1.6% 7.4KiB backtrace_sys ___rdos_backtrace_dwarf_add
0.6% 1.1% 5.2KiB std std::sys_common::backtrace::_print
0.6% 1.0% 4.8KiB fluent_syntax fluent_syntax::parser::get_placeable
0.6% 1.0% 4.8KiB [Unknown] _read_attribute
0.6% 1.0% 4.7KiB std rustc_demangle::demangle
0.5% 1.0% 4.6KiB std core::num::flt2dec::strategy::dragon::format_shortest
0.5% 0.9% 4.5KiB std <rustc_demangle::Demangle as core::fmt::Display>::fmt
0.5% 0.9% 4.5KiB fluent_syntax? <fluent_syntax::ast::InlineExpression as fluent_bundle:...
0.4% 0.8% 3.6KiB std core::num::flt2dec::strategy::dragon::format_exact
0.4% 0.8% 3.6KiB fluent_syntax fluent_syntax::parser::get_call_arguments
0.4% 0.7% 3.1KiB [Unknown] __mh_execute_header
0.4% 0.6% 3.0KiB unic_langid_impl alloc::slice::merge_sort
0.4% 0.6% 3.0KiB druid druid::localization::LocalizedString<T>::resolve
0.3% 0.6% 2.9KiB std ___rust_probestack
0.3% 0.6% 2.8KiB druid fluent_locale::negotiate::negotiate_languages
0.3% 0.6% 2.8KiB druid fluent_locale::negotiate::negotiate_languages
0.3% 0.6% 2.8KiB fluent_syntax fluent_syntax::parser::get_inline_expression
0.3% 0.6% 2.7KiB [Unknown] _read_function_entry
0.3% 0.6% 2.7KiB druid fluent_bundle::bundle::FluentBundle<R>::add_resource
0.3% 0.5% 2.6KiB druid hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.5% 2.5KiB druid hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.5% 2.5KiB druid hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.5% 2.4KiB fluent_syntax fluent_syntax::parser::get_pattern
0.3% 0.5% 2.4KiB druid hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.5% 2.4KiB druid fluent_bundle::resolve::get_arguments
0.3% 0.5% 2.3KiB std alloc::str::<impl str>::replace
0.3% 0.5% 2.3KiB calc calc::CalcState::op
0.3% 0.5% 2.3KiB std core::num::flt2dec::strategy::grisu::format_shortest_opt
0.3% 0.5% 2.3KiB fluent_locale fluent_locale::negotiate::likely_subtags::add
0.3% 0.5% 2.3KiB std hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.5% 2.3KiB std hashbrown::raw::RawTable<T>::reserve_rehash
0.3% 0.5% 2.2KiB std core::str::pattern::StrSearcher::new
0.2% 0.4% 2.1KiB druid hashbrown::raw::RawTable<T>::reserve_rehash
0.2% 0.4% 2.1KiB unic_langid_impl unic_langid_impl::parser::parse_language_identifier
0.2% 0.4% 2.0KiB [Unknown] _add_function_ranges
0.2% 0.4% 2.0KiB [Unknown] _add_unit_ranges
38.2% 68.2% 322.8KiB And 1589 smaller methods. Use -n N to show more.
55.9% 100.0% 473.4KiB .text section size, the file size is 846.2KiB
A few things stand out:
we have tons of versions of reserve_rehash
. I think this is in fluent somewhere; there's some new use of HashMap in my code, but if I replace that with BTreeMap I see the same results from bloat.
negotiate_languages
is polymorphic on a number of axes, and generates a lot of code. it shows up twice here.
The reserve_rehash
calls are from FluentBundle<R>::add_resource
and negotiate_languages
(at least for the handful I looked at). The methodology I use is to add --full-fn
to the bloat cmdline, then otool -tvV
to look at the disassembly.
Taking a quick look at the code, I agree that making negotiate_languages
less polymorphic is worth exploring.
Can I take on making negotiate_languages
less polymorphic?
I also had a quick look at the implementation and have an idea for how to reduce the bloat.
Can I take on making negotiate_languages less polymorphic?
I'm open to it. My main design goal was to make it applicable over Locale and LanguageIdentifier. My hope was that people will be able to construct their own ways of representing LanguageIdentifier that may contain more data, but as long as they implement LanguageIdentifier (I thought of making it a trait!) they can negotiate between lists of them.
I'm open to better way to express it!
Great!
I've pushed up two branches:
remove-hashmap
, as its name implies, removes the HashMap
from the implementation. Nice easy win.wrapper
makes the core algorithm monomorphic by tracking the available locale objects in an extra Vec
. The API is not changed. It also includes the changes in remove-hashmap
.@raphlinus or @cmyr would you like to give these branches a go? I'm on Linux so I can't build Druid myself.
I'm open to better way to express it!
I actually wasn't planning on changing the API :smile:
But now you mention it, I wonder if this flexibility is worth the effort. I searched GitHub for uses of negotiate_languages
and all of them seemed to instantiate the function at LanguageIdentifier
. So this polymorphism appears to go unused.
In particular, with the wrapper
branch, I felt like much of the wrapper code could be eliminated if the function took two Vec<LanguageIdentifier>
instead.
From fluent land I see
fluent_syntax
which is mainly a parser. I'm not sure if there's a good way to reduce its size.
Have you considered putting the error type in a Box
? I don't know how much that would affect code size, but it would help reduce stack copying by making the Result
smaller. Serde boxes its error type for this reason.
@lfairy playing around with these:
master
: 850.5KiB
remove_hashmap
: 841.7KiB
wrapper
: 842.6KiB
Something that is maybe interesting is that the .text
section decreases in size with wrapper
, from 471.1 to 466.2, but the overall binary size is increasing.
@zbraniecki good to hear your rationale for these function signatures; as anecdata I definitely found them a little tricky sometimes, like when I have a Vec<LanguageIdentifier>
and I end up allocating a new Vec<&LanguageIdentifier>
just for negotiation.
But now you mention it, I wonder if this flexibility is worth the effort. I searched GitHub for uses of negotiate_languages and all of them seemed to instantiate the function at LanguageIdentifier. So this polymorphism appears to go unused.
Yes, but that's because LanguageIdentifier is the only thing mature enough to use it, and even that is very young. I expect more projects to start switching to Locale
over time.
@zbraniecki good to hear your rationale for these function signatures; as anecdata I definitely found them a little tricky sometimes, like when I have a Vec
and I end up allocating a new Vec<&LanguageIdentifier> just for negotiation.
Yeah, I also did hit that. When thinking about it more, I realized that "awkward" happened mostly in tests, where langid strings come from hardcoded places, but not really in production where all locale lists come either from users (requested) or from some environment data (available).
But as I said, I'd love to find a better solution :)
Thanks @cmyr. I'll post a PR for the hash map change then.
I'd still love to see how much of a difference changing the parser error type makes.
@cmyr - I landed a change in fluent-locale-rs
which switches away from HashMap
to use Vec
.
The perf impact is really nice:
negotiate time: [2.1180 us 2.1443 us 2.1764 us]
change: [-49.479% -46.126% -43.364%] (p = 0.00 < 0.05)
Performance has improved.
Can you review the impact on build size?
@zbraniecki is this published to crates.io?
no, you'd have to point cargo to the master of https://github.com/projectfluent/fluent-locale-rs - I'll release it in a couple days after testing that it works well with FFI for Gecko use case.
Unfortunately I can't get that to build; my code also uses fluent-bundle and it requires an incompatible version of unic-langid
. :(
@cmyr - can you try again? All crates are pushed to crates.io and should be compatible.
This must have happened a while ago? In any case moving to vec shows a modest decrease in binary size, although this is very unscientific, a lot of other stuff has also changed in druid in the meantime; on the hello
example fluent_syntax
is down from 44k
to 41.5k
.
Thank you!
Is this further actionable as we know right now? I'd like to close that issue if not, while we continue working on keeping the crate light.
I don't have any concrete actionable suggestions, so I think you're fine to close this. Thanks!
We should try to make
fluent-bundle
release size as small as possible.@cmyr pointed out in https://github.com/zbraniecki/unic-locale/issues/10 that at the moment the size increase is ~300kb.
I'd like to look for ways to make it smaller.
According to the analysis the bulk of the size increase goes into
rental
andsyn
(used for langid macro).I think that first low hanging fruit would be to stop using the macro in
fluent-bundle
, and investigate therental
.