The Rust numbers are lower at face value, but the ICU numbers are run based on a list of 956 locales. So should the ICU numbers be divided by 956? Basically, how do we compare apples to apples?
The Rust benchmarks also run against 956 language tags as input; see https://github.com/zbraniecki/intl-measurements/blob/master/unic/locale/src/lib.rs, used in https://github.com/zbraniecki/intl-measurements/blob/master/unic/locale/benches/langid.rs#L12-L14
So, from what I can tell, the numbers are comparable. On your machine:

- 526 us in ICU vs. 29.295 us in unic-langid to parse 956 language identifiers.
- 18 us in ICU vs. 1.4252 us in unic-langid to match 956 language identifiers against en-US and find 91 matches.
- 2291 us in ICU vs. 65.909 us in unic-langid to serialize all 956 back to strings.
- 3557 us in ICU vs. 66.772 us in unic-langid to add likely subtags to 956 locales.
- 229376 bytes in ICU vs. 30592 bytes in unic-langid.

I think that warrants some consideration of what Rust can bring to the table.
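Because each ICU total and each unic-langid total covers the same 956 tags, dividing by 956 gives a per-tag cost but leaves the ICU/unic-langid ratio unchanged. The small sketch below does that arithmetic on the totals listed above. The helper names `per_tag_ns` and `speedup` are hypothetical and not part of either benchmark suite.

```rust
/// Number of language tags processed in every benchmark run (from the thread).
const TAGS: usize = 956;

/// Converts a total time in microseconds for all tags into nanoseconds per tag.
fn per_tag_ns(total_us: f64, tags: usize) -> f64 {
    total_us * 1000.0 / tags as f64
}

/// How many times larger the first measurement is than the second.
fn speedup(icu_us: f64, rust_us: f64) -> f64 {
    icu_us / rust_us
}

fn main() {
    // (operation, ICU total in us, unic-langid total in us), copied from the list above.
    let rows = [
        ("parse", 526.0, 29.295),
        ("match against en-US", 18.0, 1.4252),
        ("serialize", 2291.0, 65.909),
        ("add likely subtags", 3557.0, 66.772),
    ];
    for &(name, icu_us, rust_us) in &rows {
        // Dividing both totals by the same tag count cancels out in the ratio.
        println!(
            "{:<20} ICU {:>7.1} ns/tag  unic-langid {:>5.1} ns/tag  ({:.1}x)",
            name,
            per_tag_ns(icu_us, TAGS),
            per_tag_ns(rust_us, TAGS),
            speedup(icu_us, rust_us)
        );
    }
}
```

In other words, whether or not the ICU numbers are divided by 956, the Rust numbers would have to be divided by the same factor, so the relative comparison stays the same.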
I'm still testing pluralrules, and I have an initial mock of datetime (https://github.com/zbraniecki/unic-datetime) which is also where I'm experimenting with baked in data vs. loading from JSON CLDR. In both cases I'm seeing similar perf wins over ICU.
I see, so for example, the code
```rust
b.iter(|| {
    for loc in &locales {
        let _ = black_box(loc).to_string();
    }
})
```
is timed over many iterations by `b.iter`, but each iteration still loops over all the locales, so the per-iteration time includes all 956 language tags.
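To make the per-iteration point concrete, here is a std-only sketch (not Criterion itself) that times whole passes over a tag list and then divides by the number of tags. The helper `time_per_item` and the generated placeholder strings are hypothetical stand-ins for the real locale list and `LanguageIdentifier` values. `std::hint::black_box` requires Rust 1.66 or later.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times, where each run processes `items` elements, and
/// returns (mean time per run, mean time per element).
fn time_per_item<F: FnMut()>(iters: u32, items: usize, mut f: F) -> (Duration, Duration) {
    assert!(iters > 0 && items > 0);
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let per_run = start.elapsed() / iters;
    // Each run touched every element, so split the run time across them.
    let per_item = per_run / items as u32;
    (per_run, per_item)
}

fn main() {
    // Hypothetical stand-in data: 956 plain strings instead of parsed locales.
    let locales: Vec<String> = (0..956).map(|i| format!("en-X{:03}", i)).collect();
    let (per_run, per_item) = time_per_item(100, locales.len(), || {
        for loc in &locales {
            let _ = black_box(loc).to_string();
        }
    });
    println!("per run: {:?}, per tag: {:?}", per_run, per_item);
}
```

With Criterion itself, setting `Throughput::Elements(956)` on a benchmark group asks it to report a per-element rate alongside the per-iteration time, which avoids doing the division by hand.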
I also like that the comments in the docs for the Locale class show you're thinking about binary size. It would be great if we can continue to keep binary size tight in the new classes you're working on, so that we can target WebAssembly. It might be worth adding tests or benchmarks for binary size, if that's possible. (Note: I've been experimenting with various ways to deflate Rust binaries and have found some success; I'd be glad to work with you on that when we get closer to that point.)
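One low-tech way to follow up on the binary-size suggestion, sketched with only the standard library: a check that fails when a built artifact exceeds a byte budget. The artifact path, the crate name `locale_demo`, and the 64 KiB budget are hypothetical placeholders, not anything taken from unic-langid or ICU.

```rust
use std::fs;
use std::path::Path;

/// Returns Ok(size) if the file at `path` is no larger than `budget_bytes`,
/// otherwise an Err describing the problem.
fn check_size_budget(path: &Path, budget_bytes: u64) -> Result<u64, String> {
    let size = fs::metadata(path)
        .map_err(|e| format!("cannot read {}: {}", path.display(), e))?
        .len();
    if size <= budget_bytes {
        Ok(size)
    } else {
        Err(format!(
            "{} is {} bytes, over the {}-byte budget",
            path.display(),
            size,
            budget_bytes
        ))
    }
}

fn main() {
    // Hypothetical artifact path and budget: adjust both to the real build output.
    let path = Path::new("target/wasm32-unknown-unknown/release/locale_demo.wasm");
    match check_size_budget(path, 64 * 1024) {
        Ok(size) => println!("{} bytes, within budget", size),
        Err(msg) => eprintln!("size check failed: {}", msg),
    }
}
```

A check like this could run in CI right after `cargo build --release --target wasm32-unknown-unknown`, so that size regressions show up alongside the performance numbers.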
I duplicated the single-run binary from C++ to Rust in:
I hope it makes it easier to validate the performance claims.
I ran your code and here are the results from my Linux workstation. Note: I am running ICU 65.
ICU:
Rust: