Benchmarks on my Linux workstation

sffc commented 5 years ago

I ran your code and here are the results from my Linux workstation. Note: I am running ICU 65.

ICU:

Create Locale from str for 956 locales. time: 526 us
Number of matches against en-US: 91. time: 18 us
Total size of the locales vector: 229376 bytes.
Serialized Locale. time: 2291 us
Added/Removed likely subtags. time: 3557 us

Rust:

    Finished release [optimized] target(s) in 0.04s
     Running target/release/deps/locale-d1d2580a27727da5

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/locale-764e810170acc09d

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/langid-fc55181b62f69024
language_identifier_parsing                                                                             
                        time:   [29.073 us 29.295 us 29.611 us]
                        change: [+2.6716% +3.8077% +5.5243%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

language_identifier_matches                                                                             
                        time:   [1.4225 us 1.4252 us 1.4285 us]
                        change: [-1.1112% -0.6328% -0.0423%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

language_identifier_serialize                                                                            
                        time:   [65.623 us 65.909 us 66.182 us]
                        change: [-1.5245% -1.1212% -0.7151%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

language_identifier_add_likely_subtags                                                                            
                        time:   [66.677 us 66.772 us 66.906 us]
                        change: [-1.0278% -0.7892% -0.5750%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

     Running target/release/deps/locale-c5483b7a92fe7eff
locale_parsing          time:   [95.719 us 95.904 us 96.119 us]                           
                        change: [+5.8257% +6.2829% +6.7837%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

The Rust numbers are lower at face value, but the ICU numbers are measured over a list of 956 locales. So should the ICU numbers be divided by 956? Basically, how do we compare apples to apples?

zbraniecki commented 5 years ago

> The Rust numbers are lower at face value, but the ICU numbers are run based on a list of 956 locales. So should the ICU numbers be divided by 956? Basically, how do we compare apples to apples?

The Rust benchmarks are also run against 956 language tags as input; see https://github.com/zbraniecki/intl-measurements/blob/master/unic/locale/src/lib.rs, which is used in https://github.com/zbraniecki/intl-measurements/blob/master/unic/locale/benches/langid.rs#L12-L14

So, as far as I can tell, the numbers are comparable, and on your machine:

It takes 526 us in ICU vs. 29.295 us in unic-langid to parse 956 language identifiers.
It takes 18 us in ICU vs. 1.4252 us in unic-langid to match all 956 against en-US and find 91 matches.
It takes 2291 us in ICU vs. 65.909 us in unic-langid to serialize all 956 back to strings.
It takes 3557 us in ICU vs. 66.772 us in unic-langid to add likely subtags to 956 locales.
And if you store those 956 parsed structs in memory, they take 229376 bytes in ICU vs. 30592 bytes in unic-langid.
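To make the comparison easier to read, the figures above can be turned into ratios and per-tag averages. The following is only a sketch that restates the numbers quoted in this thread. The names `Row`, `speedup`, and `per_tag_ns` are made up for illustration and are not part of unic-langid or ICU.

```rust
/// One ICU-vs-Rust measurement pair, both taken over the same 956-tag batch.
/// (Hypothetical helper type; the numbers are copied from the thread above.)
struct Row {
    name: &'static str,
    icu_us: f64,
    rust_us: f64,
}

/// Number of language tags in the shared input list.
const TAGS: f64 = 956.0;

/// Ratio of the ICU batch time to the Rust batch time.
fn speedup(row: &Row) -> f64 {
    row.icu_us / row.rust_us
}

/// Average cost of one tag in nanoseconds, given a whole-batch time in microseconds.
fn per_tag_ns(batch_us: f64) -> f64 {
    batch_us * 1000.0 / TAGS
}

/// The four timing comparisons quoted above.
fn rows() -> Vec<Row> {
    vec![
        Row { name: "parse", icu_us: 526.0, rust_us: 29.295 },
        Row { name: "match en-US", icu_us: 18.0, rust_us: 1.4252 },
        Row { name: "serialize", icu_us: 2291.0, rust_us: 65.909 },
        Row { name: "add likely subtags", icu_us: 3557.0, rust_us: 66.772 },
    ]
}

fn main() {
    for row in rows() {
        println!(
            "{:<20} {:>5.1}x  ICU {:>7.1} ns/tag  Rust {:>5.1} ns/tag",
            row.name,
            speedup(&row),
            per_tag_ns(row.icu_us),
            per_tag_ns(row.rust_us),
        );
    }
    // Memory: total bytes reported for the 956 stored values.
    let (icu_bytes, rust_bytes) = (229_376_u32, 30_592_u32);
    println!(
        "memory: {:.1}x smaller, {} bytes per stored tag in Rust",
        f64::from(icu_bytes) / f64::from(rust_bytes),
        rust_bytes / 956,
    );
}
```

Since both sides process the same 956 inputs per measurement, dividing by 956 changes the absolute numbers but not the ratios.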

I think it warrants some consideration of what Rust can bring to the table.

zbraniecki commented 5 years ago

I'm still testing pluralrules, and I have an initial mock of datetime (https://github.com/zbraniecki/unic-datetime), which is also where I'm experimenting with baked-in data vs. loading from CLDR JSON. In both cases I'm seeing similar perf wins over ICU.

sffc commented 5 years ago

I see, so for example, the code

        b.iter(|| {
            for loc in &locales {
                let _ = black_box(loc).to_string();
            }
        })

is timed once per call of the closure, but each call still loops over all locales, so the per-iteration time still includes all 956 language tags.
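The same point can be shown without the benchmark harness. The sketch below uses only the standard library as a simplified stand-in for what the harness measures; it is not Criterion's actual implementation. `mean_batch_time` and `serialize_all` are hypothetical names, and the four tags stand in for the real 956-entry list.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Calls `batch` `runs` times and returns the mean duration of one call.
/// Each call is one "iteration" in the benchmark's sense.
fn mean_batch_time<F: FnMut()>(runs: u32, mut batch: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let start = Instant::now();
    for _ in 0..runs {
        batch();
    }
    start.elapsed() / runs
}

/// Serializes every tag in the list and returns the total number of bytes
/// produced, so that the work cannot be optimized away.
fn serialize_all(tags: &[&str]) -> usize {
    let mut bytes = 0;
    for tag in tags {
        bytes += black_box(*tag).to_string().len();
    }
    bytes
}

fn main() {
    // Illustrative input only; the real benchmark uses 956 tags.
    let tags = ["en-US", "pl", "de-AT", "zh-Hant-TW"];

    // One iteration covers the whole list, just like the closure above.
    let per_batch = mean_batch_time(10_000, || {
        black_box(serialize_all(&tags));
    });

    // Dividing by the list length gives an average per-tag cost.
    let per_tag = per_batch / tags.len() as u32;
    println!("per batch: {:?}, per tag: {:?}", per_batch, per_tag);
}
```

The reported time is the cost of one full pass over the list, so it is directly comparable to a single-run measurement over the same list.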

I also like that I saw some comments in the docs for the Locale class that you're thinking about binary size. It would be great if we can continue to keep binary size tight in the new classes you're working on, such that we can target WebAssembly. Might be something to consider adding tests/benchmarks for if that's possible. (Note: I've been experimenting with various ways to deflate Rust binaries and have found some success; would be glad to work with you on that when we get closer to that point.)

zbraniecki commented 4 years ago

I duplicated the single-run binary from C++ to Rust in:

I hope it makes it easier to validate the performance claims.

zbraniecki / intl-measurements

Benchmarks on my Linux workstation #2