s-yata / marisa-trie

MARISA: Matching Algorithm with Recursively Implemented StorAge

BitVector::build_index: 100x speedup #28

Closed jmr closed 4 years ago

jmr commented 4 years ago

Process the BitVector unit-by-unit instead of bit-by-bit.

Use PopCount::count() to update num_1s and use select_bit to find the bit positions for the select0s_ and select1s_ indexes.

According to my benchmarks, the old bit-by-bit version processed a 256kbit vector at about 20MB/s independent of enables_select0 and enables_select1.

The new version is 50x-150x faster, depending on the compiler and build_index options.

enables_select0=enables_select1=false:

- no popcnt, no bmi2: 1600MB/s
- popcnt, no bmi2: 2500MB/s
- popcnt and bmi2: 2900MB/s

enables_select0=enables_select1=true:

- no popcnt, no bmi2: 1100MB/s
- popcnt, no bmi2: 1600MB/s
- popcnt and bmi2: 1800MB/s

s-yata commented 4 years ago

Benchmark

The following table shows build speed in thousands of keys per second [K/s].

tries | s-yata:master [K/s] | jmr:build-index [K/s]
-----:|--------------------:|---------------------:
1 | 1,054.64 | 1,087.85
2 | 915.17 | 937.07
3 | 901.00 | 920.59
4 | 896.08 | 914.57
5 | 894.30 | 912.47

jmr:build-index is 2-3% faster than s-yata:master.

jmr commented 4 years ago

> jmr:build-index is 2-3% faster than s-yata:master.

Did you configure with --enable-native-code? popcnt and select_bit are going to be important.
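For reference, an autotools build with that flag would look roughly like this (a sketch assuming the repository's stock `./configure` script; `--enable-native-code` passes native-architecture flags so the compiler can emit popcnt/BMI2 instructions where the CPU supports them):

```shell
# Hypothetical build commands for a native-code benchmark build.
./configure --enable-native-code
make
```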

My benchmark was just on BitVector::build_index. I don't know what fraction of marisa-benchmark is spent in build_index, so I can't say whether more than 2-3% is expected.

I will have time to run/profile the benchmarks myself later in the week.

s-yata commented 4 years ago

The table shows the speed of overall dictionary construction, and BitVector::build_index is not a major part of it. However, I think the improvement is enough to accept this pull request.

s-yata commented 4 years ago

It looks good to me. Thank you!