xiaoyifang / goldendict-ng

The Next Generation GoldenDict
https://xiaoyifang.github.io/goldendict-ng/

benchmark + opt: make Folding::apply slightly faster #1945

Closed shenlebantongying closed 1 week ago

shenlebantongying commented 1 week ago

Related: https://github.com/xiaoyifang/goldendict-ng/issues/1943

The original code reallocates memory for the string multiple times. This change reuses Qt's case folding instead, does all the work on a QString, and converts to std::u32string only for the return value.

The new version is apply2.

In a debug build, the speedup is roughly 10x (about 12.7x in the run below):

/Users/slbtty/src/goldendict-ng/cmake-build-debug/bcf
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
2024-11-12T17:56:24-05:00
Running /Users/slbtty/src/goldendict-ng/cmake-build-debug/bcf
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 3.55, 3.72, 3.48
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
applyFolding       45889 ns        45767 ns        15304
applyFolding2       3609 ns         3602 ns       194045

With -O2 or a similar optimization level, the speedup factor is less than 2: https://github.com/xiaoyifang/goldendict-ng/actions/runs/11807278152/job/32893638497#step:7:15

shenlebantongying commented 1 week ago

Is there any reason why we shouldn't use Qt's case folding? Qt uses the same data file, only a newer version: https://github.com/qt/qtbase/blob/dev/util/unicode/data/CaseFolding.txt