greyblake / whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/
MIT License
969 stars 109 forks

change `detect_script` to do less full traversals of `text` #2

Closed: Dr-Emann closed this 7 years ago

Dr-Emann commented 7 years ago

Also change usages of `&String` to `&str`.
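As a small illustration of why `&str` is usually the preferred parameter type: a function taking `&str` accepts string literals, slices, and `&String` (through deref coercion), while a function taking `&String` only accepts a borrowed owned string. The helper `count_alphabetic` below is hypothetical and is not part of whatlang.

```rust
/// Hypothetical helper (not whatlang's API): counts alphabetic characters.
/// Taking `&str` lets callers pass literals, slices, or `&String`.
fn count_alphabetic(text: &str) -> usize {
    text.chars().filter(|c| c.is_alphabetic()).count()
}
```

With this signature, `count_alphabetic(&owned)`, `count_alphabetic("literal")`, and `count_alphabetic(&owned[..2])` all compile without any conversion at the call site.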

greyblake commented 7 years ago

@Dr-Emann thanks for the pull request.

I also thought about this approach. I am quite new to Rust (and systems programming in general), so I don't know exactly how big the cost of iterating through a string is.

I'd like to do some benchmarks :)
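The thread does not say which benchmark harness was used. As a rough sketch only, a timing helper built on the standard library could look like the following; `average_time` is a hypothetical name, and a dedicated benchmarking tool would give more reliable numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iterations` times and returns the
/// average wall-clock time per call.
fn average_time<T>(iterations: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iterations
}
```

For example, `average_time(1000, || text.chars().count())` would give an approximate per-call cost of one full traversal of `text`.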

greyblake commented 7 years ago

I ran benchmarks on 1000-character inputs in different languages/scripts. Here is the summary:

Initial implementation

It was optimized for Latin (5,500-8,500 ns). For other languages/scripts it took more than 75,000 ns. (I don't know how to explain such a big gap :/)
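One possible explanation for the gap, offered only as a guess since the thread does not show the original code: if the function tested one script at a time, with a full traversal of `text` per script, and Latin was tested first, then Latin input would return after a single check while other scripts would pay for every earlier check. The sketch below illustrates that shape. The `Script` enum, the simplified character ranges, and `detect_script_multi_pass` are all hypothetical, not whatlang's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Script {
    Latin,
    Cyrillic,
    Arabic,
}

// Deliberately simplified ranges, for illustration only.
fn is_latin(c: char) -> bool {
    c.is_ascii_alphabetic()
}
fn is_cyrillic(c: char) -> bool {
    matches!(c, '\u{0400}'..='\u{04FF}')
}
fn is_arabic(c: char) -> bool {
    matches!(c, '\u{0600}'..='\u{06FF}')
}

/// Returns the detected script (if any) and how many per-script
/// traversals of `text` were needed to decide.
fn detect_script_multi_pass(text: &str) -> (Option<Script>, usize) {
    // Counting the characters is itself one more traversal.
    let half = text.chars().count() / 2;
    let checks: [(Script, fn(char) -> bool); 3] = [
        (Script::Latin, is_latin),
        (Script::Cyrillic, is_cyrillic),
        (Script::Arabic, is_arabic),
    ];
    let mut passes = 0;
    for &(script, check) in checks.iter() {
        passes += 1;
        // One full traversal of `text` for this script alone.
        let matching = text.chars().filter(|&c| check(c)).count();
        if matching > half {
            return (Some(script), passes);
        }
    }
    (None, passes)
}
```

Under these assumptions, Latin text is decided after one per-script pass, and every script further down the list costs one extra pass for each script checked before it.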

Improved implementation

For all tested languages (English, Russian, Arabic) it shows about the same result, in the range of 16,000-18,000 ns.
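A single traversal that classifies each character once would behave this way, since the cost no longer depends on which script the text is in. The following is a minimal sketch of that idea, not the code from this pull request; the `Script` enum, the simplified character ranges, and `detect_script_single_pass` are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Script {
    Latin,
    Cyrillic,
    Arabic,
}

/// Hypothetical single-pass detector: walks `text` once, counts
/// characters per script, and returns the script with the most hits.
fn detect_script_single_pass(text: &str) -> Option<Script> {
    let scripts = [Script::Latin, Script::Cyrillic, Script::Arabic];
    let mut counts = [0usize; 3];

    for c in text.chars() {
        // Deliberately simplified ranges, for illustration only.
        match c {
            'a'..='z' | 'A'..='Z' => counts[0] += 1,
            '\u{0400}'..='\u{04FF}' => counts[1] += 1,
            '\u{0600}'..='\u{06FF}' => counts[2] += 1,
            _ => {}
        }
    }

    // Pick the script with the highest non-zero count.
    let mut best: Option<(Script, usize)> = None;
    for (i, &n) in counts.iter().enumerate() {
        if n > 0 && best.map_or(true, |(_, m)| n > m) {
            best = Some((scripts[i], n));
        }
    }
    best.map(|(script, _)| script)
}
```

The trade-off matches the numbers above: no script gets an early exit, so Latin becomes slower than before, but every input pays for exactly one traversal.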

greyblake commented 7 years ago

@Dr-Emann Thanks!

greyblake commented 7 years ago

@Dr-Emann Just discovered that ignoring irrelevant chars makes the function work ~2.5 times faster :) https://github.com/greyblake/whatlang-rs/commit/4e3311e440d5f5e046f0b2724f6bee6b4a29f126#diff-420c68e7483ee630e16762977da06437R50
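The linked commit is the authoritative version. As a rough sketch of the general idea only, a cheap test can skip characters that cannot belong to any script (spaces, digits, ASCII punctuation) before the more expensive per-script checks run. The names `is_irrelevant` and `count_letters`, and the choice of what counts as irrelevant, are assumptions for illustration.

```rust
/// Hypothetical filter: ASCII characters that are not letters
/// (spaces, digits, punctuation, control characters).
fn is_irrelevant(c: char) -> bool {
    c.is_ascii() && !c.is_ascii_alphabetic()
}

/// Hypothetical counter: returns (latin, cyrillic) letter counts,
/// skipping irrelevant characters before any script check.
fn count_letters(text: &str) -> (usize, usize) {
    let (mut latin, mut cyrillic) = (0usize, 0usize);
    for c in text.chars() {
        if is_irrelevant(c) {
            continue; // no script check needed for this character
        }
        if c.is_ascii_alphabetic() {
            latin += 1;
        } else if matches!(c, '\u{0400}'..='\u{04FF}') {
            cyrillic += 1;
        }
    }
    (latin, cyrillic)
}
```

How much this helps depends on how many script checks each skipped character would otherwise go through, so the ~2.5x figure above should not be expected from this simplified version.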