kootenpv / textsearch

Find strings/words in text; convenience and C speed :fireworks:

Replace Unidecode with anyascii #3

Closed sajal2692 closed 3 years ago

sajal2692 commented 3 years ago

This PR should solve issue #1.

The actual replacement of Unidecode usages with anyascii was straightforward. I ran the tests manually and found no problems.
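Since the swap changes which library produces the ASCII output, one way to sanity-check it beyond the test suite is to compare both functions on a handful of strings. The following is only a sketch: `transliteration_diffs` and the sample strings are hypothetical and not part of textsearch, and the comparison only runs if both `anyascii` and `unidecode` are installed.

```python
def transliteration_diffs(old_fn, new_fn, samples):
    """Return (sample, old_output, new_output) for each sample where the outputs differ."""
    diffs = []
    for sample in samples:
        old_out, new_out = old_fn(sample), new_fn(sample)
        if old_out != new_out:
            diffs.append((sample, old_out, new_out))
    return diffs


if __name__ == "__main__":
    try:
        from anyascii import anyascii
        from unidecode import unidecode
    except ImportError:
        print("install anyascii and unidecode to run the comparison")
    else:
        # Hypothetical sample strings; replace with text typical for your use case.
        samples = ["ö", "ä", "café", "naïve", "北京"]
        for sample, old_out, new_out in transliteration_diffs(unidecode, anyascii, samples):
            print(repr(sample), repr(old_out), repr(new_out))
```

Any line this prints is a string the two libraries transliterate differently, which is worth reviewing before relying on the new output.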

I was not able to get tox and pyenv working on my machine. I also don't know how we check execution speed, or whether using anyascii changes it.

It would be great if you could look into this, @kootenpv.

Thanks!

kootenpv commented 3 years ago

I did a very naive benchmark and it seems okay! If anything, anyascii is roughly 2.6x faster (about 1.15 µs vs. 3.04 µs per loop), so thanks for the PR.

from anyascii import anyascii
from unidecode import unidecode

def fn1():
    anyascii("ö")
    anyascii("ä")

def fn2():
    unidecode("ä")
    unidecode("ö")

In [14]: %timeit fn1()
1.15 µs ± 4.15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit fn2()
3.04 µs ± 21.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit fn2()
3.15 µs ± 39.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [17]: %timeit fn1()
1.18 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
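For anyone who wants to repeat this outside IPython, here is a rough sketch of the same kind of measurement using the standard-library `timeit` module. The helper name `best_time_per_call` and the sample strings are hypothetical, and either library is simply skipped if it is not installed.

```python
import timeit


def best_time_per_call(fn, samples, number=10_000, repeat=5):
    """Best observed seconds per call of fn, averaged over all samples."""
    def run():
        for sample in samples:
            fn(sample)

    # timeit.repeat returns one total time per repetition; take the fastest.
    timings = timeit.repeat(run, number=number, repeat=repeat)
    return min(timings) / (number * len(samples))


if __name__ == "__main__":
    # Hypothetical inputs: a mix of accented, plain ASCII and non-Latin text.
    samples = ["ö", "ä", "plain ascii", "naïve café", "北京"]
    candidates = {}
    try:
        from anyascii import anyascii
        candidates["anyascii"] = anyascii
    except ImportError:
        pass
    try:
        from unidecode import unidecode
        candidates["unidecode"] = unidecode
    except ImportError:
        pass
    for name, fn in candidates.items():
        micros = best_time_per_call(fn, samples) * 1e6
        print(f"{name}: {micros:.3f} µs per call")
```

Taking the minimum over several repetitions reduces noise from other processes, and using more varied inputs than two single characters should give a somewhat more representative picture.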