Closed jamesturk closed 1 year ago
After the latest round of improvements to the Rust versions (mostly switching to SmallVec), most Rust versions take around 1.5-2x as long as the C versions. Damerau is the exception at 4-5x, with the use of HashMap the likely culprit. (The C version used a custom Trie.)
This is fine for 0.11, since the safety/Unicode tradeoff here is huge and it's still a lot faster than Python. I will probably still explore a Trie to improve Damerau.
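For readers wondering where a hash map enters Damerau-Levenshtein at all, here is a minimal Python sketch of the textbook unrestricted algorithm. It is not the Rust or C code discussed above, and the function and variable names are illustrative assumptions. The `last_row` dict plays the role that a HashMap (or a Trie) would play: it is read once per inner-loop iteration, which is why the lookup cost matters.

```python
def damerau_levenshtein(s1: str, s2: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (textbook version, a sketch)."""
    len1, len2 = len(s1), len(s2)
    infinity = len1 + len2  # larger than any real distance

    # Maps each character to the last (1-based) row of s1 where it was seen.
    # This is the lookup structure that must handle arbitrary Unicode.
    last_row: dict[str, int] = {}

    # The score matrix has one extra sentinel row and column of "infinity".
    score = [[0] * (len2 + 2) for _ in range(len1 + 2)]
    score[0][0] = infinity
    for i in range(len1 + 1):
        score[i + 1][0] = infinity
        score[i + 1][1] = i
    for j in range(len2 + 1):
        score[0][j + 1] = infinity
        score[1][j + 1] = j

    for i in range(1, len1 + 1):
        last_match_col = 0  # last column in this row where the characters matched
        for j in range(1, len2 + 1):
            i1 = last_row.get(s2[j - 1], 0)  # one map lookup per cell
            j1 = last_match_col
            cost = 1
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_match_col = j
            score[i + 1][j + 1] = min(
                score[i][j] + cost,   # substitution (or match)
                score[i + 1][j] + 1,  # insertion
                score[i][j + 1] + 1,  # deletion
                # transposition, plus the edits between the swapped characters
                score[i1][j1] + (i - i1 - 1) + 1 + (j - j1 - 1),
            )
        last_row[s1[i - 1]] = i

    return score[len1 + 1][len2 + 1]
```

With a fixed small alphabet, `last_row` can be a plain array indexed by byte value. Supporting full Unicode is what pushes an implementation toward a hash map or a trie.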
Going to let 0.11 sit for a while to shake out any packaging issues. Once there's been a chance for people to test, I'll release 1.0.
Originally the library used the C implementation if available and fell back to a pure Python version if it was not. This had the advantage that on platforms where no wheels are available and no C compiler is installed, it would still work (albeit at lower performance). To my understanding, this behavior was completely dropped in version 0.11.0. On these platforms it is even less likely that a Rust compiler is preinstalled.
An example of this is:
podman run -it alpine
>>> apk add --update --no-cache python3 && ln -sf python3 /usr/bin/python
>>> python3 -m ensurepip
# fails since it cannot build the package
>>> python3 -m pip install jellyfish
# installs the pure Python fallback
>>> python3 -m pip install jellyfish==0.10.0
Is this an oversight, or are breaking changes like this considered fine in minor versions of jellyfish (in which case people should pin minor package versions)?
The plan is to produce wheels for all major platforms. I currently plan to remove the Python implementations, but might reconsider if there are platforms for which it is hard to provide prebuilt binaries.
I just pushed 0.11.2, which has a small speedup as well as changes to the build process that should fix installation on Alpine.
For anyone interested, given issues like #184 I think I'll restore the automatic fallback option for now.
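For context, an automatic fallback of this kind is usually a few lines of import logic: try the compiled backend first and use the pure Python one if that import fails. The sketch below shows the general pattern only. The helper `load_backend` and the module names passed to it are hypothetical stand-ins, not jellyfish's actual layout.

```python
import importlib
from types import ModuleType


def load_backend(candidates: list[str]) -> tuple[str, ModuleType]:
    """Return (name, module) for the first candidate that imports cleanly."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Typically: no wheel for this platform and no compiler to build one.
            continue
    raise ImportError(f"no usable backend among {candidates!r}")


# Hypothetical names: a compiled extension that is missing here, followed by
# a stdlib module standing in for the pure Python implementation.
backend_name, backend = load_backend(["_hypothetical_native_ext", "difflib"])
```

Keeping the chosen backend's name around, as `backend_name` does here, lets users check at runtime whether they are on the slow path.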
It's time to leave C behind.