Open mizvyt opened 5 years ago
I'd avoid changing defaults without bumping to a major version, since it can break persistent filters for existing users.
Curiously, xxhash
is licensed under BSD 2-Clause
. I don't know if it lets us release under current MIT
license. Do you have any idea?
Regarding new xxhash, I noticed that it is also available as a on pypi -- and I'm wondering if it's possible to add it as a dependency instead. What do you think?
Also wondering if there's any real performance benefit with it -- did you see any benchmark?
Curiously,
xxhash
is licensed underBSD 2-Clause
. I don't know if it lets us release under currentMIT
license. Do you have any idea?
Shouldn't be an issue. It's similar to MIT license, so it doesn't implicate anything to this library, only prerequisite is to contain a copyright notice for xxHash
, which is already customized and prepended to each file.
Also wondering if there's any real performance benefit with it -- did you see any benchmark? Benchmarks for xxHash are here for some reference systems: https://github.com/Cyan4973/xxHash/blob/dev/README.md
I'm assuming since this library uses the same interface and procedure for each hashing algorithm, there shouldn't be an issue to test it and we should see an improvement, albeit I haven't created any benchmarks for it...
I'd avoid changing defaults without bumping to a major version, since it can break persistent filters for existing users.
You're right. I think a better way would be to use it as a Python package extension (along with the other two hashing algorithms), such that when we want to install pybloomfilter
with xxHash, we do pip install pybloomfilter[xxhash]
.
Regarding new xxhash, I noticed that it is also available as a on pypi -- and I'm wondering if it's possible to add it as a dependency instead. What do you think?
I think it's just a Python wrapper for the library, rather than the actual library? In the repo, they use the xxHash repo as a submodule: https://github.com/ifduyue/python-xxhash/tree/master/deps
I pulled the xxHash code for the latest release (0.7.2), but I agree it would be better to have some dependency management. Of course, same goes for other hashing algorithms, but let's keep it out of scope for now.
I'll stash this PR for now? Will keep it in my branch and rebase with other changes, and give a thought on a more elegant solution.
I haven't created any benchmarks for it...
Been there, done that! I think it's a nice hashing method from what I saw in the code (Once upon a time I ported city-hash
in Python (link) so it'd be nice to have "pluggable" hash methods).
Regarding submodule: It usually complicates stuff (eg. if you forget --recursive flag while cloning). Given that it's not particularly huge code, I think copying it here is better, since we can also adapt them if required.
Let's keep it open and once you're sure it doesn't break things from your branch, we can work to get new release out. Sounds good?
Been there, done that! I think it's a nice hashing method from what I saw in the code (Once upon a time I ported
city-hash
in Python (link) so it'd be nice to have "pluggable" hash methods).Let's keep it open and once you're sure it doesn't break things from your branch, we can work to get new release out. Sounds good?
Definitely!
How confident are we that this is correct ? xD
Requested here a while ago:
Currently the fastest hashing algorithm. Added as default, but can be changed if needed.
@prashnts