This commit got kind of out of hand and in https://github.com/jedisct1/rust-bloom-filter/issues/7 you specifically mentioned minimising invasiveness so I wanted to pause here to see which bits you're interested in before I chop it up into more acceptable commits. I'm aware that this commit can't be merged as-is but I wanted to get the back-and-forth started :)
This does:
splits Bloom's members into BloomState and BloomHasher, so that BloomHasher can be separately cloned to compute offsets in separate threads
Splits set into make_offsets/set_offsets (and ditto for get) so that those offsets built-up elsewhere can be used. (I think offsets here is a dumb name and I've love a better one if you have one :) )
I believe this is the only place with a performance change: prior to this get was able to avoid computing all of the offsets if the item was found with an earlier one. Now it always computes them all. If you feel this is a problem it can be avoided with a tiny bit of code duplication
get/set now take references instead of consuming their arguments.
This is an API change. I incremented the minor version for this but I expect it's probably better to increment the major version instead
Makes Bloom into Bloom<T> so it's not possible to accidentally mix types between get/set. This came up for me when using it, finding that because a T and a &T do Hash differently, which makes sense. But it means that improperly adding/forgetting a & somewhere made my program incorrect (rather than simply not compile which I'd generally prefer).
This is a bigger API change :) It's is probably mostly source-compatible unless the code using it previously was incorrect
Adds a method to get a reference to the raw BitVec. This came up for me in wanting to minimise the number of copies for very large filters when trying to write serialised filters to a file
I'm totally willing to drop any of those things and of course I'll rearrange the commits so that the history is more reasonable
This commit got kind of out of hand and in https://github.com/jedisct1/rust-bloom-filter/issues/7 you specifically mentioned minimising invasiveness so I wanted to pause here to see which bits you're interested in before I chop it up into more acceptable commits. I'm aware that this commit can't be merged as-is but I wanted to get the back-and-forth started :)
This does:
Bloom
's members intoBloomState
andBloomHasher
, so thatBloomHasher
can be separately cloned to compute offsets in separate threadsset
intomake_offsets
/set_offsets
(and ditto forget
) so that those offsets built-up elsewhere can be used. (I thinkoffsets
here is a dumb name and I've love a better one if you have one :) )get
was able to avoid computing all of the offsets if the item was found with an earlier one. Now it always computes them all. If you feel this is a problem it can be avoided with a tiny bit of code duplicationget
/set
now take references instead of consuming their arguments.Bloom
intoBloom<T>
so it's not possible to accidentally mix types betweenget
/set
. This came up for me when using it, finding that because aT
and a&T
doHash
differently, which makes sense. But it means that improperly adding/forgetting a&
somewhere made my program incorrect (rather than simply not compile which I'd generally prefer).I'm totally willing to drop any of those things and of course I'll rearrange the commits so that the history is more reasonable