Open kalafut opened 9 months ago
I suggest again that SampleCount==0 would force load the full file into the hasher.
I agree with adding xxhash, because of the small-data case imohash will trigger.
- (internal) remove the now-defunct testing library being used.
Would you consider doing this before releasing v1.1? I'm willing to submit a pull request if that's ok with you.
I'm working on packaging croc on Debian and would need to do some workarounds to avoid is.v1.
@guilherme-puida Yes, thanks for letting me know and I'll prioritize that soon. I think I already eliminated it in my dev branch and will release an update with those changes.
Nice! Thanks for the quick response.
Even if you don't make a new tagged release, I could still just import the patch and remove it later down the road when you release v1.1. But tagging a new version would certainly be nice as well :^)
@guilherme-puida v1.0.3 has now been pushed. lmk if you run into any packaging issues.
Wow! That was quick. Thanks!
Sure, I'll ping you if I run into any trouble, but I don't expect to have to.
Cheers!
I'm scoping a V1.1 release that will add the first new features since the initial release in 2015. The intent is to maintain backward compatibility, so this will not need to be a major (V2) release from a Go module perspective.
Planned Changes
User-defined sample count via a new SampleCount parameter. This will let users override the current fixed number of sample chunks (3). The samples will continue to be evenly spaced across the file. This could improve conflict detection in large files whose changes are better caught by incorporating data from all parts of the file. This feature was prompted by a discussion in the py-imohash project.
SampleCount==3 will retain the current behavior for backward compatibility. (This will be a special case because the general case of how to space out n samples is slightly different from the current behavior.)
SampleCount==2 will sample at the beginning and end.
SampleCount==1 will sample at the beginning.
The core hashing algorithm will be configurable and (probably) xxhash will be added as an alternative to murmur3. xxhash is faster and may (?) have fewer weaknesses. That said, for most uses the default murmur3 hash is still fine.
Optional size mixing. The current behavior encodes the size into the hash by prefixing it. Having the size recoverable and/or many hashes with a similar prefix may not be desirable. A new option to mix size information into the hash will be added.
Adopt functional options with the New() function. This will allow for these and future enhancements without changing the default New() signature. NewCustom() will be deprecated.
(internal) remove the now-defunct testing library being used. (completed in 1.0.3)
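To make the sample spacing rules concrete, here is a minimal Go sketch of how offsets for n samples might be computed. `sampleOffsets` is a hypothetical helper, not part of imohash. The legacy three-sample layout (middle sample starting at size/2) and the even-spacing formula for other counts are assumptions about one way to satisfy the rules listed in the plan, not the planned implementation.

```go
package main

import "fmt"

// sampleOffsets returns the starting byte offsets of n sample chunks, each
// chunk bytes long, in a file of the given size. It returns nil when the
// samples would not fit without overlapping; a caller would then presumably
// hash the whole file instead.
// Hypothetical helper for illustration only, not the imohash API.
func sampleOffsets(size, chunk int64, n int) []int64 {
	if n < 1 || chunk < 1 || size < chunk*int64(n) {
		return nil
	}
	switch n {
	case 1:
		// A single sample is taken at the beginning.
		return []int64{0}
	case 3:
		// Assumed legacy layout: beginning, size/2, and end.
		return []int64{0, size / 2, size - chunk}
	}
	// General case (assumption): spread the chunk starts evenly between
	// the first possible start (0) and the last possible start (size-chunk).
	last := size - chunk
	offsets := make([]int64, n)
	for i := range offsets {
		offsets[i] = last * int64(i) / int64(n-1)
	}
	return offsets
}

func main() {
	for n := 1; n <= 4; n++ {
		fmt.Println(n, sampleOffsets(1000, 100, n))
	}
}
```

Under this formula the general case for n==3 would start the middle sample at (size-chunk)/2 rather than size/2, which is one way the special case mentioned in the plan could arise.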
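The difference between prefixing and mixing the size can be sketched as follows. This is only an illustration under stated assumptions: `hashWithSize` is a hypothetical function, the standard library's 128-bit FNV-1a stands in for murmur3 so the example has no external dependencies, and feeding the size into the hasher ahead of the data is just one possible way to mix it in.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// hashWithSize returns a 16-byte digest of data combined with size.
// With mix == false the varint-encoded size overwrites the leading digest
// bytes, so the size stays recoverable from the result. With mix == true the
// size is written into the hasher before the data, so it influences every
// output byte but cannot be read back.
// Hypothetical helper; FNV-1a is a stand-in for murmur3.
func hashWithSize(data []byte, size uint64, mix bool) [16]byte {
	var sizeBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(sizeBuf[:], size)

	h := fnv.New128a()
	if mix {
		h.Write(sizeBuf[:n])
	}
	h.Write(data)

	var out [16]byte
	copy(out[:], h.Sum(nil))
	if !mix {
		copy(out[:], sizeBuf[:n]) // prefix mode: size replaces the first bytes
	}
	return out
}

func main() {
	data := []byte("sampled bytes")
	fmt.Printf("prefixed: %x\n", hashWithSize(data, 300, false))
	fmt.Printf("mixed:    %x\n", hashWithSize(data, 300, true))
}
```

In prefix mode every file of the same size yields a digest with the same leading bytes, which is the similar-prefix effect described in the plan. In mixed mode that shared prefix disappears.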
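For readers unfamiliar with the functional options pattern, a minimal Go sketch follows. The names `Hasher`, `Option`, `WithSampleCount`, and `WithSizeMixing` are hypothetical and chosen for illustration; the default values are assumptions as well. The actual v1.1 API may look different.

```go
package main

import "fmt"

// Hasher holds the configuration. Hypothetical type for illustration.
type Hasher struct {
	sampleCount     int
	sampleSize      int
	sampleThreshold int
	mixSize         bool
}

// Option mutates a Hasher during construction.
type Option func(*Hasher)

// WithSampleCount overrides the number of sample chunks (hypothetical name).
func WithSampleCount(n int) Option {
	return func(h *Hasher) { h.sampleCount = n }
}

// WithSizeMixing switches from size prefixing to size mixing (hypothetical name).
func WithSizeMixing() Option {
	return func(h *Hasher) { h.mixSize = true }
}

// New applies the options on top of the defaults. Because opts is variadic,
// a plain New() call still compiles and yields the default configuration.
func New(opts ...Option) *Hasher {
	h := &Hasher{
		sampleCount:     3,          // the current fixed number of samples
		sampleSize:      16 * 1024,  // assumed default
		sampleThreshold: 128 * 1024, // assumed default
	}
	for _, opt := range opts {
		opt(h)
	}
	return h
}

func main() {
	fmt.Printf("%+v\n", *New())
	fmt.Printf("%+v\n", *New(WithSampleCount(5), WithSizeMixing()))
}
```

The appeal of this design is that later settings, such as a choice of hash algorithm, can be added as further options without touching existing call sites.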