Closed eyvindn closed 7 years ago
simhash-py has only been tested in 64-bit environments. If you're compiling it for 32-bit, you're on your own for help.
I'm not compiling in a 32-bit environment, I investigated the issue further, and setup.py simply compiles a universal binary by default, causing some warnings during compilation.
However, forcing it to only compile for x86_64 doesn't help (obviously, as it was using the 64-bit implementation in the universal binary by default when I was running simhash).
Essentially the issue I'm having is that hese things don't work:
corpus = simhash.Corpus(6,3) corpus.insert(1) corpus.hashes()
Will return [0L] on every table in the corpus, regardless of what you first insert.
Also when you insert more into the corpus, more things definitely get added, but calling corpus.hashes()
Always returns the wrong hashes (besides the inital 0L, I believe they are the permuted versions of the hashes). Initially I thought this made sense, but upon further investigation it's pulling these from table 0, which should be the initial permutation of the hash (aka the untouched hash), so it doesn't really make sense.
The issue actually lies in simhash-cpp, namely the const_iterator is wrongly implemented and does not properly loop through the Judy array. I'll submit a fix.
Could you please confirm that this has been fixed by pulling in recent updates from simhash-cpp in #19?
I'm going to assume this was resolved with the prior PR since there's been no update on this ticket in a long while.
For some reason, find/find_all works well works, but calling hashes() on a corpus results in completely wrong hashes being returned.
Running OSX 10.10.3 with Python 2.7.6, just cloned from the repo and followed the instructions.
Could it have something to do with 64 vs 32bit when compiling, which I am getting errors about?
During compile I get a lot of these errors:
warning: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'Simhash::hash_t' (aka 'unsigned long') [-Wshorten-64-to-32]