Closed julsal closed 7 years ago
The current test set doesn't catch potential casting problems. It was necessary to test with a real large file (>2GB), which was not included in PR because of its size.
would be nice to add unit tests for this
@erikbern Yes, it would be nice to have better unit tests, but I didn't find a suitable way to test large indexes without a real large file. Regardless, making NODE_SIZE long type, makes all casts mandatory (otherwise, it does not compile).
@a1k0n, is everything ok to merge and publish a new maven version? :) By the way, we are using effectively this version and it is running gratefully so far.
Yes, looks good to me. But I'm not the maintainer anymore. I guess this version is somewhat abandoned since @yonromai moved on to a JNI version.
@erikbern or @yonromai could you merge this fix?
there you go
@Yuanrui1994 I read about annoy's algorithm and found out that it is supported by random projections. @erikbern Could you tell us if random projections make annoy's output non-deterministic?
yes non deterministic unless you set the seed explicitly
On Jul 24, 2017 1:48 PM, Juliano Efson Sales notifications@github.com wrote:
@Yuanrui1994https://github.com/yuanrui1994 I read about annoy's algorithm and found out that it is supported by random projections. @erikbernhttps://github.com/erikbern Could you tell us if random projections make annoy's output non-deterministic?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/spotify/annoy-java/pull/11#issuecomment-317549716, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AUTtrmNtq87duK8sIXbSgJ9v8gQGVP7Xks5sRQMIgaJpZM4OUCLc.
Hi,
Now I am confused. I read the annoy-java code and it seems to be determinstic. Why is it non determinstic and which part is non determinstic?
I might not be clear about my question. I first created a index tree using annoy-python and then loaded the index tree and did some searching using both old version annoy-java and the latest version. And I got different results.
Thanks for your explanation!
Yuanrui
Querying should be deterministic. Index building is random. So you may be encountering a bug.
1- removing bugs related to cast (int -> long | long -> int); 2- optimizing the number of MappedByteBuffer.