spotify / annoy-java

Approximate nearest neighbors in Java
Apache License 2.0
138 stars 46 forks source link

Throw exception if init with wrong dimension #8

Closed 5antelope closed 7 years ago

5antelope commented 7 years ago

Problem: Try to fix #7

Solution: When we call load() at in constructor, if (file_size % dimension) != 0, it indicates there is a mismatch between index file and passed in dimension

erikbern commented 7 years ago

is this going to work? i think you need to check that it's modulo the size of a vector, not just the number of dimensions

also note that this can have false negatives

5antelope commented 7 years ago

Thanks @erikbern

I added a unit test for it -- ANNIndexTest#testLoadFile: throw an exception when index dimension does not match with dimension in ANNIndex() constructor.

How about we check (file_size % NODE_SIZE) == 0?

I know that it will still introduce false negatives, but ideally the only way to fix that would be adding metadata in the header? The sanity checks here would help to reduce the chances of mistake :-)

5antelope commented 7 years ago

Sorry, I misread the comment.

Add a check in ANNIndex#getNearest() to make sure the vector passed in should always be same size as DIMENSION;

Also, I think the check in load(), although introduces some false positive, would be nice to have as well?

5antelope commented 7 years ago

Bump -- can I have a quick review on the change? Thanks! :-)

5antelope commented 7 years ago

Hi @erikbern, does the change look reasonable to you? Thanks!

erikbern commented 7 years ago

looks like you need to rebase!

erikbern commented 7 years ago

also i defer to @a1k0n to determine correctness

5antelope commented 7 years ago

Hi @erikbern @a1k0n -- I resolved the conflicts and adjust logic based on 2GB change:

it looks for two cases:

Let me know if the change makes sense to you.

Thanks!

5antelope commented 7 years ago

@a1k0n mind taking a look while you are free? Thanks! 😄

5antelope commented 7 years ago

@erikbern ready to land 😎