tdunning / knn

Large scale k-nn experiments
http://mahout.mapr.com

UpdatableSearcher cannot update reference vectors on the fly #8

Open openwzdh opened 11 years ago

openwzdh commented 11 years ago

When trying to add new reference vectors to a searcher while it is running searches, we get the ConcurrentModificationException below. ConcurrentLinkedDeque is a thread-safe alternative to the ArrayList.

java.util.ConcurrentModificationException: null
	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:819) ~[na:1.7.0_09]
	at java.util.ArrayList$Itr.next(ArrayList.java:791) ~[na:1.7.0_09]
	at org.apache.mahout.knn.search.FastProjectionSearch.reindex(FastProjectionSearch.java:180) ~[knn-0.1-SNAPSHOT.jar:na]
	at org.apache.mahout.knn.search.FastProjectionSearch.search(FastProjectionSearch.java:111) ~[knn-0.1-SNAPSHOT.jar:na]
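For anyone hitting the same trace, here is a minimal sketch of the failure mode, independent of the knn code. It assumes only that the reference vectors sit in a plain collection that is iterated during a search. The class and method names are made up for illustration and are not part of FastProjectionSearch. An ArrayList iterator is fail-fast, so any structural change during iteration aborts it (even from the same thread, which keeps the demo deterministic), while a ConcurrentLinkedDeque iterator is weakly consistent and keeps going.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.ConcurrentModificationException;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical demo class; not part of the knn project.
class IterationDuringUpdateSketch {

    /**
     * Seeds the collection with two vectors, then adds a third one part-way
     * through an iteration, mimicking an update that arrives while a search
     * is scanning the reference vectors.
     *
     * @return true if the iteration was aborted by a ConcurrentModificationException
     */
    static boolean failsWhenModifiedDuringIteration(Collection<double[]> refs) {
        refs.add(new double[] {1.0, 0.0});
        refs.add(new double[] {0.0, 1.0});
        boolean added = false;
        try {
            for (double[] v : refs) {
                if (!added) {
                    // Structural modification while an iterator is live.
                    refs.add(new double[] {0.5, 0.5});
                    added = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayList aborted: "
            + failsWhenModifiedDuringIteration(new ArrayList<double[]>()));
        System.out.println("ConcurrentLinkedDeque aborted: "
            + failsWhenModifiedDuringIteration(new ConcurrentLinkedDeque<double[]>()));
    }
}
```

Note that swapping the collection only removes the exception. A search that overlaps an update may or may not see the new vector, and any derived index structures would still need their own synchronization.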

dfilimon commented 11 years ago

Hi there!

Currently we're not using concurrent data structures, to keep the overhead as low as possible. There's a thread on the Mahout mailing list [1] about this very issue.

But you're certainly welcome to modify the code to get it to do what you want. :)

Also, please track my branch [2]. It's a bit more up to date and is where the work is happening.

[1] http://mail-archives.apache.org/mod_mbox/mahout-user/201212.mbox/%3CCALzSx%2BzOMYBod%3DspWgrsf4Cenqzv%3DnSnsALUP%3DRt%3DXQe6e6SVQ%40mail.gmail.com%3E
[2] https://github.com/dfilimon/knn

openwzdh commented 11 years ago

Thank you for your suggestions, Filimon! Updating the index concurrently turned out to be harder to implement than expected, so for now we have settled on a workaround: when samples are added, a background thread builds a new searcher, which then replaces the old one. The rebuild is costly, but it works. We will continue to develop the concurrent version and benchmark it.
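A rough sketch of how such a rebuild-and-swap workaround can look, in case it helps others. This is not our actual code and it does not use the knn classes: SwappingSearcherSketch and its methods are hypothetical, the "index" is just an immutable list scanned by brute force, and Java 8+ is assumed for the lambda. The point is only the pattern: searches read an immutable snapshot through an AtomicReference, and a single background thread builds the replacement and publishes it in one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a searcher; not the knn UpdatableSearcher API.
class SwappingSearcherSketch {
    // Current immutable snapshot; searches never see a half-built index.
    private final AtomicReference<List<double[]>> index =
        new AtomicReference<>(Collections.<double[]>emptyList());
    // A single thread serializes rebuilds, so no update is lost.
    private final ExecutorService rebuilder = Executors.newSingleThreadExecutor();

    /** Schedules a rebuild that includes newVectors; returns once it is queued. */
    Future<?> addAll(final List<double[]> newVectors) {
        return rebuilder.submit(() -> {
            List<double[]> rebuilt = new ArrayList<>(index.get());
            rebuilt.addAll(newVectors);
            // Publish the finished copy in one atomic step.
            index.set(Collections.unmodifiableList(rebuilt));
        });
    }

    /** Brute-force nearest neighbour (squared Euclidean) over the current snapshot. */
    double[] nearest(double[] query) {
        List<double[]> snapshot = index.get();
        double[] best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (double[] v : snapshot) {
            double d = 0.0;
            for (int i = 0; i < query.length; i++) {
                double diff = v[i] - query[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = v;
            }
        }
        return best; // null while the index is still empty
    }

    void shutdown() {
        rebuilder.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SwappingSearcherSketch searcher = new SwappingSearcherSketch();
        searcher.addAll(Arrays.asList(new double[] {0, 0}, new double[] {10, 10})).get();
        System.out.println(Arrays.toString(searcher.nearest(new double[] {9, 9})));
        searcher.shutdown();
    }
}
```

The trade-off matches what is described above: every update copies and rebuilds the whole structure, and a search started before the swap still answers from the old snapshot, but the read path needs no locks.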