theislab / destiny

R package for single cell and other data analysis using diffusion maps
https://theislab.github.io/destiny/
GNU General Public License v3.0

Use better ANN library #18

Open flying-sheep opened 5 years ago

flying-sheep commented 5 years ago

Instead of using my built-in cover-tree approximate nearest neighbor library, destiny could use one of the new options that have popped up:

| | NMSLIB | HNSWLIB | Annoy | FALCONN |
|---|---|---|---|---|
| sparse matrix support | yes | no | no* | yes |
| R bindings | through Python | good | good* | no |
| distances | numerous | Euclidean, Squared L2, Inner product, Cosine** | Angular, Euclidean, Manhattan, Hamming, Inner product + custom | Cosine |

*Annoy is super flexible, but neither the Python nor the R bindings support sparse matrices. I think sparse matrix support could be added as a custom extension, though.

**I think you can extend HNSWLIB with more distances, but it’s not as easy as doing the same with Annoy.

AmberLJC commented 5 years ago

Does FALCONN support sparse data in sparse matrix format? Or does it just optimize for sparse data stored in dense format?

flying-sheep commented 5 years ago

According to “How to use FALCONN”, it can be used with dense or sparse vectors. However, the sparse ones have to be std::vector<std::pair<int32_t/int64_t, float/double>>, so you can’t just give it column- or row-compressed sparse matrix data.

Thinking about it, that makes it little better than the others: The user has to copy their whole data to match the format FALCONN understands.
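
To make the copy concrete, here is a minimal sketch of what that conversion could look like. It assumes the matrix arrives as three row-compressed (CSR) arrays and that the target is one vector of (index, value) pairs per point, using the int32_t/float variant of the type quoted above. The helper name csr_to_sparse_points and the alias SparsePoint are made up for illustration and are not part of FALCONN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One sparse point as a list of (column index, value) pairs.
// Assumption: this mirrors the int32_t/float variant of the type quoted above.
using SparsePoint = std::vector<std::pair<int32_t, float>>;

// Hypothetical helper (not a FALCONN function): copies a CSR matrix into
// one SparsePoint per row.
//   row_ptr: n_rows + 1 offsets into col_idx/values
//   col_idx: column index of each stored entry
//   values:  value of each stored entry
std::vector<SparsePoint> csr_to_sparse_points(
    const std::vector<int32_t>& row_ptr,
    const std::vector<int32_t>& col_idx,
    const std::vector<float>& values) {
  // Basic consistency checks on the CSR arrays.
  assert(!row_ptr.empty());
  assert(col_idx.size() == values.size());
  assert(static_cast<std::size_t>(row_ptr.back()) == values.size());

  const std::size_t n_rows = row_ptr.size() - 1;
  std::vector<SparsePoint> points(n_rows);
  for (std::size_t r = 0; r < n_rows; ++r) {
    const int32_t begin = row_ptr[r];
    const int32_t end = row_ptr[r + 1];
    points[r].reserve(static_cast<std::size_t>(end - begin));
    // Every stored entry is copied once, so the whole matrix is duplicated.
    for (int32_t k = begin; k < end; ++k) {
      points[r].emplace_back(col_idx[k], values[k]);
    }
  }
  return points;
}
```

For a column-compressed matrix with one point per column, the same loop would run over the column pointer and row index arrays instead. Either way, every stored value is copied once, which is the overhead described above.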

flying-sheep commented 4 years ago

Huh, I missed that Aaron Lun created BiocNeighbors! I have to investigate this :D

Yunuuuu commented 6 months ago

Hope to integrate BiocNeighbors

gdagstn commented 5 months ago

Hi @flying-sheep, findKmknn() and findVptree() from BiocNeighbors provide exact KNN with Euclidean and cosine metrics, and work out of the box with sparse matrices. I have successfully (as far as vignette and test builds go) implemented them in my fork, and I can open a PR if that works for you.