Open flying-sheep opened 5 years ago
Does FALCONN support sparse data in a sparse matrix format? Or does it only optimize for sparse data stored in dense format?
According to How to use FALCONN, it can be used with dense or sparse vectors. However, the sparse ones have to be std::vector<std::pair<int32_t/int64_t, float/double>>, so you can’t just give it column- or row-compressed sparse matrix data.
Thinking about it, that makes it little better than the others: The user has to copy their whole data to match the format FALCONN understands.
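To make the copy concrete, here is a minimal sketch of what such a conversion might look like for row-compressed (CSR) data. The helper name csr_to_sparse_vectors and the three-array CSR layout (indptr, indices, values) are assumptions for illustration and not part of FALCONN. The sketch also assumes int32_t indices, float values, and column indices already sorted within each row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Same shape as the sparse vector type quoted above, with int32_t/float chosen.
using SparseVector = std::vector<std::pair<int32_t, float>>;

// Hypothetical helper (not a FALCONN function): copies a CSR matrix into one
// SparseVector per row.
//   indptr  - row offsets, length n_rows + 1
//   indices - column index of each stored value
//   values  - the stored (non-zero) values
std::vector<SparseVector> csr_to_sparse_vectors(
    const std::vector<int32_t>& indptr,
    const std::vector<int32_t>& indices,
    const std::vector<float>& values) {
  assert(!indptr.empty());
  assert(indices.size() == values.size());
  assert(static_cast<std::size_t>(indptr.back()) == values.size());

  const std::size_t n_rows = indptr.size() - 1;
  std::vector<SparseVector> rows(n_rows);
  for (std::size_t r = 0; r < n_rows; ++r) {
    const int32_t begin = indptr[r];
    const int32_t end = indptr[r + 1];
    assert(begin <= end);
    rows[r].reserve(static_cast<std::size_t>(end - begin));
    // Every stored entry is copied once; this is the full-data copy
    // the comment above complains about.
    for (int32_t k = begin; k < end; ++k) {
      rows[r].emplace_back(indices[k], values[k]);
    }
  }
  return rows;
}
```

The result holds a second copy of every non-zero entry, so peak memory is roughly double that of the original matrix until the source is freed.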
Huh, I missed that Aaron Lun created BiocNeighbors! I have to investigate this :D
Hope to integrate BiocNeighbors
Hi @flying-sheep,
findKmknn() and findVptree() from BiocNeighbors provide exact KNN with Euclidean and cosine metrics, and work out of the box with sparse matrices. I have successfully (as far as vignette and test builds go) implemented them in my fork and can open a PR if that works for you.
As alternatives to my built-in cover-tree approximate nearest neighbor lib, new options have popped up:
* Annoy is super flexible, but neither the Python nor R bindings support sparse matrices. I think sparse matrix support might be custom added though.
** I think you can extend HNSWLIB with more distances, but it’s not as easy as doing the same with Annoy.