RcppHNSW is an R wrapper around the HNSW C++ library. This change replaces the first step of FastPG so that the KNN index is built with RcppHNSW instead of the existing nmslibR wrapper, which required Python. Test results against the gold-standard data (Levine13) look good. Precision and recall for 5 iterations, each sampling 80,000 cells:
Precision: 0.919297971212903 Recall: 0.8797375
Precision: 0.924845246896492 Recall: 0.890525
Precision: 0.925546942611531 Recall: 0.931975
Precision: 0.916563951809679 Recall: 0.896375
Precision: 0.917757197750322 Recall: 0.871525
On a 10-core Intel machine with 64 GB of memory, clustering 1.1 million cells (datamatrix_LungCancer_multiATOM_N1113369.txt from https://data.mendeley.com/datasets/nnbfwjvmvw/draft?a=dae895d4-25cd-4bdf-b3e4-57dd31c11e37) took 1.5 minutes. Oversampling to 10 million cells from that dataset took 11.2 minutes on the same machine.
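For reviewers unfamiliar with RcppHNSW, here is a minimal sketch of the kind of KNN call that replaces the nmslibR step. It is not the code from this change: the toy matrix, the variable names, and the values of k and ef are assumptions chosen for illustration, and FastPG's actual defaults may differ.

```r
library(RcppHNSW)

set.seed(42)

# Toy stand-in for a cells x markers expression matrix (hypothetical data,
# not Levine13).
n_cells <- 1000
n_markers <- 13
expr <- matrix(rnorm(n_cells * n_markers), nrow = n_cells, ncol = n_markers)

# Assumed neighbour count for illustration only.
k <- 30

# Build the HNSW index and query it in one call.
# ef is set above k so the search list is at least as large as the
# number of neighbours requested.
knn <- hnsw_knn(expr, k = k, distance = "euclidean",
                M = 16, ef_construction = 200, ef = 50)

# knn$idx  : n_cells x k matrix of neighbour row indices (1-based)
# knn$dist : n_cells x k matrix of the corresponding distances
dim(knn$idx)
dim(knn$dist)
```

The search is approximate, so the neighbour lists can differ slightly from an exact KNN search; raising M, ef_construction, or ef generally trades speed for accuracy.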
I have tried to update all the documentation files as well, with the exception of the Docker materials. Those may need closer review to make sure the RcppHNSW dependency is properly included in the fork.