sararselitsky / FastPG

Fast phenograph, CyTOF

Clustering using Leiden Algorithm #8

Closed · mdmanurung closed this issue 3 years ago

mdmanurung commented 4 years ago

Dear authors,

Thank you for writing this wonderful package. I could cluster ~16 million cells in less than an hour! Do you know of a way to use Leiden clustering instead of Louvain? I tried to convert the edge list (from dedup_links()) to an adjacency matrix in R, but I am not sure of the most efficient way to do this, since initializing a 16 million x 16 million matrix would be prohibitive.

Thank you for your time.

Best regards, Mikhael
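As a side note, the dense adjacency matrix can be avoided entirely: graph libraries such as python-igraph build a graph directly from an edge list, so memory scales with the number of edges rather than with the square of the number of cells. Below is a minimal sketch, assuming the deduplicated edge list has been written out from R as a tab-separated file with from/to/weight columns and 0-based integer vertex ids; the file name and column layout are illustrative, not part of FastPG.

```python
import igraph as ig

# Hypothetical export from R, e.g.
#   write.table(links, "edges.tsv", sep = "\t", row.names = FALSE)
# with columns: from, to, weight (0-based integer vertex ids assumed;
# if they are 1-based, subtract 1 first).
edges, weights = [], []
with open("edges.tsv") as fh:
    next(fh)  # skip the header line
    for line in fh:
        a, b, w = line.rstrip("\n").split("\t")
        edges.append((int(a), int(b)))
        weights.append(float(w))

# Memory use scales with the number of edges (roughly k * n for a k-NN
# graph), not with n^2 as a dense 16M x 16M adjacency matrix would.
g = ig.Graph(edges=edges, directed=False)
g.es["weight"] = weights
```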

sararselitsky commented 4 years ago

Sorry for the delayed response! I was out of town. Is there a Leiden package that takes in an edge list instead of a matrix? You are right, a 16M x 16M matrix is too large. I'll look around and see what I can find.

tom-b commented 4 years ago

Hi. I think you might want to export the edge list to a file and then try Leiden in Python using the leidenalg package. In the original Leiden paper (https://www.nature.com/articles/s41598-019-41695-z), the authors tested graphs with up to 39 million nodes. Their test environment had 1 TB of physical memory, and I believe memory is the limiting factor in that implementation.
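For reference, a hedged sketch of what running leidenalg on the igraph graph built in the earlier snippet might look like; the partition type and resolution value are illustrative choices, not something prescribed by FastPG.

```python
import igraph as ig
import leidenalg as la

# Stand-in graph so the snippet runs on its own; in practice pass the
# k-NN graph `g` built from the exported FastPG edge list.
g = ig.Graph.Famous("Zachary")
g.es["weight"] = [1.0] * g.ecount()

# RBConfigurationVertexPartition generalizes modularity with a
# resolution parameter; at resolution 1.0 it matches plain modularity,
# which is what the Louvain step in PhenoGraph-style pipelines optimizes.
partition = la.find_partition(
    g,
    la.RBConfigurationVertexPartition,
    weights="weight",
    resolution_parameter=1.0,
    seed=42,
)

labels = partition.membership  # one cluster label per vertex (cell)
print(len(set(labels)), "clusters")
```

Since leidenalg keeps the whole graph in memory, the peak footprint for ~16 million cells is driven by the number of k-NN edges rather than by any dense matrix, which is consistent with memory being the limiting factor reported above.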