stephenslab / fastTopics

Fast algorithms for fitting topic models and non-negative matrix factorizations to count data.
https://stephenslab.github.io/fastTopics

Feature request: multi-threaded DE analysis #35

Open aksarkar opened 1 year ago

aksarkar commented 1 year ago

Testing for differential expression/accessibility is embarrassingly parallel, and could use all available cores by default.

pcarbo commented 1 year ago

@aksarkar I believe this is already implemented with the nc control argument in de_analysis; see "Details" in help(de_analysis).
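For reference, a minimal sketch of what this looks like, based on the documented de_analysis interface (the fit and counts objects here are placeholders; the fitting step is only for context):

```r
library(fastTopics)
# `counts` is an n x m count matrix; `fit` is a Poisson NMF / topic model fit.
fit <- fit_topic_model(counts, k = 6)
# Multithreading is requested via the `nc` element of `control`,
# not a top-level function argument:
de <- de_analysis(fit, counts, control = list(nc = 4))
```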

aksarkar commented 1 year ago

I see. I missed this in the documentation since I was expecting it to be in the function arguments, not in control.

However, when I tried it on my data set (~180K x 100K scATAC-seq count matrix), the process appears to only use one core and hang.

pcarbo commented 1 year ago

From what I recall, one issue is that the parallel implementation is quite memory-hungry (I tried to fix this, but did not have any luck). These were my settings for a scRNA-seq data set with ~90,000 cells. You might first try with a small number of samples, say ns = 1000. What is K here? My guess is that your run only used one core because it hung before it got to the multithreaded part.

aksarkar commented 1 year ago

I think the issue is that dat is copied to every thread; memory usage could be drastically reduced if pblapply iterated over columns instead.

https://github.com/stephenslab/fastTopics/blob/master/R/lfc.R#L107-L121
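A hedged sketch of the proposed change (function and variable names here are hypothetical; the actual code is in R/lfc.R, linked above). Instead of handing the full dat matrix to the parallel loop, iterate over column indices so each worker only materializes its own slice:

```r
library(pbapply)
# Hypothetical sketch: parallelize over column indices rather than over
# copies of the full data matrix `dat`.
cols <- seq_len(ncol(dat))
res <- pblapply(cols, function(j) {
  x <- dat[, j]        # only this column is extracted in the worker
  compute_lfc_col(x)   # placeholder for the per-column DE computation
}, cl = nc)            # nc = number of cores
```

With fork-based parallelism the parent's memory is nominally shared copy-on-write, but R's garbage collector tends to touch pages and force real copies, so keeping per-worker objects small still matters.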

pcarbo commented 1 year ago

Memory usage is an issue with de_analysis, as you correctly point out, but I think the more fundamental issue is with mclapply (which is used by pblapply). Previously, I was using parLapply, which avoids these issues (and has the advantage of being more platform-independent), but I ran into other unexpected issues with parLapply, so I ended up switching to mclapply. Certainly, some improvements here are warranted, and I'm open to suggestions.

pcarbo commented 1 year ago

See this gist.

aksarkar commented 1 year ago

@pcarbo After digging into #37 and the details of mclapply, I think the memory usage issue is fundamentally because mclapply forks subprocesses, which copies everything in memory to every process.

There is still room to improve the total memory usage, although it's a bit difficult to profile (probably would require something like docker stats).
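To illustrate the distinction being discussed (variable names hypothetical; both APIs are from the base R parallel package):

```r
library(parallel)

# Fork-based (mclapply): child processes inherit the parent's address
# space via copy-on-write; Unix-only, and pages can still be duplicated
# once workers (or the GC) write to them.
res1 <- mclapply(1:8, function(j) sum(big_matrix[, j]), mc.cores = 4)

# Socket-based (parLapply): workers are fresh R processes, so data must
# be shipped to each one explicitly.
cl <- makeCluster(4)
clusterExport(cl, "big_matrix")  # explicit copy into every worker
res2 <- parLapply(cl, 1:8, function(j) sum(big_matrix[, j]))
stopCluster(cl)
```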

pcarbo commented 1 year ago

> There is still room to improve the total memory usage.

I agree 100%.

hlszlaszlo commented 8 months ago

Thank you for the great package! I am also failing to run DE on a larger dataset (a ~31,000 x 35,000 matrix). I am using the 'better-multithread' branch. Even on an HPC cluster it still doesn't get past 0%. Is there any workaround to run it?

pcarbo commented 8 months ago

@hlszlaszlo Is your matrix a sparse matrix ("dgCMatrix")? Could you please share your exact de_analysis() call?

hlszlaszlo commented 8 months ago

I was running it with a regular matrix class.

Now running with counts as a "dgCMatrix" it seems much faster, thank you 🙂

Command: de_analysis(fit, counts, pseudocount = 0.1, control = list(ns = 1e4, nc = 25))
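For anyone else hitting this, a minimal sketch of the dense-to-sparse conversion (using the Matrix package; counts_dense is a placeholder for your dense count matrix):

```r
library(Matrix)
# Convert a dense count matrix to compressed sparse column format
# ("dgCMatrix"), which de_analysis handles much more efficiently.
counts_sparse <- Matrix(counts_dense, sparse = TRUE)
class(counts_sparse)  # "dgCMatrix" for a numeric dense matrix with zeros
```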

pcarbo commented 8 months ago

I'm glad to hear that helped.

When you are using nc = 25, also make sure that you have requested at least 25 cores in your job.

hlszlaszlo commented 8 months ago

Yes, and I can see it using 25 cores. The sparse matrix helped. I can even run it with fewer cores on my personal computer.

pcarbo commented 7 months ago

Bumping up the priority of this issue based on some recent conversations with researchers trying to run de_analysis on a large data set.