Metacells analysis gets killed in divide_and_conquer_pipeline

tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis

MIT License


Issue #31 (Closed; aschlicker closed 1 year ago)

aschlicker commented 1 year ago

Hi,

I'm trying to run metacells on a dataset with ~500k cells. I've set up an Ubuntu machine in AWS (Ubuntu 20.04.5 LTS, 96 cores, 756 GB RAM, Python 3.9, metacells 0.8). After a couple of hours, I get an unceremonious "Killed" message on the command line while running divide_and_conquer_pipeline. Nothing else is running on this machine.
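A bare "Killed" from the shell means the process received SIGKILL; on Linux, this is typically what the kernel's out-of-memory (OOM) killer sends, even on a large machine. That diagnosis is not confirmed anywhere in this thread. As a minimal sketch, assuming the analysis is run as a separate Python script, a small stdlib wrapper can report whether the child was killed this way and how much memory it peaked at (POSIX only; the helper name `run_and_diagnose` is made up for illustration):

```python
import resource
import signal
import subprocess
import sys


def run_and_diagnose(cmd: list) -> dict:
    """Run `cmd` as a child process and report how it ended (POSIX only)."""
    completed = subprocess.run(cmd)
    # On POSIX, a negative return code -N means the child was terminated by signal N.
    killed_by_sigkill = completed.returncode == -signal.SIGKILL
    # Peak resident set size over all waited-for children so far.
    # ru_maxrss is reported in KiB on Linux (bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "returncode": completed.returncode,
        "killed_by_sigkill": killed_by_sigkill,
        "peak_rss": peak_rss,
    }
```

For example, `run_and_diagnose([sys.executable, "run_metacells.py"])`, where `run_metacells.py` is a hypothetical script that calls divide_and_conquer_pipeline. If it reports a SIGKILL, the kernel log (e.g. `dmesg`) usually shows whether the OOM killer was responsible.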

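I tried limiting the number of processes and cores used:

```python
import metacells as mc

# Cap how many piles are processed in parallel, and how many processors are used.
mc.pl.set_max_parallel_piles(5)
mc.ut.set_processors_count(10)
```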
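The two calls above cap how much work runs at once. As a toy illustration only (this is not how metacells schedules piles internally, and both helper names are hypothetical), the general pattern of splitting cells into piles and processing at most `max_parallel_piles` of them concurrently looks like this. In such a scheme each in-flight pile holds its own working data, so peak memory tends to grow with the number of piles in flight, which is why capping it is a natural first attempt:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_piles(cell_ids, pile_size):
    """Partition cell ids into consecutive piles of at most `pile_size` cells."""
    return [cell_ids[i:i + pile_size] for i in range(0, len(cell_ids), pile_size)]


def process_piles(piles, max_parallel_piles):
    """Process piles with at most `max_parallel_piles` running concurrently.

    Returns the per-pile results (in pile order) and the peak number of piles in flight.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def work(pile):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            # Stand-in for the real per-pile computation: just count the cells.
            return len(pile)
        finally:
            with lock:
                in_flight -= 1

    # The pool size is what bounds concurrency (and hence concurrent memory use).
    with ThreadPoolExecutor(max_workers=max_parallel_piles) as pool:
        results = list(pool.map(work, piles))
    return results, peak
```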

Given the size of the machine, I don't think I'm running into resource limitations. To get more detailed logging, I set the two environment variables METACELLS_COLLECT_TIMING and METACELLS_LOG_ALL_STEPS. I've attached the full log file. Any idea how to solve this?

Thanks,
andi

Attached: error_detail.txt.gz

orenbenkiki commented 1 year ago

Thanks for the detailed report!

The failure is in the rare-gene detection step (pre-processing). This step is known to be problematic for data that isn't "straight-up normal gene UMIs"; seeing that you have ~60K "genes", I'm guessing these aren't really well-known genes as such.
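To see why ~60K features could matter so much, here is a back-of-the-envelope estimate. It assumes (this is an assumption, not something stated in the thread) that rare-gene detection materializes a dense gene-by-gene matrix, such as gene-gene correlations, over its candidate genes. The size of such a matrix grows quadratically with the number of genes:

```python
def dense_square_matrix_bytes(n_genes: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for one dense n_genes x n_genes matrix."""
    return n_genes * n_genes * bytes_per_value


def as_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# ~60K features, as in the report; float32 (4 bytes) vs float64 (8 bytes) values.
for itemsize in (4, 8):
    size = dense_square_matrix_bytes(60_000, itemsize)
    print(f"{itemsize}-byte values: {as_gib(size):.1f} GiB per matrix")
```

This works out to roughly 13.4 GiB (float32) or 26.8 GiB (float64) for a single matrix. One such matrix fits in the reported RAM; the point is the quadratic scaling, since halving the gene count cuts the size by a factor of four, while any intermediate copies multiply it.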

Not that this should matter, of course. The next release protects against this problem with an additional parameter, rare_max_genes, but since it isn't published yet, it doesn't help you much (unless you want to install the head version manually from GitHub).
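A parameter like rare_max_genes presumably bounds the cost of any step whose work grows with the number of candidate genes, by capping how many candidates enter it. The following is a hypothetical sketch of that idea only: the scoring, and the helper names `cap_candidate_genes` and `pairwise_entries`, are made up for illustration and are not the metacells implementation:

```python
import heapq


def cap_candidate_genes(scores, max_genes):
    """Keep at most `max_genes` gene indices, preferring the highest scores.

    `scores` maps gene index -> a (hypothetical) candidate score. The kept indices
    are returned sorted ascending so downstream matrices keep a stable gene order.
    """
    if max_genes <= 0:
        return []
    top = heapq.nlargest(max_genes, scores.items(), key=lambda item: item[1])
    return sorted(gene for gene, _ in top)


def pairwise_entries(n_genes):
    """Number of entries in a dense n x n gene-gene matrix."""
    return n_genes * n_genes
```

With an illustrative cap of 500 (an arbitrary value, not the library's default), a dense pairwise matrix shrinks from 3.6 billion entries for 60,000 candidates to 250,000 entries.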

For now, the simplest workaround is to skip this step completely, that is, to invoke compute_divide_and_conquer_metacells instead of divide_and_conquer_pipeline. You can also lift the restriction on the number of parallel piles: the run never got that far, so it probably won't be an issue.

Let me know if that helped.

orenbenkiki commented 1 year ago

FWIW, I have updated the rare-gene detection implementation in the about-to-be-released version 0.9 to be more efficient and consume less memory, which might help. Note that version 0.9 differs from version 0.8 (see the project README). It is in the final testing phase; for now, you can only install it from the head version on GitHub.
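Because 0.9 differs from 0.8, a script that may run against either version can check which one is installed before deciding what to call. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); it assumes the installed distribution is named `metacells` and that its version string starts with numeric components such as `0.9.0` (both helper names are hypothetical):

```python
import re
from importlib import metadata


def parse_version_prefix(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '0.9.0.dev1' -> (0, 9, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric component such as 'dev1'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop after a number with a suffix, such as '9rc1'
    return tuple(parts)


def installed_version_at_least(minimum: tuple, dist_name: str = "metacells") -> bool:
    """True if `dist_name` is installed with a version >= `minimum`; False if absent."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return parse_version_prefix(installed) >= minimum
```

For example, `installed_version_at_least((0, 9))`. The comparison is plain tuple ordering on the numeric prefix, so pass the shortest minimum you mean: `(0, 9)` rather than `(0, 9, 0)`, because `(0, 9)` sorts before `(0, 9, 0)`.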

aschlicker commented 1 year ago

Thanks for the update, highly appreciated. I already tried version 0.9 back in October. I just updated and will run it on a large 1M-cell dataset in a couple of weeks. It worked very well on my small dataset.

orenbenkiki commented 1 year ago

Version 0.9 is now published, so I'm closing this as done.