mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

fsm-lite stops due to lack of RAM #259

Open hfaoro opened 7 months ago

hfaoro commented 7 months ago

Hello, I am trying to perform a k-mer analysis with pyseer on 6,240 bacterial genomes with a binary phenotype. I'm having trouble running fsm-lite: k-mer counting uses a lot of RAM and the process does not finish.

I submit jobs to a Slurm-based cluster, so I have to declare how much memory and how many CPUs each job will use. On my last attempt I requested 500 GB of RAM and it still wasn't enough. Any suggestions for overcoming this? Would increasing the number of CPUs help?

mgalardini commented 7 months ago

That is a large number of strains indeed! I suggest trying unitig-counter or bifrost, which should be better equipped to handle large datasets.
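One reason unitig-based approaches produce fewer features than raw k-mers is that runs of overlapping k-mers with no branching between them collapse into a single unitig. The toy sketch below illustrates only that compaction idea, under stated assumptions: it works on the forward strand only (no reverse-complement merging) and is not how unitig-counter or bifrost are implemented internally. The names `kmers`, `successors`, `predecessors` and `compact_unitigs` are hypothetical.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of one sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def successors(km: str, kset: Set[str]) -> List[str]:
    # k-mers present in the set that overlap km by k-1 bases on the right
    return [km[1:] + b for b in "ACGT" if km[1:] + b in kset]


def predecessors(km: str, kset: Set[str]) -> List[str]:
    # k-mers present in the set that overlap km by k-1 bases on the left
    return [b + km[:-1] for b in "ACGT" if b + km[:-1] in kset]


def compact_unitigs(seqs: Iterable[str], k: int) -> List[str]:
    """Collapse non-branching paths of the k-mer de Bruijn graph into unitigs.

    Toy sketch (hypothetical helper): forward strand only, no
    reverse-complement merging, everything held in a Python set.
    """
    kset: Set[str] = set()
    for s in seqs:
        kset |= kmers(s, k)
    visited: Set[str] = set()
    unitigs: List[str] = []
    for seed in sorted(kset):
        if seed in visited:
            continue
        visited.add(seed)
        path = [seed]
        # Extend to the right while the path stays non-branching.
        cur = seed
        while True:
            nxt = successors(cur, kset)
            if len(nxt) != 1 or nxt[0] in visited or len(predecessors(nxt[0], kset)) != 1:
                break
            cur = nxt[0]
            visited.add(cur)
            path.append(cur)
        # Extend to the left under the same rule.
        cur = seed
        while True:
            prv = predecessors(cur, kset)
            if len(prv) != 1 or prv[0] in visited or len(successors(prv[0], kset)) != 1:
                break
            cur = prv[0]
            visited.add(cur)
            path.insert(0, cur)
        # Consecutive k-mers overlap by k-1 bases, so each step adds one base.
        unitigs.append(path[0] + "".join(km[-1] for km in path[1:]))
    return unitigs


genomes = ["ACGTTGCA", "ACGATGCA"]  # two toy "genomes" differing at one base
print(len(set().union(*(kmers(g, 3) for g in genomes))), compact_unitigs(genomes, 3))
```

On this toy input the 9 distinct 3-mers collapse to 4 unitigs: the shared prefix `ACG`, one unitig per allele around the differing base (`CGATG`, `CGTTG`), and the shared suffix `TGCA`. Real tools also merge reverse complements and use far more memory-efficient data structures than a Python set.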

hfaoro commented 7 months ago

Yes, I managed to run the analysis using unitig-caller, but the result of mapping the unitigs to the reference genome with phandango_mapper was strange: only 80 unitigs were mapped. In another analysis on a smaller dataset the k-mer results were very positive, so I was trying to use the same strategy for this larger set.
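When only a small fraction of unitigs map, one possible explanation (not confirmed in this thread) is that many of them come from sequence absent from the chosen reference, such as accessory genes, which no mapper can place. A quick sanity check, independent of phandango_mapper, is to count how many unitigs occur verbatim on either strand of the reference contigs. This sketch assumes exact matching only (a real mapper tolerates mismatches), and `parse_fasta`, `revcomp` and `exact_hits` are hypothetical helper names, not part of pyseer.

```python
from typing import Iterable, List

# Complement table; N maps to itself so ambiguous bases survive.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(lines: Iterable[str]) -> List[str]:
    """Return one upper-case sequence per FASTA record (contig)."""
    contigs: List[str] = []
    current: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                contigs.append("".join(current).upper())
            current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current).upper())
    return contigs


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def exact_hits(unitigs: Iterable[str], contigs: List[str]) -> List[str]:
    """Unitigs found verbatim on either strand of any single contig."""
    found: List[str] = []
    for u in unitigs:
        u = u.upper()
        rc = revcomp(u)
        # Search each contig separately so matches never span a contig boundary.
        if any(u in c or rc in c for c in contigs):
            found.append(u)
    return found


ref = parse_fasta([">c1", "ACGTTG", "CA", ">c2", "GGGCCC"])
print(exact_hits(["GTTG", "GCAACG", "TTTT", "CAGGG"], ref))
```

With a real reference, the contigs would come from something like `parse_fasta(open("reference.fasta"))` (placeholder file name). If most significant unitigs are also missing from the reference under this exact check, a reference containing the relevant genes, or several references, may be needed before mapping.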