jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

sqm_reads.pl appears to be stuck while running #866

Closed mycomira closed 1 week ago

mycomira commented 1 month ago

Hello SqueezeMeta folks,

I'm attempting to run sqm_reads.pl on a dataset containing 20 samples generated using Illumina 2x100 read sequencing. The file sizes are rather large (~2GB to ~15GB). When running the script for an extended period of time (~7 days) I'm unable to produce results; the script doesn't even finish running on the first sample before timing out, even when requesting 180GB memory on an HPC. When running sqm_reads.pl on a smaller dataset (27MB) I am able to run the script. I'm currently attempting a 14-day run.

Do you have recommendations for optimizing memory for a run such as this? Or is it possible that I'm somehow getting "stuck" while running this script? I have not made any modifications to the original sqm_reads.pl file.

Thanks for your help!

jtamames commented 1 month ago

Hello, sqm_reads is a rather slow task. It is not much affected by memory usage; simply, both the queries and the databases are VERY big.

Possible solutions:

1) Split your data (queries) and run the pieces separately, ideally on different cores simultaneously. Do this externally, using the command:

diamond blastx -q query -p numthreads -d nr_db -e 1e-03 --quiet -f tab -o outfile

Replace "query" with your files, "numthreads" with the number of threads, and "nr_db" with the location of your nr database (you can find this in the conf file). Then cat all the resulting files, move the result to the project directory, and run sqm_reads.pl again with the --nodiamond option.

2) Reduce the database. You can use the reduce_db.pl script (in utils) for this. For instance, select just prokaryotes, or exclude non-interesting taxa, etc. You can do any combination of including and excluding taxa. Then run sqm_reads using the reduced database.
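The split, search, and concatenate steps of option 1 could be scripted roughly as follows. This is only a sketch under several assumptions: the input is an uncompressed FASTQ file with exactly four lines per record, the chunk file names and the `reads_per_chunk` value are hypothetical, and the helper names (`split_fastq`, `diamond_command`, `merge_outputs`) are made up for illustration and are not part of SqueezeMeta. The DIAMOND arguments are the ones quoted above. The name and location that sqm_reads.pl expects for the merged file inside the project directory are not stated in this thread, so the sketch leaves that step to you.

```python
from itertools import islice
from pathlib import Path


def split_fastq(path, out_dir, reads_per_chunk):
    """Split an uncompressed FASTQ file into chunk files.

    Assumes exactly 4 lines per record. One chunk is held in memory
    at a time, so keep reads_per_chunk moderate.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path) as handle:
        while True:
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            # Hypothetical naming scheme: chunk_0000.fastq, chunk_0001.fastq, ...
            chunk = out_dir / f"chunk_{len(chunks):04d}.fastq"
            chunk.write_text("".join(lines))
            chunks.append(chunk)
    return chunks


def diamond_command(query, nr_db, threads, outfile):
    """Build the DIAMOND call quoted in this thread for one chunk."""
    return [
        "diamond", "blastx",
        "-q", str(query),
        "-p", str(threads),
        "-d", str(nr_db),
        "-e", "1e-03",
        "--quiet",
        "-f", "tab",
        "-o", str(outfile),
    ]


def merge_outputs(parts, merged):
    """Concatenate the per-chunk result tables (the 'cat' step)."""
    with open(merged, "w") as out:
        for part in parts:
            with open(part) as handle:
                for line in handle:
                    out.write(line)
```

Each list returned by `diamond_command` can be passed to `subprocess.run(cmd, check=True)`, or submitted as a separate HPC job so the chunks run at the same time. Once all chunks have finished, `merge_outputs` joins the tables in chunk order.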

Hope it helps. Best, J

mycomira commented 1 month ago

Thank you! I'll give this a try and report back

fpusan commented 1 week ago

Closing due to lack of activity, feel free to reopen!