jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Is it possible to run coassembly mode with 960 metagenomics samples? #910

Open weishwu opened 5 days ago

weishwu commented 5 days ago

I have 960 metagenomic sequencing samples, each with more than 30 million read pairs. Is it possible to use coassembly mode with 1 TB of memory? My first attempt failed at the first step (read concatenation): the [project_name]/data/raw_fastq/ folder is empty. The README says the merged and seqmerge modes will be very slow with a large number of samples, so I didn't try them. If sequential mode is my only choice, how should I compare the samples downstream? Thanks.

jtamames commented 4 days ago

Hello, I have never tried such a big project myself, so I cannot say. You can try doing your coassembly externally and providing the result to SqueezeMeta using the extassembly option. And of course you can analyze the metagenomes in sequential mode and combine the results afterwards in SQMtools. You can also combine and dereplicate the bins from each metagenome using dRep or similar. In my experience, this can provide bins that are rarer but are present in a few samples. Best, J

weishwu commented 3 days ago

Thanks @jtamames for your answers! I'm running SqueezeMeta on each sample individually so that I can parallelize the jobs.
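
For anyone setting up the same per-sample approach, here is a minimal sketch of how the job list could be prepared. It is not taken from the SqueezeMeta documentation. It assumes the master samples file is tab-separated with three columns (sample name, file name, pair1/pair2) and that `-m sequential`, `-s`, `-f` and `-t` behave as described in the README, so please check them against `SqueezeMeta.pl -h` for your version. The function names, the `.samples` extension and the default thread count are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_samples_file(master, out_dir):
    """Write one samples file per sample and return {sample_name: path}.

    Assumes a tab-separated master file whose first column is the sample name.
    """
    rows_by_sample = defaultdict(list)
    with open(master, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row and row[0].strip():  # skip blank lines
                rows_by_sample[row[0]].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = {}
    for sample, rows in rows_by_sample.items():
        path = out_dir / f"{sample}.samples"  # hypothetical naming scheme
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(rows)
        paths[sample] = path
    return paths


def build_commands(sample_files, fastq_dir, threads=8):
    """Build one SqueezeMeta command (as an argument list) per sample.

    The flags are an assumption based on the README; nothing is executed here.
    """
    return [
        ["SqueezeMeta.pl", "-m", "sequential",
         "-s", str(path), "-f", str(fastq_dir), "-t", str(threads)]
        for _, path in sorted(sample_files.items())
    ]
```

Each returned argument list can be written into a scheduler array job or passed to `subprocess.run`. Building the commands separately from running them makes it easy to inspect the 960 jobs before submitting anything.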