jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

minimum contig length can affect memory requirements? #618

Closed Lafmas closed 1 year ago

Lafmas commented 1 year ago

Hi,

I'm having trouble analyzing 11 metagenome samples (10gb per sample) in coassembly mode with Megahit.

Despite having 512GB of memory, the process keeps getting cancelled with exit code -9.

Since it seems to be a memory issue, I tried to analyze the samples in merged mode, but the process took too long in Minimus2 and had to be stopped after more than 14 days.

My question is: can I lower the memory requirements by adjusting the minimum contig length? I'm currently using the default value of 200, but I plan to try setting it to 1000.

If this reduces the memory requirements, I'll attempt the coassembly mode analysis again.

Thanks, Lafmas

Lafmas commented 1 year ago

Hi again,

I have one more question. If adjusting the minimum contig length doesn't affect the memory requirements, I'm considering using a minimum contig length of 1000 with either the merged or the seqmerge mode.

Is 11 samples a reasonable number for merged mode? If not, would seqmerge mode be a better option?

In short: if increasing the minimum contig length to 1000 reduces memory requirements, I'll use the coassembly mode. Otherwise, I'll use merged or seqmerge mode.

Thanks, Lafmas

fpusan commented 1 year ago

Adjusting the contig length will not reduce the memory requirements for assembly. When you say "10 Gb per sample", do you mean the size of the fastq.gz files, or the actual number of input gigabases per sample? You can also try adding --assembly_options "--presets meta-large" when calling SqueezeMeta. That may help.
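For anyone who wants to script this, here is a rough sketch of how the suggested call could be assembled in Python. The project name, samples file, and read directory are placeholders I made up, not values from this thread. The -m, -p, -s, -f, and -a flags are the standard SqueezeMeta.pl options as I understand them, so please check them against the usage text of your installed version.

```python
import shlex

def build_squeezemeta_command(project, samples_file, raw_dir, assembly_options):
    """Return an argument list for a MEGAHIT coassembly run of SqueezeMeta.pl."""
    return [
        "SqueezeMeta.pl",
        "-m", "coassembly",        # assembly mode discussed in this thread
        "-p", project,             # hypothetical project name
        "-s", samples_file,        # hypothetical samples file
        "-f", raw_dir,             # hypothetical directory with the fastq.gz files
        "-a", "megahit",
        # Passed through to the assembler as a single argument.
        "--assembly_options", assembly_options,
    ]

cmd = build_squeezemeta_command(
    "my_project", "samples.txt", "raw_reads", "--presets meta-large"
)

# shlex.join quotes the option string so it reaches SqueezeMeta as one argument.
print(shlex.join(cmd))
```

Keeping the command as a list means it can be handed directly to subprocess.run without worrying about shell quoting of the embedded "--presets meta-large" string.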

Lafmas commented 1 year ago

Thanks for your quick reply!

There are 11 samples in the analysis (each sample consists of two fastq.gz files, pair1 and pair2), and each sample has an input size of 10 gigabases.
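In case it helps others tell file size apart from actual input gigabases, below is a minimal sketch that counts bases directly from a gzipped FASTQ file. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the toy file it writes is invented purely so the example runs on its own.

```python
import gzip
import os
import tempfile

def count_bases(fastq_gz_path):
    """Sum the lengths of all sequence lines in a gzipped 4-line FASTQ file."""
    total = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line record the sequence is the second line (index 1).
            if line_number % 4 == 1:
                total += len(line.strip())
    return total

def total_gigabases(paths):
    """Total input across several files, expressed in gigabases."""
    return sum(count_bases(path) for path in paths) / 1e9

# Toy demonstration with two invented reads (6 + 4 bases).
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_pair1.fastq.gz")
    with gzip.open(toy_path, "wt") as handle:
        handle.write("@read1\nACGTAC\n+\nIIIIII\n@read2\nACGT\n+\nIIII\n")
    toy_bases = count_bases(toy_path)

print(toy_bases)
```

For a real sample, pass both the pair1 and pair2 files to total_gigabases. With the numbers given above, 11 samples at 10 gigabases each come to about 110 gigabases of total input for the coassembly.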

fpusan commented 1 year ago

This is on the large side. It may or may not be possible to coassemble it with 512 GB, depending on how complex the communities are in terms of alpha and beta diversity. Did the extra assembly options help in this case?

Lafmas commented 1 year ago

The contig length was set to 800, and the assembly was performed using seqmerge mode.

This approach allowed the analysis to be completed relatively quickly without any problems!

Best, Lafmas

fpusan commented 1 year ago

Glad to hear. I am actually surprised that seqmerge worked for this. Setting a higher minimum contig length probably helped by greatly reducing the number of contigs to be merged. I will update the docs to reflect this possibility.
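To illustrate why a higher cutoff shrinks the merging problem, here is a small sketch that counts how many contigs survive different minimum lengths. The four toy contigs are invented, and treating the cutoff as "length greater than or equal to" is my assumption about how the filter behaves, not something confirmed in this thread.

```python
def read_fasta_lengths(lines):
    """Map each contig name to its total sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def count_passing(lengths, min_length):
    """Number of contigs at least min_length long (assumed inclusive cutoff)."""
    return sum(1 for length in lengths.values() if length >= min_length)

# Invented toy contigs of 150, 500, 900 and 1200 bases.
toy_fasta = [
    ">c1", "A" * 150,
    ">c2", "C" * 500,
    ">c3", "G" * 900,
    ">c4", "T" * 1200,
]

lengths = read_fasta_lengths(toy_fasta)
for cutoff in (200, 800, 1000):
    print(cutoff, count_passing(lengths, cutoff))
```

On a real assembly you would feed the lines of each sample's contig FASTA file to read_fasta_lengths and compare the counts at the default of 200 with the counts at 800 or 1000 before deciding on a merging mode.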

Lafmas commented 1 year ago

I will upload my project syslog file here for you soon.

Do you need help with anything else?