jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

minimum contig length and assembly options with SPAdes #104

Closed nmquijada closed 4 years ago

nmquijada commented 4 years ago

Dear all,

First of all, congratulations on the development of SqueezeMeta. It is definitely a great tool!

I ran the pipeline as SqueezeMeta.pl -m coassembly -p SQM -s samples.txt -f reads -a spades -t 60 -b 100 and it ran nicely to the end. However, I found a couple of small bugs.

  1. I did not set a minimum contig length, as I was fine with the default of 200 bp. However, when I checked the metagenome fasta file, contigs shorter than 200 bp had been kept (the minimum length was 100 bp) and went through the downstream analysis. Could this be related to line 112 of 01.run_assembly.pl, if($mincontiglen>200) { , which only acts when the variable is higher than 200 and not when it is equal to 200? In this thread, https://github.com/jtamames/SqueezeMeta/issues/78, you said that 200 bp is the default with Megahit, so it is probably an issue that only shows up when using SPAdes (I worked around it by passing -c 201 to the script).

  2. I wanted to increase the maximum RAM during assembly, so I ran the analysis like this: SqueezeMeta.pl -m coassembly -p SQM -s samples.txt -f reads -a spades -t 60 -b 100 -assembly_options "-m 800"

However, since the SPAdes options in 01.run_assembly.pl already include "-m 400", the SQM/data/spades/spades.log looks like this:

Command line: /home/quijada/software/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/SPAdes/spades.py -m 800 --meta --pe1-1 /home/quijada/WD/SQM/data/raw_fastq/par1.fastq.gz --pe1-2 /home/quijada/WD/SQM/data/raw_fastq/par2.fastq.gz -m 400 -k 21,33,55,77,99,127 -t 60 -o /home/quijada/WD/SQM/data/spades

It seems that the second "-m" option (-m 400) prevails, because the SPAdes section of the syslog shows: Memory limit (in Gb): 400
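The behaviour in point 2 matches how many command-line parsers treat a repeated option: the last occurrence wins. The Python sketch below illustrates this with argparse as a stand-in. This is an assumption, since the thread does not show how spades.py parses its arguments. `build_args` is a hypothetical helper, not SqueezeMeta code, showing one way to add the default "-m 400" only when the user has not supplied "-m".

```python
import argparse
import shlex


def parse_memory_limit(argv):
    """Return the value of -m as a stand-in parser would see it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", type=int, default=None)
    # Options not declared above (--meta, -t, ...) are collected, not rejected.
    args, _unknown = parser.parse_known_args(argv)
    return args.m


def build_args(user_options, default_memory="400"):
    """Hypothetical helper: append the default -m only if the user gave none."""
    user = shlex.split(user_options)
    defaults = [] if "-m" in user else ["-m", default_memory]
    return user + ["--meta"] + defaults + ["-t", "60"]


# Argument order as in the reported spades.log: user -m first, default -m later.
reported = ["-m", "800", "--meta", "-m", "400", "-t", "60"]
print(parse_memory_limit(reported))               # 400: the later -m wins
print(parse_memory_limit(build_args("-m 800")))   # 800: the default is not added
print(parse_memory_limit(build_args("")))         # 400: the default applies
```

With the arguments ordered as in the log, the stand-in parser reports 400, which agrees with the "Memory limit (in Gb): 400" line in the syslog.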

And just one last question. I saw that some SQM scripts, etc. are updated regularly. Are these changes also included in the conda package? Or is it recommended to clone the GitHub content regularly and replace those files in the conda environment?

Thank you very much, Narciso

fpusan commented 4 years ago

We update the conda repository every time we release a new version of SqueezeMeta. Before each release, we test a release candidate to make sure there are no bugs (success rates may vary...).

Between releases, there are of course many changes to the code, but they are not subjected to that kind of testing. I personally would not recommend downloading that content blindly and expecting it to work, unless you need a particular feature that was included in a recent commit (e.g. the fix for the bug in --assembly_options that you just pointed out).

jtamames commented 4 years ago

Hello Narciso

Thank you very much for your message. We were not aware of these problems, so it is very helpful. We have just fixed both issues: calls to assemblers now handle the user's options correctly, and I removed the check on contig length, so prinseq is now run every time. As you say, not all assemblers use the same default contig length.
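The contig-length problem is a boundary condition: a guard of the form `$mincontiglen>200` skips filtering when the value is exactly 200, so whatever the assembler emits passes through. The Python sketch below illustrates this. It is not the pipeline's Perl code or its prinseq call; the function names and the choice of `>=` as the keep rule are assumptions made for the example.

```python
def guarded_filter(contigs, min_len=200):
    """Mimics the reported guard: filtering only happens when min_len > 200."""
    if min_len > 200:
        return [c for c in contigs if len(c) >= min_len]
    return contigs  # min_len == 200 falls through unfiltered


def always_filter(contigs, min_len=200):
    """Applies the length filter every time, whatever the assembler default is."""
    return [c for c in contigs if len(c) >= min_len]


# Toy contigs of 100, 200 and 500 bp.
contigs = ["A" * 100, "A" * 200, "A" * 500]

print([len(c) for c in guarded_filter(contigs)])       # [100, 200, 500]
print([len(c) for c in guarded_filter(contigs, 201)])  # [500]
print([len(c) for c in always_filter(contigs)])        # [200, 500]
```

The second call corresponds to the -c 201 workaround from the report: it does trigger the filter, but it also drops contigs of exactly 200 bp.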

Best, Javier

nmquijada commented 4 years ago

Dear all, Thank you very much for the quick answer, the information and the solutions! That was excellent. All the best, Narciso