chrstraub closed this issue 5 years ago.
The --assembler option controls which plugin script is run. The plugins are here:
% ls nullarbor/plugins/assembler/*.sh
megahit.sh shovill.sh skesa_fast.sh skesa.sh spades.sh
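A minimal sketch of how an option like --assembler can map onto these files: the chosen name selects `<name>.sh` in the plugin directory, and that script runs with the read files, CPU count, extra options and output directory already set. The function `run_assembler_plugin`, the dummy plugin, and the idea that the variables arrive through the environment are all assumptions inferred from the scripts shown below, not Nullarbor's actual code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dispatcher, NOT Nullarbor's implementation.
# Picks "<name>.sh" from a plugin directory and runs it with bash; the
# plugin reads $read1, $read2, $cpus, $opts and $outdir from the environment.
run_assembler_plugin() {
  local plugin_dir=$1 name=$2
  local script="$plugin_dir/$name.sh"
  if [[ ! -f "$script" ]]; then
    echo "Unknown assembler '$name'; available:" >&2
    ls "$plugin_dir" | sed 's/\.sh$//' >&2
    return 1
  fi
  bash "$script"
}

# Demo with a throwaway plugin directory holding one dummy plugin.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/plugins" "$demo_dir/out"
cat > "$demo_dir/plugins/dummy.sh" <<'EOF'
echo ">contig1" > "$outdir/contigs.fa"
echo "dummy: $read1 $read2 cpus=$cpus"
EOF
export read1=R1.fq.gz read2=R2.fq.gz cpus=4 opts="" outdir="$demo_dir/out"
run_assembler_plugin "$demo_dir/plugins" dummy
run_assembler_plugin "$demo_dir/plugins" nosuch || echo "rejected"
rm -rf "$demo_dir"
```

Keeping each assembler behind the same small contract (reads in, contigs.fa out) is what lets the rest of the pipeline ignore which assembler was used.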
Let's look at them:
% cat skesa.sh
skesa --fastq "$read1,$read2" \
--cores "$cpus" \
--vector_percent 1.0 \
$opts \
--contigs_out "$outdir/contigs.fa"
% cat spades.sh
WORKDIR=$(mktemp -d)
OUTDIR="$WORKDIR/spades"
spades.py -m 32 -o "$OUTDIR" \
--pe1-1 "$read1" --pe1-2 "$read2" \
-t "$cpus" --careful $opts
cp -v -f "$OUTDIR/scaffolds.fasta" "$outdir/contigs.fa"
cp -v -f "$OUTDIR/assembly_graph_with_scaffolds.gfa" "$outdir/contigs.gfa"
cp -v -f "$OUTDIR/spades.log" "$outdir/contigs.log"
rm -frv "$WORKDIR"
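The SPAdes plugin assembles in a scratch directory, copies out only the files the pipeline needs, and then deletes the scratch directory. A common variation of that pattern registers the cleanup with `trap ... EXIT` so the directory is removed however the step exits. This is a sketch only: `fake_assembler` is a hypothetical stand-in for spades.py, and `assemble_in_tmp` is not part of Nullarbor.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for spades.py: writes the files the real
# plugin copies out of its work directory.
fake_assembler() {
  local out=$1
  mkdir -p "$out"
  printf '>NODE_1\nACGT\n' > "$out/scaffolds.fasta"
  echo "assembly finished" > "$out/spades.log"
}

# The ( ... ) body runs in a subshell, so the EXIT trap fires when this
# function returns (or fails), removing the scratch directory.
assemble_in_tmp() (
  outdir=$1
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT
  fake_assembler "$workdir/spades"
  # Same renaming as the plugin: scaffolds.fasta becomes contigs.fa.
  cp -f "$workdir/spades/scaffolds.fasta" "$outdir/contigs.fa"
  cp -f "$workdir/spades/spades.log" "$outdir/contigs.log"
)

out=$(mktemp -d)
assemble_in_tmp "$out"
ls "$out"   # lists contigs.fa and contigs.log
rm -rf "$out"
```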
They mostly use default parameters, BUT the Prokka step sets the minimum contig length, and that is set by --minctglen in nullarbor.pl:
%/contigs.gff: %/contigs.fa
gffout="$(@)" gbkout="$(@D)/contigs.gbk" contigs="$(<)" locustag="$(@D)" gcode="$(GCODE)" minlen="$(MIN_CTG_LEN)" $(ANNOTATOR)
You can control this via make MIN_CTG_LEN=100
when you run the Makefile. The default is 500.
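This works because a variable given on the make command line overrides an ordinary assignment inside the Makefile. A minimal sketch with a toy Makefile (hypothetical; it only assumes the generated Makefile gives MIN_CTG_LEN a plain default of 500, not that it looks like this):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy Makefile (hypothetical) that defaults MIN_CTG_LEN to 500.
tmp=$(mktemp -d)
printf 'MIN_CTG_LEN := 500\nshow:\n\t@echo $(MIN_CTG_LEN)\n' > "$tmp/Makefile"

make -s -f "$tmp/Makefile" show                  # prints 500, the Makefile default
make -s -f "$tmp/Makefile" show MIN_CTG_LEN=100  # prints 100, the command line wins
rm -rf "$tmp"
```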
This is also used in the assembly stats:
denovo.tab : $(CONTIGS)
fa --minsize $(MIN_CTG_LEN) -e -t $^ > $@
The reason for this is that you can't compare assemblies that have different minimum contig sizes. People using Nullarbor often re-use older folders over time, so this puts everyone on an even footing. Given that read lengths / pair spans are ~500 bp, it makes sense to exclude anything smaller.
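Because of this cut-off, a raw contigs.fa can contain more contigs than the report counts. To check that yourself, you can recompute the numbers with the same threshold. This is an awk sketch of the kind of filtering `fa --minsize` is used for here (count, total length and N50 over contigs of at least the minimum length); it is not the `fa` tool and does not reproduce its output columns. `contig_stats` and `seq_of` are hypothetical helpers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "count total_bp N50" for contigs of length >= min_len.
contig_stats() {
  local fasta=$1 min_len=$2
  # 1) one length per contig, 2) apply the cut-off, 3) longest first
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$fasta" |
    awk -v min="$min_len" '$1 >= min' |
    sort -rn |
    awk '{ l[NR] = $1; total += $1 }
         END {
           if (NR == 0) { print 0, 0, 0; exit }
           # N50: length of the contig at which the running sum reaches half the total
           for (i = 1; i <= NR; i++) { run += l[i]; if (run >= total / 2) { n50 = l[i]; break } }
           print NR, total, n50
         }'
}

# Hypothetical helper: a sequence of N "A" characters.
seq_of() { printf '%*s' "$1" '' | tr ' ' A; }

fa=$(mktemp)
{
  printf '>c1\n%s\n' "$(seq_of 1000)"
  printf '>c2\n%s\n' "$(seq_of 700)"
  printf '>c3\n%s\n' "$(seq_of 600)"
  printf '>c4\n%s\n' "$(seq_of 300)"
  printf '>c5\n%s\n' "$(seq_of 50)"
} > "$fa"
contig_stats "$fa" 500   # 3 2300 700
contig_stats "$fa" 0     # 5 2650 700
rm -f "$fa"
```

Run on a real contigs.fa with 500 and then 0 as the threshold, this shows how many short contigs the report leaves out.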
YES, SPAdes will be better than SKESA. But SKESA is MUCH faster, is conservative (mis-assemblies are rare), and is often good enough for public health typing.
TLDR;
contigs.fa is SPAdes' scaffolds.fasta, renamed to be consistent across assemblers.
MIN_CTG_LEN=500 by default; use make MIN_CTG_LEN=100 to change it at run time, or set it with --minctglen LEN_BP ("Minimum contig length for Prokka and Roary") when running nullarbor.pl.
Thanks, Torsten, for clarifying this!
Hi again,
I've got a question about the discrepancy in the number of contigs between contigs.fa and contigs.gff.
I compared the SKESA and SPAdes assemblers, and according to the final report, SPAdes outperforms SKESA in terms of N50 and number of contigs.
[Screenshot: SKESA report statistics]
[Screenshot: SPAdes report statistics]
But when I look at the raw assembly files, i.e. contigs.fa, they contain many more contigs, e.g. for SPAdes:
So my questions are: