Open lskatz opened 3 years ago
To cap this off, my results were indeed not promising for my test set. I would expect about 1748 loci per genome because these are the core loci. However, the results came to only about 1k per genome. I have convinced myself to go with Chewie instead of mlst for cgMLST analysis.
I ran these commands...
\ls *_spades.fasta | xargs -P 12 -n 1 bash -c 'mlst --scheme lmonocgmlst --threads 2 --novel $0.novel.fasta.tmp $0 > $0.tsv.tmp; touch $0.novel.fasta.tmp;'
for tsv in *.tsv.tmp; do name=$(basename "$tsv" .fasta.tsv.tmp); touch "$name.fasta.novel.fasta.tmp"; novel=$(( $(wc -l < "$name.fasta.novel.fasta.tmp") / 2 )); echo -ne "$name\t$novel\t"; head -n1 "$tsv" | perl -lane 'for(@F){ $num++ if /\(\d+\)/ } print $num+0;'; done
...to produce a table of novel and exact allele matches per assembly in my benchmarking dataset.
| assembly | numNovel | numExact |
| --- | --- | --- |
| Listeria_shovill_SRR10323923_spades | 4 | 987 |
| Listeria_shovill_SRR10483479_spades | 994 | 139 |
| Listeria_shovill_SRR10505985_spades | 1 | 396 |
| Listeria_shovill_SRR10696096_spades | 101 | 265 |
| Listeria_shovill_SRR13296922_spades | 328 | 75 |
| Listeria_shovill_SRR14044344_spades | 5 | 355 |
| Listeria_shovill_SRR14404488_spades | 4 | 439 |
| Listeria_shovill_SRR14669035_spades | 4 | 397 |
| Listeria_shovill_SRR15356214_spades | 0 | 271 |
| Listeria_shovill_SRR9973979_spades | 4 | 902 |
Not shown: Chewie results gave me on average 1740 loci per assembly.
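For anyone repeating this, the one-liner above only counts exact calls from the TSV. Here is a minimal sketch that tallies every call class from a line of mlst output. It assumes mlst's usual notation, as far as I understand it: `n` for an exact allele, `~n` for a novel allele, `n?` for a partial match, and `-` for a missing locus, with the first three columns being file, scheme, and ST. The function name `tally_calls` is made up for illustration.

```shell
# Tally allele-call classes per line of mlst TSV output (read from stdin).
# Assumed layout: FILE, SCHEME, ST, then one "locus(call)" field per locus.
# Prints: file, exact, novel, partial, missing, other (tab-separated).
tally_calls() {
  awk -F'\t' '{
    exact = 0; novel = 0; partial = 0; missing = 0; other = 0
    for (i = 4; i <= NF; i++) {
      if      ($i ~ /\([0-9]+\)$/)    exact++    # e.g. locus(12)
      else if ($i ~ /\(~[0-9]+\)$/)   novel++    # e.g. locus(~12)
      else if ($i ~ /\([0-9]+[?]\)$/) partial++  # e.g. locus(12?)
      else if ($i ~ /\(-\)$/)         missing++  # e.g. locus(-)
      else                            other++    # e.g. multiple alleles
    }
    printf "%s\t%d\t%d\t%d\t%d\t%d\n", $1, exact, novel, partial, missing, other
  }'
}
```

Something like `head -n1 "$tsv" | tally_calls` would then give one row per assembly, which makes it easier to see whether the missing ~700 loci are absent calls or partial matches.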
I am trying to make a guide for myself to create a cgMLST or even a wgMLST scheme. I hope this helps others. I suggest adding this in some way to the documentation, although I might still follow @tseemann's suggestion and just use Chewie. Anyway, if you're like me and just have to try it out first to convince yourself...
Step 1: download the whole scheme from https://chewbbaca.online into a folder Listeria_monocytogenes.chewbbaca
Step 2: put the scheme folder under git version control so that you can check the original files back out in case you make a mistake.
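A minimal sketch of Step 2, assuming the scheme sits in a plain folder that is not yet a git repository. The function name `snapshot_scheme` and the placeholder commit identity are made up for illustration; drop the `-c` options if your git identity is already configured.

```shell
# Commit the untouched scheme so any later edit can be reverted.
# Usage: snapshot_scheme Listeria_monocytogenes.chewbbaca
snapshot_scheme() {
  dir=$1
  git -C "$dir" init -q
  git -C "$dir" add -A
  # Placeholder identity so the commit works without a global git config.
  git -C "$dir" \
    -c user.name="scheme-snapshot" \
    -c user.email="snapshot@example.invalid" \
    commit -q -m "Pristine scheme as downloaded"
}
```

After a mistake, `git -C Listeria_monocytogenes.chewbbaca checkout -- .` should restore the tracked files to the committed state.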
Step 2b, undocumented: make sure there are no deflines with `*` indicating custom alleles not in the universal set of alleles.
Step 3a, not documented here: install mlst
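Going back to the Step 2b check, here is a minimal sketch of one way to look for starred deflines. It assumes the scheme folder holds one `.fasta` file per locus, which may not match every download; the function name `find_starred_deflines` is made up for illustration.

```shell
# Print the scheme files that contain at least one defline with "*".
# Exit status is 0 if any such file is found, non-zero otherwise.
# Usage: find_starred_deflines Listeria_monocytogenes.chewbbaca
find_starred_deflines() {
  dir=$1
  # '^>.*\*' matches a FASTA header line containing a literal asterisk.
  grep -l '^>.*\*' "$dir"/*.fasta 2>/dev/null
}
```

An empty result with a non-zero exit status means no starred deflines were found, so the scheme should be safe to carry on with.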
Step 3b: mlst db creation