katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

mlst results file only contains header when running certain samples #31

Closed ppcherng closed 9 years ago

ppcherng commented 9 years ago

Here are the commands I ran:

    getmlst.py --species "Escherichia coli#1"
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR028/ERR028698/ERR028698_1.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR028/ERR028698/ERR028698_2.fastq.gz
    srst2 --input_pe ERR028698*.fastq.gz --output shigella1 --log --save_scores --mlst_db Escherichia_coli#1.fasta --mlst_definitions ecoli.txt

This is the output I get:

    Sample ST adk fumC gyrB icd mdh purA recA mismatches uncertainty depth maxMAF

Note that there is only a header; not even the sample name is printed.

Here is the log, which does not indicate any errors:

    01/24/2015 19:22:24 program started
    01/24/2015 19:22:24 command line: /usr/local/bin/srst2 --input_pe ERR028698_1.fastq.gz ERR028698_2.fastq.gz --output shigella1 --log --save_scores --mlst_db Escherichia_coli#1.fasta --mlst_definitions ecoli.txt
    01/24/2015 19:22:24 Total paired readsets found:1
    01/24/2015 19:22:24 Building bowtie2 index for Escherichia_coli#1.fasta...
    01/24/2015 19:22:24 Running: bowtie2-build Escherichia_coli#1.fasta Escherichia_coli#1.fasta
    01/24/2015 19:22:29 Processing database Escherichia_coli#1.fasta
    01/24/2015 19:22:29 Running: samtools faidx Escherichia_coli#1.fasta
    01/24/2015 19:22:30 Processing sample ERR028698
    01/24/2015 19:22:30 Output prefix set to: shigella1__ERR028698.Escherichia_coli#1
    01/24/2015 19:22:30 Aligning reads to index Escherichia_coli#1.fasta using bowtie2...
    01/24/2015 19:22:30 Running: bowtie2 -1 ERR028698_1.fastq.gz -2 ERR028698_2.fastq.gz -S shigella1__ERR028698.Escherichia_coli#1.sam -q --very-sensitive-local --no-unal -a -x Escherichia_coli#1.fasta
    01/24/2015 19:22:32 Processing Bowtie2 output with SAMtools...
    01/24/2015 19:22:32 Generate and sort BAM file...
    01/24/2015 19:22:32 Running: samtools view -b -o shigella1__ERR028698.Escherichia_coli#1.unsorted.bam -q 1 -S shigella1__ERR028698.Escherichia_coli#1.sam.mod
    01/24/2015 19:22:32 Running: samtools sort shigella1__ERR028698.Escherichia_coli#1.unsorted.bam shigella1__ERR028698.Escherichia_coli#1.sorted
    01/24/2015 19:22:32 Deleting sam and bam files that are not longer needed...
    01/24/2015 19:22:32 Deleting shigella1__ERR028698.Escherichia_coli#1.sam
    01/24/2015 19:22:32 Deleting shigella1__ERR028698.Escherichia_coli#1.sam.mod
    01/24/2015 19:22:32 Deleting shigella1__ERR028698.Escherichia_coli#1.unsorted.bam
    01/24/2015 19:22:32 Generate pileup...
    01/24/2015 19:22:32 Running: samtools mpileup -L 1000 -f Escherichia_coli#1.fasta -Q 20 -q 1 shigella1__ERR028698.Escherichia_coli#1.sorted.bam
    01/24/2015 19:22:32 Processing SAMtools pileup...
    01/24/2015 19:22:33 Scoring alleles...
    01/24/2015 19:22:39 Finished processing for read set ERR028698 ...
    01/24/2015 19:22:39 Finished processing for database Escherichia_coli#1.fasta ...
    01/24/2015 19:22:39 MLST output printed to shigella1__mlst__Escherichia_coli#1__results.txt
    01/24/2015 19:22:39 SRST2 has finished.

katholt commented 9 years ago

The reason no results are reported is that there are hardly any reads in this file, so the mapping results in <1x average read depth, 0x depth at the edges of the allele sequences, and <50% coverage of any allele.
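A quick way to check whether a read set is simply too small is to count its records before running the pipeline. The snippet below is a minimal sketch and not part of SRST2: `count_fastq_reads` and `mean_depth` are hypothetical helper names, the FASTQ is assumed to use the usual four lines per record, and the depth estimate optimistically assumes every read maps to the target.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def mean_depth(n_reads, read_length, target_length):
    """Upper bound on average depth: assumes every read maps to the target."""
    return n_reads * read_length / target_length
```

For example, `count_fastq_reads("ERR028698_1.fastq.gz")` gives the number of forward reads. If `mean_depth` for that count, the read length, and the combined length of the MLST loci comes out well below 1, an empty result is to be expected.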

SRST2 only reports results where %coverage and %similarity pass the default cut-offs (90%). Currently this rule applies to both MLST and gene detection.
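The reporting rule can be illustrated roughly as follows. This is a sketch of the idea only, not SRST2's actual code: the function names, the dictionary layout, and the use of `>=` at the boundary are assumptions, and the scores in the usage note are made-up values.

```python
MIN_COVERAGE = 90.0    # percent of the allele length covered by reads
MIN_SIMILARITY = 90.0  # percent identity to the allele sequence


def passes_cutoffs(coverage, similarity,
                   min_coverage=MIN_COVERAGE,
                   min_similarity=MIN_SIMILARITY):
    """True when both percentages meet their cut-offs."""
    return coverage >= min_coverage and similarity >= min_similarity


def reportable_alleles(scores):
    """Keep only alleles whose (coverage, similarity) pair passes.

    scores maps an allele name to a (coverage, similarity) tuple.
    """
    return {allele: pair for allele, pair in scores.items()
            if passes_cutoffs(*pair)}
```

With illustrative scores such as `{"adk_1": (45.0, 99.0), "fumC_3": (30.0, 98.5)}`, `reportable_alleles` returns an empty dictionary: similarity is high, but coverage is below the cut-off, so nothing is left to report for the sample.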

This could be altered for MLST, so that the absence of mapping to MLST loci is obvious in the results... we can add this to the next release.
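One way such a change could look is to always write a row for the sample and fill loci without a passing allele with a placeholder. The sketch below is hypothetical and does not describe what any SRST2 release does: `mlst_row`, the `"-"` placeholder, and the tab-separated layout are assumptions, and the column names are taken from the header shown earlier in this thread.

```python
HEADER = ["Sample", "ST", "adk", "fumC", "gyrB", "icd", "mdh", "purA",
          "recA", "mismatches", "uncertainty", "depth", "maxMAF"]
LOCI = HEADER[2:9]  # the seven MLST loci in this scheme


def mlst_row(sample, calls, st=None, placeholder="-"):
    """Build one tab-separated results row for a sample.

    calls maps a locus name to its allele call; loci with no call
    (nothing passed the cut-offs) get the placeholder instead.
    """
    alleles = [str(calls.get(locus, placeholder)) for locus in LOCI]
    st_field = placeholder if st is None else str(st)
    # The four summary columns are left as placeholders in this sketch.
    summary = [placeholder] * 4
    return "\t".join([sample, st_field] + alleles + summary)
```

Calling `mlst_row("ERR028698", {})` yields a row that starts with the sample name and carries the placeholder in every other column, which makes the absence of mapped loci visible instead of leaving only the header.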