katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Difference in log files for negatives when using gene_db #47

Open kajcox opened 8 years ago

kajcox commented 8 years ago

Hi Kat, this is in reply to your email; I have also included the original query below.

The files that are created are:

3__3-paired.gene_db.sam
3__3-paired.gene_db.sam.mod
3__3-paired.gene_db.unsorted.bam
3__genes__gene_db__results.txt

However, the top three are empty and the gene results table just has the isolate number in it.

I've now run 10 different genes using 10 different gene_db files, and this error has appeared in the log file for a number of negatives, though many of those are isolates I'd expect to be negative anyway. Sometimes, when an isolate has had a positive result at 90% and a negative at 60% (or vice versa) and this log file error was present, it indicated a file that needed to be re-run (I'm doing 1500 and the internet hamster in Sydney sometimes drops my VPN connection!). But many of them have returned the same log file error when re-run and seem to be fine in all other respects.

Many thanks in advance

Karen

Original query: I am currently using SRST2 to map reads to target sequences using gene_db. I keep getting differences in the log files for my negatives. Some seem to run fine and return the usual output, but some of my log files contain a message after the samtools step, which is shown below.

I have run a different gene_db on this same isolate without this message turning up in the log file, and similarly I have used the same gene_db.fasta file on different isolates and they seem to have worked fine.

Is this actually a run error and a false negative, or should I just assume this is a negative for the target sequence? Am I interpreting the log files incorrectly? The screen message below (2) also appears when the log files change. I have tried forcing samtools version 0.1.18, but this seems to have no effect.

Log file error

08/27/2015 04:47:29 program started
08/27/2015 04:47:29 command line: /local/software/python/2.7.5/bin/srst2 --input_pe /scratch/kc1e12/PhD/WAIT_trimmed/3-paired_1.fastq.gz /scratch/kc1e12/PhD/WAIT_trimmed/3-paired_2.fastq.gz --log$
08/27/2015 04:47:29 Total paired readsets found:1
08/27/2015 04:47:29 Index for gene_db.fasta is already built...
08/27/2015 04:47:29 Processing database gene_db.fasta
08/27/2015 04:47:29 Processing sample 3-paired
08/27/2015 04:47:29 Output prefix set to: 3__3-paired.gene_db
08/27/2015 04:47:29 Aligning reads to index gene_db.fasta using bowtie2...
08/27/2015 04:47:29 Running: bowtie2 -1 /scratch/kc1e12/PhD/WAIT_trimmed/3-paired_1.fastq.gz -2 /scratch/kc1e12/PhD/WAIT_trimmed/3-paired_2.fastq.gz -S 3__3-paired.gene_db.sam -q --very-sensi$
08/27/2015 04:48:28 Processing Bowtie2 output with SAMtools...
08/27/2015 04:48:28 Generate and sort BAM file...
08/27/2015 04:48:28 Running: samtools view -b -o 3__3-paired.gene_db.unsorted.bam -q 1 -S 3__3-paired.gene_db.sam.mod
08/27/2015 04:48:28 {'message': "Command 'samtools view -b -o 3__3-paired.gene_db.unsorted.bam -q 1 -S 3__3-paired.gene_db.sam.mod' failed with non-zero exit status: 1"}
08/27/2015 04:48:28 failed gene detection
08/27/2015 04:48:28 Tabulating results for database gene_db.fasta ...
08/27/2015 04:48:28 Finished processing for database gene_db.fasta ...
08/27/2015 04:48:28 Gene detection output printed to 3__genes__gene_db__results.txt
08/27/2015 04:48:28 SRST2 has finished.

2) Screen error ' is recognized as '*'. [main_samview] truncated file.
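With 1500 isolates, it may help to flag the affected logs automatically rather than reading each one. The following is a minimal sketch, not part of SRST2: the helper name `summarize_log` is hypothetical, and the two strings it searches for are copied from the log excerpt above, so other SRST2 versions may word them differently.

```python
import re

# Patterns taken from the log excerpt in this thread (assumption: other
# SRST2 versions may phrase these messages differently).
FAILURE_RE = re.compile(
    r"Command 'samtools view .*' failed with non-zero exit status"
)
FINISHED_MARKER = "SRST2 has finished."


def summarize_log(log_text):
    """Return two flags describing one SRST2 log (hypothetical helper)."""
    lines = log_text.splitlines()
    return {
        # True if the samtools view step reported a non-zero exit status.
        "samtools_view_failed": any(FAILURE_RE.search(line) for line in lines),
        # True if the run reached the final log line.
        "finished": any(FINISHED_MARKER in line for line in lines),
    }
```

A log where `finished` is False probably points to an interrupted run (for example, a dropped connection) and is worth re-running regardless; a log where both flags are True matches the pattern shown above.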

katholt commented 8 years ago

OK so if the sams/bams are empty but bowtie2 didn’t error out, it means no reads mapped to your database and it's a true negative. We will add a check for 0 mapped reads to report this more clearly in future.
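As an illustration of what a check for 0 mapped reads could look like, here is a minimal sketch. It is not SRST2's actual implementation; the function name `count_mapped_reads` is hypothetical. It relies only on the SAM format itself: header lines start with `@`, the FLAG is the second tab-separated field, and bit 0x4 marks a read as unmapped.

```python
def count_mapped_reads(sam_path):
    """Count SAM alignment records whose FLAG lacks the 0x4 (unmapped) bit."""
    mapped = 0
    with open(sam_path) as handle:
        for line in handle:
            # Skip blank lines and header lines (which start with '@').
            if not line.strip() or line.startswith("@"):
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                continue  # not a well-formed alignment record
            flag = int(fields[1])
            if not flag & 0x4:
                mapped += 1
    return mapped
```

If this returns 0 for the SAM file that bowtie2 wrote (an empty file also gives 0), the sample could be reported as having no reads mapped to the database before samtools is called at all.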

kajcox commented 8 years ago

Excellent news! Thanks Kat.

Best wishes,
Karen