snayfach / MIDAS

An integrated pipeline for estimating strain-level genomic variation from metagenomic data
http://dx.doi.org/10.1101/gr.201863.115
GNU General Public License v3.0
119 stars 52 forks source link

Trouble with uniquely mapped reads #43

Closed nafeesatk closed 7 years ago

nafeesatk commented 7 years ago

I am trying to just learn how to use the tool and am having issues with the first step (species taxonomy) when running samples downloaded from NCBI. I ran the test script and went through the tutorial without issue. I downloaded one of the sequences from the Hadza microbiome study and tried to run MIDAS with that. At the first step I did not get any uniquely mapped reads and therefore can't seem to get the species taxonomy and abundance for the sample. I've run in to this with other sequences I've tried as well. Here is the result I got:

sbrgxps@sbrgxps-XPS-8910:~/MIDAS$ run_midas.py species SRR1927149 -1 ../rampelliseqs/SRR1927149.fq

MIDAS: Metagenomic Intra-species Diversity Analysis System version 1.2.1; github.com/snayfach/MIDAS Copyright (C) 2015-2016 Stephen Nayfach Freely distributed under the GNU General Public License (GPLv3)

===========Parameters=========== Command: /home/sbrgxps/MIDAS/scripts/run_midas.py species SRR1927149 -1 ../rampelliseqs/SRR1927149.fq Script: run_midas.py species Database: /home/sbrgxps/MIDAS/midas_db_v1.2 Output directory: SRR1927149 Input reads (1st mate): ../rampelliseqs/SRR1927149.fq Input reads (2nd mate): None Remove temporary files: False Word size for database search: 28 Minimum mapping alignment coverage: 0.75 Number of reads to use from input: use all Number of threads for database search: 1

Aligning reads to marker-genes database 17.92 minutes 0.73 Gb maximum memory

Classifying reads total alignments: 25486 uniquely mapped reads: 0 ambiguously mapped reads: 0 0.0 minutes 0.74 Gb maximum memory

Estimating species abundance total marker-gene coverage: 0.0 0.0 minutes 0.74 Gb maximum memory

Did I install something incorrectly or is there an issue with how I am running the scripts? I am new to using metagenomics analysis tools so I wasn't sure where to start with troubleshooting.

Thank you!

snayfach commented 7 years ago

Hi nafeesatk,

Everything you've done looks correct. However, I've run that same sample through MIDAS and get several hundred uniquely reads mapped to the marker genes.

I think the easiest solution would be for me to recreate the issue on my system using your FASTQ file. If you send me your email address, I can send a dropbox link where you can upload your FASTQ.

Thanks, Stephen

nafeesatk commented 7 years ago

Hi Stephen,

Thank you so much for your help! My email is ntkhan@eng.ucsd.edu.

Best,

Nafeesa

snayfach commented 7 years ago

Hi Nafeesa,

I see the issue now. It appears you have merged read1 and read2 from SRR1927149_1.fastq and SRR1927149_2.fastq.

For most Illumina libraries read1 and read2 occur about 350 bp apart. So you won't get an alignment that covers the entire read, because in reality, there is a big gap between them. MIDAS sees that the alignments only cover a portion of the read and is therefore throwing them out.

I would strongly recommend running MIDAS without merging your reads. But if this is not possible, you can try decreasing the value for --aln_cov. The default if 0.7, but you can try dropping it to 0.35.

Best, Stephen