seqan / slimm

Species Level Identification of Microbes from Metagenomes

Question: How does slimm deal with discordant mapping of paired end reads? #38

Open your-highness opened 4 years ago

your-highness commented 4 years ago

Dear @temehi & @agakrawczyk,

@agakrawczyk and I ran into problems when using bowtie2 with slimm:

  1. Yara, bowtie2 and slimm indices were built from human chr1 & chr11 and C-RVDB (https://rvdb.dbi.udel.edu/).
  2. An in silico data set comprising 91% human chr1 & chr11 reads and 9% human virus reads from various species was generated and mapped with either bowtie2 or yara.
  3. Slimm was then used for abundance estimation on both the bowtie2 and the yara mappings.

While "yara+slimm" gave consistent results for all assayed viruses, "bowtie2+slimm" failed for two viruses. Closer inspection of the mapping files showed that bowtie2 reported many discordant mappings across various reference sequences:

MK630134.1_50_0 97      KY315545.1      7713    1       301M    KY315552.1      8791    0       CCAGGTCCAGTCAGATAATAAATATCCGATAAGGAACAAAAGGAAAAGTCAAAGTCCTGGAAAGCATCCACCCTGATTCTCTTGCCGGAATGTTTTGCCACGTAATCCTTCATAGCAGGGAGATTCCTCTGTAAAAGGATAAATTCCTCCCGGCCGCATTTAGTCTCTCCGAACCACATCTTTTCCTGCTCTGGATCTAACTCCGCTTCGGCAAAAGGCGGAAATTGTCTAAGTCCAATATTGAAGAACTGGACCAATGATTCTGCTATGATGTACACTTTTTCCGTCACCGTGTCCACGG   CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGDGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGFGGGGGGGGG9FGGGFGGGGGFGGGGG<GCGGFGGFAGCGGG@GGGGEFGEFDGGGGD9FGGGGGGAGGEFFFCGFGFDAEEFF@FF@8GGECFCDFG7;@@EGFGG*AFCFG,FGE:EF@F,:@G*G?DFDF7*FGCGGCC9EFE+FDG)FC78FFG6<2GG73;+<AA2F881)1)5.)8.0:).+AF09:F9/*   AS:i:-2 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:281C19     YT:Z:UP
MK630134.1_50_0 353     KY290183.1      7394    1       301M    KY315552.1      8791    0       CCAGGTCCAGTCAGATAATAAATATCCGATAAGGAACAAAAGGAAAAGTCAAAGTCCTGGAAAGCATCCACCCTGATTCTCTTGCCGGAATGTTTTGCCACGTAATCCTTCATAGCAGGGAGATTCCTCTGTAAAAGGATAAATTCCTCCCGGCCGCATTTAGTCTCTCCGAACCACATCTTTTCCTGCTCTGGATCTAACTCCGCTTCGGCAAAAGGCGGAAATTGTCTAAGTCCAATATTGAAGAACTGGACCAATGATTCTGCTATGATGTACACTTTTTCCGTCACCGTGTCCACGG   CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGDGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGFGGGGGGGGG9FGGGFGGGGGFGGGGG<GCGGFGGFAGCGGG@GGGGEFGEFDGGGGD9FGGGGGGAGGEFFFCGFGFDAEEFF@FF@8GGECFCDFG7;@@EGFGG*AFCFG,FGE:EF@F,:@G*G?DFDF7*FGCGGCC9EFE+FDG)FC78FFG6<2GG73;+<AA2F881)1)5.)8.0:).+AF09:F9/*   AS:i:-2 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:281C19     YT:Z:UP
MK630134.1_50_0 353     KY316048.1      11168   1       301M    KY315552.1      8791    0       CCAGGTCCAGTCAGATAATAAATATCCGATAAGGAACAAAAGGAAAAGTCAAAGTCCTGGAAAGCATCCACCCTGATTCTCTTGCCGGAATGTTTTGCCACGTAATCCTTCATAGCAGGGAGATTCCTCTGTAAAAGGATAAATTCCTCCCGGCCGCATTTAGTCTCTCCGAACCACATCTTTTCCTGCTCTGGATCTAACTCCGCTTCGGCAAAAGGCGGAAATTGTCTAAGTCCAATATTGAAGAACTGGACCAATGATTCTGCTATGATGTACACTTTTTCCGTCACCGTGTCCACGG   CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGDGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGFGGGGGGGGG9FGGGFGGGGGFGGGGG<GCGGFGGFAGCGGG@GGGGEFGEFDGGGGD9FGGGGGGAGGEFFFCGFGFDAEEFF@FF@8GGECFCDFG7;@@EGFGG*AFCFG,FGE:EF@F,:@G*G?DFDF7*FGCGGCC9EFE+FDG)FC78FFG6<2GG73;+<AA2F881)1)5.)8.0:).+AF09:F9/*   AS:i:-2 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:281C19     YT:Z:UP
MK630134.1_50_0 353     KY274508.1      7501    1       301M    KY315552.1      8791    0       CCAGGTCCAGTCAGATAATAAATATCCGATAAGGAACAAAAGGAAAAGTCAAAGTCCTGGAAAGCATCCACCCTGATTCTCTTGCCGGAATGTTTTGCCACGTAATCCTTCATAGCAGGGAGATTCCTCTGTAAAAGGATAAATTCCTCCCGGCCGCATTTAGTCTCTCCGAACCACATCTTTTCCTGCTCTGGATCTAACTCCGCTTCGGCAAAAGGCGGAAATTGTCTAAGTCCAATATTGAAGAACTGGACCAATGATTCTGCTATGATGTACACTTTTTCCGTCACCGTGTCCACGG   CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGDGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGFGGGGGGGGG9FGGGFGGGGGFGGGGG<GCGGFGGFAGCGGG@GGGGEFGEFDGGGGD9FGGGGGGAGGEFFFCGFGFDAEEFF@FF@8GGECFCDFG7;@@EGFGG*AFCFG,FGE:EF@F,:@G*G?DFDF7*FGCGGCC9EFE+FDG)FC78FFG6<2GG73;+<AA2F881)1)5.)8.0:).+AF09:F9/*   AS:i:-2 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:281C19     YT:Z:UP
MK630134.1_50_0 353     MH698400.1      15286   1       301M    KY315552.1      8791    0       CCAGGTCCAGTCAGATAATAAATATCCGATAAGGAACAAAAGGAAAAGTCAAAGTCCTGGAAAGCATCCACCCTGATTCTCTTGCCGGAATGTTTTGCCACGTAATCCTTCATAGCAGGGAGATTCCTCTGTAAAAGGATAAATTCCTCCCGGCCGCATTTAGTCTCTCCGAACCACATCTTTTCCTGCTCTGGATCTAACTCCGCTTCGGCAAAAGGCGGAAATTGTCTAAGTCCAATATTGAAGAACTGGACCAATGATTCTGCTATGATGTACACTTTTTCCGTCACCGTGTCCACGG   CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGDGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGFGGGGGGGGG9FGGGFGGGGGFGGGGG<GCGGFGGFAGCGGG@GGGGEFGEFDGGGGD9FGGGGGGAGGEFFFCGFGFDAEEFF@FF@8GGECFCDFG7;@@EGFGG*AFCFG,FGE:EF@F,:@G*G?DFDF7*FGCGGCC9EFE+FDG)FC78FFG6<2GG73;+<AA2F881)1)5.)8.0:).+AF09:F9/*   AS:i:-2 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:281C19     YT:Z:UP
MK630134.1_50_0 353     KY315552.1      7711    1       301M    =       8791    0       CCAGGTCCAGTCAGATAATAAATATCCGATAAGGAACAAAAGGAAAAGTCAAAGTCCTGGAAAGCATCCACCCTGATTCTCTTGCCGGAATGTTTTGCCACGTAATCCTTCATAGCAGGGAGATTCCTCTGTAAAAGGATAAATTCCTCCCGGCCGCATTTAGTCTCTCCGAACCACATCTTTTCCTGCTCTGGATCTAACTCCGCTTCGGCAAAAGGCGGAAATTGTCTAAGTCCAATATTGAAGAACTGGACCAATGATTCTGCTATGATGTACACTTTTTCCGTCACCGTGTCCACGG   CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGDGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGFGGGGGGGGG9FGGGFGGGGGFGGGGG<GCGGFGGFAGCGGG@GGGGEFGEFDGGGGD9FGGGGGGAGGEFFFCGFGFDAEEFF@FF@8GGECFCDFG7;@@EGFGG*AFCFG,FGE:EF@F,:@G*G?DFDF7*FGCGGCC9EFE+FDG)FC78FFG6<2GG73;+<AA2F881)1)5.)8.0:).+AF09:F9/*   AS:i:-2 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:281C19     YT:Z:UP

Yara did not report these discordant mappings.
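For anyone reading the records above, here is a minimal sketch that decodes the two FLAG values that appear in them (97 and 353). The bit names follow the SAM specification; `decode_flag` is just a hypothetical helper for illustration and is not part of bowtie2, yara or slimm. Note also that the records carry `YT:Z:UP`, which in bowtie2's own terminology means the pair aligned neither concordantly nor discordantly, so "discordant" is used loosely here for "mates placed on different references".

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# The two FLAG values seen in the records above.
for flag in (97, 353):
    print(flag, decode_flag(flag))
```

Both values lack the `proper_pair` bit, and 353 additionally has the `secondary` bit set, so five of the six records shown are secondary alignments of the same first mate.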

My question: How does slimm deal with discordant mappings? My suspicion is that these are discarded.
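To gauge how many records are affected per reference, something like the following sketch could be run over plain-text SAM lines (for example the output of `samtools view`). It only counts mapped, paired records whose mate sits on a different reference sequence; it says nothing about what slimm does with them. `cross_reference_counts` is a hypothetical helper, and the example records are shortened stand-ins for the ones above (SEQ and QUAL replaced by placeholders).

```python
from collections import Counter


def cross_reference_counts(sam_lines):
    """Count mapped, paired records whose mate lies on another reference.

    Returns a Counter keyed by the record's own reference name (RNAME).
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        if (flag & 0x4) or not (flag & 0x1):  # unmapped or not paired
            continue
        # RNEXT is "=" when the mate is on the same reference, "*" if unknown.
        if rnext not in ("=", "*") and rnext != rname:
            counts[rname] += 1
    return counts


# Shortened stand-ins for three of the records above (placeholder SEQ/QUAL).
example = [
    "MK630134.1_50_0\t97\tKY315545.1\t7713\t1\t301M\tKY315552.1\t8791\t0\tACGT\tFFFF",
    "MK630134.1_50_0\t353\tKY290183.1\t7394\t1\t301M\tKY315552.1\t8791\t0\tACGT\tFFFF",
    "MK630134.1_50_0\t353\tKY315552.1\t7711\t1\t301M\t=\t8791\t0\tACGT\tFFFF",
]
print(cross_reference_counts(example))
```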

Thanks in advance!