vice87 / gam-ngs

Genomic Assemblies Merger for NGS
GNU General Public License v3.0

How does gam-ngs identify ambiguous mapping? #5

Open kstatebioinfo opened 10 years ago

kstatebioinfo commented 10 years ago

I read in your paper: "GAM-NGS uses only reads aligning to a single position (a.k.a. uniquely aligned), discarding all reads that have two or more high scoring alignments (a.k.a. ambiguously aligned)." Do you use a MAPQ (column 5) threshold? If so, what is the default?

Or do you use field 7 ("Ref. name of the mate/next read")? If so, what do you use as the alignment quality score (I understand this to be an optional field)?

I would like to use Bowtie2 in best-match mode but I wasn't sure if MAPQ was enough information for your software to detect ambiguity.

Thanks so much, Jennifer

vice87 commented 10 years ago

Dear Jennifer,

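GAM-NGS currently looks at two (optional/non-standard) fields for discarding ambiguously-mapped reads:

  • NH field: it only keeps reads whose NH tag is equal to 1
  • BWA-sampe's XT field: it only keeps reads whose XT tag is equal to 'U'

If neither field is present, then GAM-NGS processes the read as uniquely aligned.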
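The rule above (keep a read if its NH tag equals 1, keep it if its XT tag equals 'U', and treat it as unique when neither tag is present) can be sketched in a few lines of Python. This is only an illustration, not GAM-NGS's actual code: the function name `is_uniquely_aligned` is hypothetical, and requiring both tags to pass when both are present is an assumption, since the thread does not say how the two checks are combined.

```python
def is_uniquely_aligned(sam_line: str) -> bool:
    """Apply the NH/XT rule to one SAM alignment record (sketch only)."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    tags = {}
    for item in fields[11:]:
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    # NH = number of reported alignments for the read; keep only NH == 1.
    if "NH" in tags and int(tags["NH"]) != 1:
        return False
    # XT is the BWA-sampe tag; 'U' marks a unique alignment.
    if "XT" in tags and tags["XT"] != "U":
        return False
    # Neither tag present (or every present tag passed): treat as unique.
    # Assumption: when both tags are present, both must pass.
    return True


# Made-up records: the 11 mandatory columns followed by optional tags.
base = "\t".join(["r1", "0", "ctg1", "100", "37", "4M", "*", "0", "0", "ACGT", "IIII"])
print(is_uniquely_aligned(base + "\tXT:A:U"))  # tagged unique by BWA-sampe
print(is_uniquely_aligned(base + "\tNH:i:3"))  # three reported alignments
print(is_uniquely_aligned(base))               # no NH/XT tag at all
```

Note the consequence for aligners that emit neither tag: every read passes the check, so ambiguous alignments have to be removed before the file is handed to GAM-NGS.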

Actually, I haven't used Bowtie2 very much (I've mainly used BWA in all my tests) and I don't know whether it outputs the NH field. However, since the latest BWA-MEM algorithm also does not output the NH or XT fields, I'll probably check the MAPQ column in a future release.

Best, Riccardo

kstatebioinfo commented 10 years ago

Thanks,

Bowtie2 generally reports only the best mapping position. I can filter these by MAPQ. That way the more ambiguous alignments won't be presented to GAM-NGS.
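A pre-filter of this kind might look like the sketch below, which drops unmapped and low-MAPQ records from SAM text before it is passed on. The function name `filter_by_mapq` and the threshold of 10 are assumptions for illustration only; nothing in this thread specifies a cutoff, and a suitable value depends on the aligner.

```python
def filter_by_mapq(sam_lines, min_mapq=10):
    """Yield header lines and mapped records with MAPQ >= min_mapq.

    min_mapq=10 is an illustrative default, not a recommended value.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: MAPQ
        if flag & 0x4:
            # 0x4 = read unmapped; it carries no usable alignment.
            continue
        if mapq >= min_mapq:
            yield line


def record(name, flag, mapq):
    """Build a made-up SAM record with the 11 mandatory columns."""
    return "\t".join([name, str(flag), "ctg1", "100", str(mapq),
                      "4M", "*", "0", "0", "ACGT", "IIII"])


sam = ["@HD\tVN:1.0", record("good", 0, 42), record("ambiguous", 0, 1)]
for kept in filter_by_mapq(sam):
    print(kept.split("\t")[0])
```

One caveat with paired-end data: this filters each record independently, so a read can survive while its mate is dropped, leaving mate fields that point at a record no longer in the file.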

-J

kstatebioinfo commented 10 years ago

One more question,

We have long-distance jumping libraries. Which fields does GAM-NGS use to find mates? Will the large (up to 20 kb) insert size be a problem? Thanks, looking forward to trying your software.

Also, my journal club is very interested in your software. We were wondering if you had any post-paper thoughts on your software or thoughts on our post: http://bioinformaticsk-state.blogspot.com/2013/08/merging-ngs-assemblies.html

Thanks so much, Jennifer Shelton

On Mon, Sep 30, 2013 at 10:42 AM, Riccardo Vicedomini <notifications@github.com> wrote:

Dear Jennifer,

GAM-NGS currently looks at two (optional/non-standard) fields for discarding ambiguously-mapped reads:

  • NH field: it only keeps reads whose NH tag is equal to 1
  • BWA-sampe's XT field: it only keeps reads whose XT tag is equal to 'U'

If both fields are not present, then GAM-NGS processes the read as uniquely aligned.

Actually I haven't used Bowtie2 very much (I've used mainly BWA in all my tests) and I don't know if it uses the NH field. However, since also the latest BWA-mem algorithm does not output NH and XT fields, I'll probably check the MAPQ column in a future release.

Best, Riccardo

Reply to this email directly or view it on GitHub: https://github.com/vice87/gam-ngs/issues/5#issuecomment-25374504

vice87 commented 10 years ago

I'm sorry for the late reply; I've been pretty busy over the last few days and wasn't able to answer sooner. The large insert size should not be a problem. For a read from a long-insert library, the fields I use are those that describe the "mate" sequence. In SAM format specification terms, these are FLAG (bits 0x8 and 0x20), RNEXT and PNEXT. Moreover, they are used only if the read has been unambiguously mapped (as defined in my previous comment).

I'll definitely take a look at the link you provided.
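For readers less familiar with these columns, here is a minimal sketch of reading the mate information from one SAM record using only the fields named above. It is not GAM-NGS's code: `MateInfo` and `mate_info` are hypothetical names, and returning `None` when the mate is unmapped or RNEXT is `*` is an assumption about how such records would be handled.

```python
from typing import NamedTuple, Optional


class MateInfo(NamedTuple):
    ref_name: str         # reference sequence the mate is aligned to
    pos: int              # 1-based leftmost position of the mate (PNEXT)
    reverse_strand: bool  # True if the mate maps to the reverse strand


def mate_info(sam_line: str) -> Optional[MateInfo]:
    """Extract mate placement from FLAG, RNEXT and PNEXT (sketch only)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext, pnext = fields[2], fields[6], int(fields[7])
    # FLAG bit 0x8: the mate is unmapped; RNEXT '*': no mate information.
    if flag & 0x8 or rnext == "*":
        return None
    # RNEXT '=' means the mate lies on the same reference as this read.
    if rnext == "=":
        rnext = rname
    # FLAG bit 0x20: the mate is aligned to the reverse strand.
    return MateInfo(rnext, pnext, bool(flag & 0x20))


# Made-up record: first read of a pair (flag 97 = 0x1 + 0x20 + 0x40),
# with its mate about 20 kb downstream on the same contig.
line = "\t".join(["r1", "97", "ctg1", "100", "37", "4M",
                  "=", "20100", "20004", "ACGT", "IIII"])
print(mate_info(line))
```

The mate position comes straight from PNEXT, so the distance between the two reads does not affect how the record is parsed.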

Best, Riccardo