nh13 / TMAP

Torrent Mapping Alignment Program
GNU General Public License v2.0

IUPAC ambiguity codes #1

Closed (lituan closed 7 years ago)

lituan commented 7 years ago

There's a paragraph describing how to handle ambiguous DNA bases, but I don't understand why R is converted to C. Can you explain this?

Ambiguous IUPAC codes in the reference/target FASTA will be converted to the lexicographically smallest DNA base that is not compatible to the IUPAC code to ensure minimum reference bias. For example, an IUPAC base R, which represents an A or a G, will be converted to a C. All Ns in the reference will be converted to As. Furthermore, any non-IUPAC character will be treated as an N. The ambiguity codes will only be re-considered when calculating the NM and MD SAM record optional tags.

nh13 commented 7 years ago
  1. You'll need to get support from Ion Torrent folks going forward as I do not support this software.
  2. The idea is to convert the IUPAC code to a base that doesn't match any of the bases the code represents. This ensures that, when we map, every base represented by the IUPAC code mismatches the converted reference, so none of them is favored and the bias is minimized. For R (A or G), the candidates are C and T, and C is the lexicographically smaller of the two. The paragraph you quote states just the same.

Closing the issue since this software isn't maintained (see when the last commit was made).
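
For readers who land on this issue later, the conversion rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as described in the paragraph, not TMAP's actual code; the names `IUPAC`, `convert_base`, and `convert_reference` are hypothetical, and upper-casing the input and treating U like any other unlisted character are assumptions the quoted paragraph does not address.

```python
# Bases each IUPAC ambiguity code can stand for (standard IUPAC nucleotide table).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def convert_base(code):
    """Return the base stored in the converted reference for one character."""
    code = code.upper()  # assumption: case is ignored
    if code in {"A", "C", "G", "T"}:
        return code  # unambiguous bases are kept as-is
    # Any character that is not a listed IUPAC code is treated as an N.
    compatible = IUPAC.get(code, IUPAC["N"])
    # "ACGT" is already in lexicographic order, so the first base that the
    # code cannot represent is the lexicographically smallest one.
    incompatible = [base for base in "ACGT" if base not in compatible]
    # N is compatible with every base, so nothing is left; the quoted
    # paragraph says Ns become As.
    return incompatible[0] if incompatible else "A"

def convert_reference(sequence):
    """Apply convert_base to every character of a reference sequence."""
    return "".join(convert_base(char) for char in sequence)

print(convert_reference("ACGRYN"))  # prints "ACGCAA"
```

Here R (A or G) becomes C, Y (C or T) becomes A, and N becomes A, matching the examples in the quoted paragraph.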