yfarjoun closed this 4 years ago
Do people write U's in RNA BAMs, or do they convert them to T and just know what they are from context?
No idea! I was writing tests for a different PR, https://github.com/broadinstitute/picard/pull/1506, and tried adding "all" the IUPAC bases as test cases. Since U is technically an IUPAC base, I tried adding it and the htsjdk validator exploded...
Hmmm. I think that for now, new technologies should use a tag (perhaps MM or something similar from https://github.com/samtools/hts-specs/pull/418) instead of this...
While the SAM spec allows any sequence matching the regex `\*|[A-Za-z=.]+` in the SEQ field, htsjdk considers only a subset of those characters valid, namely the IUPAC characters. The current implementation does not accept 'U', which may be used to indicate uracil as opposed to thymine.
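To make the gap concrete, here is a minimal Python sketch (not htsjdk code) contrasting the SAM spec's SEQ regex with a stricter IUPAC-only check. The `IUPAC_WITHOUT_U` set and the `allow_uracil` flag are hypothetical: they approximate the behavior described above, and htsjdk's actual base set and API may differ (for example, in how it treats `=` and `.`).

```python
import re

# SEQ regex from the SAM spec: "*" or one or more letters, "=" or "."
SAM_SEQ_RE = re.compile(r"\*|[A-Za-z=.]+")

# Hypothetical approximation of a validator that accepts IUPAC nucleotide
# codes but not U (assumption, not htsjdk's actual set)
IUPAC_WITHOUT_U = set("ACGTNRYSWKMBDHVacgtnryswkmbdhv")


def matches_sam_spec(seq: str) -> bool:
    """True if the whole string matches the SAM spec's SEQ regex."""
    return SAM_SEQ_RE.fullmatch(seq) is not None


def is_iupac_only(seq: str, allow_uracil: bool = False) -> bool:
    """Stricter check; allow_uracil is a hypothetical switch for the proposal."""
    if seq == "*":  # "*" means the sequence is not stored
        return True
    allowed = (IUPAC_WITHOUT_U | set("Uu")) if allow_uracil else IUPAC_WITHOUT_U
    return len(seq) > 0 and all(base in allowed for base in seq)


if __name__ == "__main__":
    print(matches_sam_spec("ACGU"))                   # True: spec allows any letter
    print(is_iupac_only("ACGU"))                      # False: U is rejected
    print(is_iupac_only("ACGU", allow_uracil=True))   # True: the proposed change
```

Adding U to the allowed set is the only difference between the two `is_iupac_only` calls; everything the stricter check accepts is still a subset of what the spec's regex accepts.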
Should we add "U" to the list of valid IUPAC bases in htsjdk?