ucscGenomeBrowser / kent

UCSC Genome Browser source tree. Stable branch: "beta".
http://genome.ucsc.edu/

bamToPsl semantic bug #19

Closed bredeson closed 5 years ago

bredeson commented 5 years ago

Hi,

I'm reporting a semantic error in bamToPsl: when a BAM file includes supplementary alignments that have been hard clipped (the BAM CIGAR H operator), the length of the query sequence is inferred incorrectly. To infer the actual query length for the Q size field of the PSL file, both soft-clipping (BAM CIGAR S) and hard-clipping (BAM CIGAR H) operators need to be considered. This should be added as an option flag for the user, at the very least.

For example, the PSL output that bamToPsl produces from a test BAM file generated by minimap2 is:

psLayout version 3

match   mis-    rep.    N's Q gap   Q gap   T gap   T gap   strand  Q           Q       Q       Q   T           T       T       T   block   blockSizes  qStarts  tStarts
        match   match       count   bases   count   bases           name        size    start   end name        size    start   end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
1000    0   0   0   0   0   0   0   +   query   1000    0   1000    target  10000   0   1000    1   1000,   0,  0,
8000    0   0   0   0   0   0   0   +   query   10000   2000    10000   target  10000   2000    10000   1   8000,   2000,   2000,
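
As a rough illustration of the inference described above, here is a minimal sketch in Python. It is not bamToPsl's actual C code, and the function names `parse_cigar` and `query_layout` are hypothetical. It assumes the full read length can be recovered by summing the CIGAR operators that consume the query (M, I, S, =, X) plus any H operators, and that the query start and end follow from the leading and trailing clips. Strand handling is left out.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that consume bases of the read as stored in the SEQ field.
_CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Reject strings with characters the pattern skipped over (e.g. "*").
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def query_layout(cigar):
    """Return (q_size, q_start, q_end) in read orientation.

    q_size counts hard-clipped (H) bases in addition to the
    query-consuming operators, so it reflects the original read length
    rather than the length of the stored SEQ.
    """
    ops = parse_cigar(cigar)
    q_size = sum(n for n, op in ops if op in _CONSUMES_QUERY or op == "H")

    # Clipped bases before the first aligned operator.
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n

    # Clipped bases after the last aligned operator.
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n

    return q_size, lead, q_size - trail
```

The CIGAR strings of the test BAM are not shown here, so the following is an assumption: if the first row above came from a supplementary record such as `1000M9000H`, then `query_layout("1000M9000H")` gives a Q size of 10000 with the alignment spanning 0 to 1000, instead of the Q size of 1000 that was reported. A soft-clipped record such as `2000S8000M` gives Q size 10000 and a span of 2000 to 10000, which matches the second row.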

Best, Jessen

braneyboo commented 5 years ago

Hey Jessen,

Thanks for reporting this. I'll add it to our work queue. I don't know when it will actually get done.

brian raney