I'm reporting a semantic error in bamToPsl: when a BAM file contains supplementary alignments that have been hard clipped (the BAM CIGAR H operator), the length of the query sequence is inferred incorrectly. To report the correct Q size in the PSL file, both soft clipping (BAM CIGAR S) and hard clipping (BAM CIGAR H) operators need to be counted when inferring the full length of the query sequence. This should be added as an option flag for the user, at the very least.
For example, the PSL output that bamToPsl produces from a test BAM file generated by minimap2 is:
psLayout version 3
match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts
match match count bases count bases name size start end name size start end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
1000 0 0 0 0 0 0 0 + query 1000 0 1000 target 10000 0 1000 1 1000, 0, 0,
8000 0 0 0 0 0 0 0 + query 10000 2000 10000 target 10000 2000 10000 1 8000, 2000, 2000,
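Note that the two rows report different Q sizes (1000 and 10000) for the same query. To make the expected behaviour concrete, here is a minimal Python sketch of how the full query length could be inferred from a CIGAR string. This is not bamToPsl's actual code; the function name, the include_hard_clips parameter, and the CIGAR strings in the usage lines are hypothetical, chosen only to be consistent with the sizes in the output above.

```python
import re

# CIGAR operators that consume query bases according to the SAM specification.
QUERY_CONSUMING = set("MIS=X")

# One CIGAR element: a length followed by a single operator character.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_length(cigar, include_hard_clips=True):
    """Return the query length implied by a CIGAR string.

    With include_hard_clips=True, hard-clipped bases (H) are counted too,
    which gives the length of the original read rather than the length
    of the sequence stored in the BAM record.
    """
    if not cigar or cigar == "*":
        raise ValueError("CIGAR string is unavailable")
    ops = _CIGAR_OP.findall(cigar)
    # Reject strings containing anything other than valid length/operator pairs.
    if "".join(length + op for length, op in ops) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    total = 0
    for length, op in ops:
        if op in QUERY_CONSUMING or (op == "H" and include_hard_clips):
            total += int(length)
    return total


# Hypothetical CIGARs for the two alignments shown above.
supplementary = "1000M9000H"  # hard-clipped supplementary alignment
primary = "2000S8000M"        # soft-clipped primary alignment

print(query_length(supplementary, include_hard_clips=False))  # 1000
print(query_length(supplementary))                            # 10000
print(query_length(primary))                                  # 10000
```

Under these assumptions, ignoring H reproduces the Q size of 1000 seen in the first row, while counting it gives 10000 for both alignments of the same query.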
Best, Jessen