Open timodonnell opened 9 years ago
I could see the advantage of having the reads at the pileup as well since there is a difference between no reads mapped to that region or all the reads skipped that region, i.e. skipping an exon or different transcripts etc.
Doesn't pileupFlatMap
handle this already with LociMap
? If you want to look at exonic loci anyways we have have the --loci-from-file
take a GTF file of exonic locations and run pileupFlatMap
over those regions only?
I think it should be possible to get at the clipped reads in a pileup if you want to, but after dealing with RNA a bit in Guacamole I think by far the most common case is you're not interested in clipped bases, as they're not really "aligned" to that locus. Soft clipping is really just an artifact of how gapped alignments are (awkwardly) represented.
I think the right compromise may be to have the convenience functions on the Pileup
class (depth
, positiveDepth
, alleleElements
) exclude clipped reads, but not go the more extreme route and remove them from the Pileup
altogether
We currently include clipped (either S or N cigar operator) reads in the pleup at a locus. For RNA-seq this is a performance issue (and arguably a programming gotcha) since often the vast majority of pileup elements at a locus will be clipped (with the N=intron CIGAR operator, i.e. the locus is not included in the gapped alignment for that read). I think we should change the semantics of the
Pileup
class to always exclude pileup elements that are clipped, since really they're not properly part of the pileup. Possibly the right way to do this is to modifyMappedRead
so it overrides theHasReferenceRegion
trait'soverlapsLocus
method to return false if the given locus is in a clipped part of that read.Additionally, if it's doable, we should modify
pileupFlatmap
and friends to not send reads to tasks if all the loci assigned to that task are clipped in the read.