Closed VitorAguiar closed 8 months ago
We haven't investigated this enough. I think we wanted to focus on variants that are not affected by splicing, but it should be possible to keep them when using an aligner that maps reads well in this scenario.
During the pre-processing of the pileup file, variants like the one below are removed:
chr11 35208126 T 49 ccc,CCCCc,C,><CC<<><<>><ccc.Ccc.CcCCcc,cC.cCcccc.
That happens because of the awk filter below, which removes any site that is skipped in at least one alignment (e.g., an exonic variant that is spliced out in some RNA molecules), as indicated by the characters ">" and "<" in the 5th column of the pileup file.
awk -v OFS='\t' '{ if ($4>0 && $5 !~ /[^\^][<>]/...
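To illustrate the effect of the skip test, here is a minimal Python sketch of the visible part of that condition only. It assumes Python's `re` treats the pattern the same way awk does, and it ignores the `$4>0` check and the truncated remainder of the filter. The function name `passes_skip_filter` is hypothetical.

```python
import re

# Same pattern as in the awk filter: a "<" or ">" that is NOT preceded by "^".
# After "^" (read start), the next character encodes the mapping quality and
# may itself be "<" or ">", so that case is deliberately excluded.
SKIP_PATTERN = re.compile(r"[^\^][<>]")


def passes_skip_filter(bases: str) -> bool:
    """Return True if the 5th pileup column would survive the skip test."""
    return SKIP_PATTERN.search(bases) is None


example = "ccc,CCCCc,C,><CC<<><<>><ccc.Ccc.CcCCcc,cC.cCcccc."

print(passes_skip_filter(example))     # False: the site contains reference skips
print(passes_skip_filter("..,^<.C,"))  # True: "<" here is a mapping-quality character
```

A single skipped read is enough for the pattern to match, so the site is dropped regardless of how many other reads cover it.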
I believe the variant should not be removed since, although it is skipped by 10 reads, it is supported by 39 reads (8 matching the REF allele and 31 matching the ALT allele).
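The counts above can be reproduced with a short Python sketch that tallies the 5th column of the example line. It is only an illustration, assuming the standard samtools pileup encoding (`.`/`,` for reference matches, letters for mismatches, `<`/`>` for reference skips, `*`/`#` for deleted bases, `^` plus a quality character for read starts, `$` for read ends, `+N`/`-N` for indels). The function name `tally_pileup_bases` is hypothetical.

```python
import re


def tally_pileup_bases(bases: str) -> dict:
    """Count reference matches, mismatches, skips and deletions in a pileup column."""
    counts = {"ref": 0, "alt": 0, "skip": 0, "del": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker: the following character is a mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length in digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m:
                i += 1 + len(m.group()) + int(m.group())
                continue
        if ch in ".,":
            counts["ref"] += 1
        elif ch in "<>":
            counts["skip"] += 1
        elif ch in "*#":
            counts["del"] += 1
        elif ch.upper() in "ACGTN":
            counts["alt"] += 1
        # Anything else (e.g. "$" for read end) does not represent a read base.
        i += 1
    return counts


example = "ccc,CCCCc,C,><CC<<><<>><ccc.Ccc.CcCCcc,cC.cCcccc."
print(tally_pileup_bases(example))
# {'ref': 8, 'alt': 31, 'skip': 10, 'del': 0}
```

The four counts sum to 49, which agrees with the depth reported in the 4th column of the example line.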
For example, GATK's ASEReadCounter keeps the variant.
Could you please clarify the justification for removing variants such as the one in my example?