The line 724 listed in the link above shows that if a read is supplementary (bit 0x800), the read is skipped.
This is fine.
But the issue arises when the first read encountered with that name is a supplementary alignment in the current design.
The read name is added to a list of processed reads to avoid redundant processing of the same read twice within the same region;
Therefore, when the first read encountered (pysam fetches the reads from left to right within a specific region), the primary alignment, when exists, is automatically skipped since the read name exists in the list of "already" processed reads.
To fix that issue, we need to :
either remove line 724 and process the supplementary alignment as as much informative as primary alignments, but that cause a problem if the mate is on the same chromosome for that alignment
or skip the supplementary alignment record when they are encountered before the primary alignment (this implies that a primary alignment is located in the same area as the supplementary alignment; if not, a second pass to process the data might be needed, because if we only have supp aln, we could skip all of them and ending up with the same under-estimated count for RCDIS)
Both solutions have been tested and the second solution is the working solution; Based on the current design, removing the not x.is_supplementary in line 724 is not enough to capture the correct reads for the RCDIS count. Not adding the first time encountered supp aln to the list of processed reads must also be fix. Again that implies the primary alignment is in within the same processed current region of interest.
Further tests will be needed to validate the bug fix.
Appropriate samples with Translocations with lots of supplementary alignments need to be found and tested.
https://github.com/tgen/tempe/blob/2e3d76d92617ddfbb7cc7f56cdf06fdf00539899/required_scripts/manta_prepare_sv_vcf.py#L724
minor bug encountered that leads to false negative calls due to under-estimated
RCDIS
count.The line 724 listed in the link above shows that if a read is supplementary (bit 0x800), the read is skipped.
This is fine.
But the issue arises when the first read encountered with that name is a supplementary alignment in the current design.
The read name is added to a list of processed reads to avoid redundant processing of the same read twice within the same region;
Therefore, when the first read encountered (
pysam
fetches the reads from left to right within a specific region), the primary alignment, when exists, is automatically skipped since the read name exists in the list of "already" processed reads.To fix that issue, we need to :
RCDIS
)Both solutions have been tested and the second solution is the working solution; Based on the current design, removing the
not x.is_supplementary
in line 724 is not enough to capture the correct reads for theRCDIS
count. Not adding the first time encounteredsupp aln
to the list of processed reads must also be fix. Again that implies the primary alignment is in within the same processed current region of interest.Further tests will be needed to validate the bug fix.
Appropriate samples with Translocations with lots of supplementary alignments need to be found and tested.