tgen / tempe

A container focused and streamlined iteration of the Phoenix pipeline
GNU General Public License v3.0
1 stars 1 forks source link

PySam read processing order and supplementary alignments issue #71

Open ChristopheLegendre opened 5 months ago

ChristopheLegendre commented 5 months ago

https://github.com/tgen/tempe/blob/2e3d76d92617ddfbb7cc7f56cdf06fdf00539899/required_scripts/manta_prepare_sv_vcf.py#L724
minor bug encountered that leads to false negative calls due to under-estimated RCDIS count.

The line 724 listed in the link above shows that if a read is supplementary (bit 0x800), the read is skipped.
This is fine.
But the issue arises when the first read encountered with that name is a supplementary alignment in the current design.
The read name is added to a list of processed reads to avoid redundant processing of the same read twice within the same region;
Therefore, when the first read encountered (pysam fetches the reads from left to right within a specific region), the primary alignment, when exists, is automatically skipped since the read name exists in the list of "already" processed reads.

To fix that issue, we need to :

Both solutions have been tested and the second solution is the working solution; Based on the current design, removing the not x.is_supplementary in line 724 is not enough to capture the correct reads for the RCDIS count. Not adding the first time encountered supp aln to the list of processed reads must also be fix. Again that implies the primary alignment is in within the same processed current region of interest.

Further tests will be needed to validate the bug fix.
Appropriate samples with Translocations with lots of supplementary alignments need to be found and tested.