odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

Working with SVs: preserve INFO/END field after phase_common #72

Open · dtaliun opened 7 months ago

dtaliun commented 7 months ago

Hi,

Thank you for another amazing tool!

I don't have any critical issues to report, but I do have a small feature request: keep the INFO/END field in the BCF files written by phase_common.

I was experimenting with adding SVs to phasing and noticed the following. The phase_common tool processes common SVs but drops the INFO/END field from its output BCF. phase_rare then uses HTSlib's synced BCF reader to read two BCFs simultaneously: the phased common SVs (i.e., the output of phase_common) and all unphased SVs. Without the INFO/END field, however, the synced reader treats an SV in the two files as different records, even when their ID and POS fields match. As a result, phase_rare does not recognize that these SVs were already phased, which leads to duplicated entries in the output and probably wrong estimates (see the screenshots of the unphased input and of the phased output after phase_rare).
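The mismatch can be illustrated with a deliberately simplified, hypothetical Python model. It assumes that two records pair up only when CHROM, POS, ID and end coordinate all agree, and that a record without INFO/END ends at POS + len(REF) - 1, which is the VCF default. This is not HTSlib's actual pairing logic; `Record`, `pairing_key` and `merge` are illustrative names, not part of SHAPEIT5 or HTSlib.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    chrom: str
    pos: int  # 1-based position, as in VCF
    id: str
    ref: str
    alt: str
    info: Dict[str, int] = field(default_factory=dict)

    def end(self) -> int:
        # Per the VCF spec, a record without INFO/END ends at POS + len(REF) - 1.
        return self.info.get("END", self.pos + len(self.ref) - 1)


def pairing_key(rec: Record) -> Tuple[str, int, str, int]:
    # Hypothetical pairing key: a simplification, NOT HTSlib's real rules.
    return (rec.chrom, rec.pos, rec.id, rec.end())


def merge(phased: List[Record],
          unphased: List[Record]) -> List[Tuple[Optional[Record], Optional[Record]]]:
    """Pair records from two inputs; unmatched records are emitted on their own."""
    by_key = {pairing_key(r): r for r in phased}
    paired_keys = set()
    out: List[Tuple[Optional[Record], Optional[Record]]] = []
    for u in unphased:
        k = pairing_key(u)
        if k in by_key:
            out.append((by_key[k], u))  # same variant seen in both inputs
            paired_keys.add(k)
        else:
            out.append((None, u))  # only in the unphased input
    for k, p in by_key.items():
        if k not in paired_keys:
            out.append((p, None))  # only in the phased input
    return out


sv = Record("chr1", 1000, "DEL1", "N", "<DEL>", {"END": 5000})
stripped = Record("chr1", 1000, "DEL1", "N", "<DEL>")  # INFO/END dropped
print(len(merge([stripped], [sv])))  # 2: the same SV appears twice
restored = Record("chr1", 1000, "DEL1", "N", "<DEL>", {"END": 5000})
print(len(merge([restored], [sv])))  # 1: records pair up again
```

In this toy model, dropping END silently changes the end coordinate of a symbolic-allele SV from 5000 to 1000, so the two copies no longer share a key even though CHROM, POS and ID agree.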

The workaround for anyone who runs into the same issue is easy: add INFO/END back to the BCF with the phased common SVs before running phase_rare, e.g. using bcftools annotate -a [unphased BCF] -c INFO/END [phased common BCF] ... (the unphased BCF passed to -a must be indexed).
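What the annotate step does can be sketched in Python under stated assumptions. This is a hypothetical model: records are plain dicts, and `restore_end` matches records on CHROM, POS and ID. Those matching keys are an assumption for illustration only, since bcftools annotate applies its own record-matching rules.

```python
from typing import Dict, List, Tuple


def restore_end(phased: List[dict], unphased: List[dict]) -> List[dict]:
    """Hypothetical helper: copy INFO/END from unphased records onto phased
    records with the same CHROM, POS and ID (matching keys are an assumption)."""
    end_by_key: Dict[Tuple[str, int, str], int] = {
        (r["CHROM"], r["POS"], r["ID"]): r["INFO"]["END"]
        for r in unphased
        if "END" in r["INFO"]
    }
    restored = []
    for rec in phased:
        key = (rec["CHROM"], rec["POS"], rec["ID"])
        info = dict(rec["INFO"])  # copy so the input record is not mutated
        if key in end_by_key:
            info["END"] = end_by_key[key]
        # Every other field (e.g. the phased genotypes) is kept unchanged.
        restored.append({**rec, "INFO": info})
    return restored
```

Only INFO/END is taken from the unphased file. The phased genotypes produced by phase_common are left untouched, which is the point of the workaround.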

[Screenshots: "input" (unphased SVs) and "output" (phased SVs after phase_rare)]