twolinin / longphase

feature request: estimated phasing quality #19

Closed jts closed 4 months ago

jts commented 2 years ago

Hi,

Thanks for longphase. I have been using it for the last few weeks and am impressed with its speed and usability. My main use case is generating a haplotagged .bam file for use in downstream analysis. Is it possible to add a tag to the bam record with an estimated "phasing quality" score (a Phred-scaled estimate of the probability that the assigned phase is incorrect)? If not, it would be great to have simple matching statistics (number of heterozygous variants that the read covered, number consistent with h1/h2, etc.) available to the user.

Thanks for considering, Jared

ythuang0522 commented 2 years ago

This is an interesting suggestion, as we know quite a few regions are challenging for phasing. It would be good to provide this info. Simple statistics would be easier, as we are not sure how to estimate the Phred-scaled quality properly.

ythuang0522 commented 2 years ago

Hi @jts, we have added a Phred-scaled phasing/tagging quality for each read to the BAM output, as requested, in release v1.3. It is the Phred scale of the probability of inconsistency, i.e., -10*log_10(number of inconsistent loci / (number of consistent loci + number of inconsistent loci)). It is written to the PQ tag (e.g., PQ:i:40).

If there are no inconsistent loci during haplotype assignment, we directly set PQ = 40 for this read. Note that untagged reads are set to PQ = 0. Below is the distribution of PQ values for 10x HG002 (left: number of reads, right: PQ value).

image
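
As a rough illustration of the formula and the two special cases described above, here is a minimal Python sketch. It is not longphase's actual code: the function name `phasing_quality` is hypothetical, rounding to the nearest integer is an assumption (the thread only says the value is stored as an integer tag), and treating a read with no covered loci as untagged is also an assumption.

```python
import math


def phasing_quality(consistent, inconsistent):
    """Phred-scaled fraction of inconsistent loci for one read (sketch)."""
    total = consistent + inconsistent
    if total == 0:
        # Assumption: a read covering no informative loci is untagged (PQ = 0).
        return 0
    if inconsistent == 0:
        # The thread states PQ is set directly to 40 in this case.
        return 40
    # -10 * log10(inconsistent / (consistent + inconsistent)).
    # Rounding to the nearest integer is an assumption, not confirmed here.
    return int(round(-10 * math.log10(inconsistent / total)))


# One inconsistent locus out of 10 gives a fraction of 0.1, i.e. PQ = 10.
print(phasing_quality(9, 1))
# One inconsistent locus out of 100 gives a fraction of 0.01, i.e. PQ = 20.
print(phasing_quality(99, 1))
```

Under this definition, a read needs a large number of covered loci before a single mismatch still yields a high PQ, which is why fully consistent reads are handled as a separate case.
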
ythuang0522 commented 2 years ago

Forgot to mention that haplotag provides a --log option, which outputs a tabular file with a few statistics for each read that you might be interested in. We also added the phasing quality to the table in v1.3.

image
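
The columns of the log table are not spelled out in this thread, so the sketch below takes a different route and tallies PQ values straight from SAM text, relying only on the `PQ:i:<value>` tag format mentioned above. The helper names `pq_of` and `pq_histogram` are hypothetical, and the input is assumed to be tab-separated SAM records, for example lines piped from `samtools view`.

```python
from collections import Counter


def pq_of(sam_line):
    """Return the integer PQ tag of one SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional TAG:TYPE:VALUE fields start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        if field.startswith("PQ:i:"):
            return int(field[len("PQ:i:"):])
    return None


def pq_histogram(sam_lines):
    """Count reads per PQ value, skipping header lines and reads without PQ."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        pq = pq_of(line)
        if pq is not None:
            counts[pq] += 1
    return counts
```

Passing an open text stream (such as `sys.stdin`) to `pq_histogram` gives a read count per PQ value, comparable in spirit to the distribution shown earlier in the thread.
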

jts commented 2 years ago

Thanks so much!

On Aug 25, 2022, at 12:22 PM, Yao-Ting Huang @.***> wrote:

 Forgot to mention that the haplotag provided a --log option which outputs a tabular file storing a few statistics for each read that you might be interested. We also added the Phasing Quality into the table at v1.3.