gqi / DAESC

Implicit phasing #5

Open wnddl111 opened 2 months ago

wnddl111 commented 2 months ago

Hi

I'm using this tool and I noticed that it supports implicit phasing. Could you please clarify how implicit phasing differs from the standard (explicit) phasing? Additionally, is it possible for me to obtain the phased values using this tool, and if so, how?

Thank you!

gqi commented 2 months ago

The implicit phasing uses an EM algorithm to infer the phase between the exonic SNP where ASE is measured and the causal SNP driving the ASE (possibly outside the gene). It serves to improve differential ASE analysis and is not intended for general-purpose phasing. The phasing results are in wt of the output list. wt encodes the posterior probabilities for each individual to be classified into cluster 1 (first column) or cluster 2 (second column), corresponding to the two haplotype combinations.
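
As a rough illustration of the idea only (this is not DAESC's code, and the package's actual model is more involved), the sketch below runs EM on a toy two-cluster binomial mixture. In cluster 1 the alternative allele is expressed with fraction p, and in cluster 2 with fraction 1 - p, which mimics the two possible phases between the exonic SNP and the causal SNP. The function name `em_two_phase`, its parameters, and the toy counts are all hypothetical; the returned `wt` is simply meant to resemble the two-column posterior matrix described above.

```python
import math

def em_two_phase(alt, total, p_init=0.7, n_iter=200, tol=1e-10):
    """Toy EM for a two-cluster binomial mixture (assumption: plain binomial,
    swapped in for simplicity). Cluster 1 has alt-allele fraction p,
    cluster 2 has fraction 1 - p. Returns (p, pi, wt), where wt[i] holds the
    posterior probabilities of clusters 1 and 2 for individual i."""
    eps = 1e-9
    p, pi = p_init, 0.5
    wt = []
    for _ in range(n_iter):
        # E-step: posterior cluster membership for each individual.
        wt = []
        for a, n in zip(alt, total):
            l1 = math.log(pi) + a * math.log(p) + (n - a) * math.log(1 - p)
            l2 = math.log(1 - pi) + a * math.log(1 - p) + (n - a) * math.log(p)
            m = max(l1, l2)  # subtract the max for numerical stability
            e1, e2 = math.exp(l1 - m), math.exp(l2 - m)
            wt.append([e1 / (e1 + e2), e2 / (e1 + e2)])
        # M-step: update p using counts "flipped" according to the posteriors.
        num = sum(w[0] * a + w[1] * (n - a) for w, a, n in zip(wt, alt, total))
        p_new = min(max(num / sum(total), eps), 1 - eps)
        pi = min(max(sum(w[0] for w in wt) / len(wt), eps), 1 - eps)
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, pi, wt

# Toy data: alt-allele read counts out of 20 reads per individual.
alt_counts = [18, 2, 17, 3, 19, 1]
total_counts = [20] * 6
p_hat, pi_hat, wt = em_two_phase(alt_counts, total_counts)
# Hard cluster call per individual: column with the larger posterior.
clusters = [1 if w[0] >= w[1] else 2 for w in wt]
```

Taking the larger of the two columns per row, as in `clusters` above, is one simple way to turn such posteriors into a hard assignment; whether that is appropriate for real DAESC output depends on how close the probabilities are to 0 or 1.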

wnddl111 commented 2 months ago

Hello,

Thank you very much for your prompt response.

I am working with DAESC and am interested in obtaining a matrix divided into paternal/maternal SNPs for heterozygous SNPs within the germline. From my understanding, phasing is essential for calculating allele-specific expression (ASE). Is this calculation available within the tool, or is there another method used that I might be unaware of?

Furthermore, I noticed in the documentation that it mentions: “Since phased genotype data are needed to aggregate SNP-level ASE counts to gene-level ASE counts, we impute and phase the genotype data using the Michigan Imputation Server with the Haplotype Reference Consortium (HRC) r1.1 data as the reference panel. For each individual and each gene, we sum the ASE counts across all SNPs within the exonic regions of the gene for each haplotype and obtain two haplotype-specific counts (hap1 count and hap2 count). Coordinates of exonic regions are provided by GTEx v7 [36] annotation files (hg19) based on a collapsed gene model. After removing the genes which had non-zero ASE counts in some of the cells, we obtain ASE counts for 4102 genes and 30,474 cells.”
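
To make the aggregation step in that passage concrete, here is a minimal sketch of how I understand it, assuming VCF-style phased genotypes where "0|1" puts the reference allele on haplotype 1 and "1|0" puts the alternative allele on haplotype 1. The record layout (`gene`, `phase`, `ref_count`, `alt_count`) and the function name `aggregate_to_gene` are hypothetical and not part of DAESC.

```python
from collections import defaultdict

def aggregate_to_gene(snp_records):
    """Sum SNP-level ASE counts into per-gene (hap1, hap2) counts.
    Each record is a dict with hypothetical keys: gene, phase,
    ref_count, alt_count. Only phased heterozygous SNPs are used."""
    gene_counts = defaultdict(lambda: [0, 0])
    for rec in snp_records:
        if rec["phase"] == "0|1":
            # Reference allele on haplotype 1, alternative on haplotype 2.
            hap1, hap2 = rec["ref_count"], rec["alt_count"]
        elif rec["phase"] == "1|0":
            # Alternative allele on haplotype 1, reference on haplotype 2.
            hap1, hap2 = rec["alt_count"], rec["ref_count"]
        else:
            # Unphased or homozygous genotypes cannot be assigned to a haplotype.
            continue
        gene_counts[rec["gene"]][0] += hap1
        gene_counts[rec["gene"]][1] += hap2
    return {gene: tuple(counts) for gene, counts in gene_counts.items()}

# Toy records for one individual (made-up values for illustration).
records = [
    {"gene": "GENE_A", "phase": "0|1", "ref_count": 5, "alt_count": 3},
    {"gene": "GENE_A", "phase": "1|0", "ref_count": 2, "alt_count": 6},
    {"gene": "GENE_A", "phase": "0/1", "ref_count": 4, "alt_count": 4},  # unphased, skipped
    {"gene": "GENE_B", "phase": "1|0", "ref_count": 1, "alt_count": 0},
]
gene_level = aggregate_to_gene(records)
```

In this sketch the phase is what decides which column each SNP's reads are added to, which is why I assume phased genotypes are required for gene-level counts, whereas a single SNP's ref/alt counts can be tabulated without them.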

Does this imply that reference phasing information is only used when aggregating ASE to the gene level and not at the SNP level? Could you clarify when reference phasing information is utilized versus when it is not?

Thank you very much for your assistance.

Best regards, Juyoung lee