Including Selected Gene Weights in One Genome Region for the Two-Stage Version of GIFT

yuanzhongshang / GIFT

GNU General Public License v3.0


Including Selected Gene Weights in One Genome Region for the Two-Stage Version of GIFT #5

Closed 1667857557 closed 4 months ago

1667857557 commented 4 months ago

Hi Yuan,

We are currently running GIFT using FUSION model weights. For the two-stage version of GIFT, should we include all gene weights in a genome region, or is it possible to select and include only some previously identified genes? Furthermore, FUSION typically suggests using TWAS models only for genes with a statistically significant SNP-based heritability estimate (p < 0.05). This criterion avoids including genes for which TWAS models would be unlikely to provide accurate prediction, thereby reducing the multiple-testing burden and limiting false positives from null TWAS models. FUSION estimates the SNP-based heritability of gene expression using genome-based restricted maximum likelihood (GREML) on individual-level genotype and gene expression data. Can we include only genes with statistically significant SNP-based heritability estimates (p < 0.01) in GIFT? Thanks in advance for your reply!

Yu-Feng Huang

yuanzhongshang commented 4 months ago

Hi Yu-Feng,

Thank you for your continued interest in GIFT. In my opinion, both strategies are fine. The MA-FOCUS method performs TWAS fine-mapping using only genes with significant heritability, while other TWAS fine-mapping methods do not impose this constraint. There is a paper that explicitly states there is no need to select genes with significant heritability estimates in TWAS fine-mapping. However, I am unable to recall its title at the moment. Perhaps you could search for it.
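As a rough illustration of the two strategies (this is not part of GIFT or FUSION), the gene list for one region could be prepared either way before the weights are passed on. The record layout, the gene names, the `hsq_pv` field, and the `filter_by_heritability` helper are all hypothetical; the 0.01 threshold is simply the one from the question.

```python
from typing import Dict, List


def filter_by_heritability(genes: List[Dict], p_threshold: float = 0.01) -> List[Dict]:
    """Keep only genes whose heritability p-value is below the threshold."""
    return [g for g in genes if g["hsq_pv"] < p_threshold]


# Hypothetical genes in one region, each with a made-up heritability p-value.
region_genes = [
    {"gene": "GENE_A", "hsq_pv": 0.0004},
    {"gene": "GENE_B", "hsq_pv": 0.03},
    {"gene": "GENE_C", "hsq_pv": 0.2},
]

# Strategy 1: keep every gene with available weights in the region.
all_genes = [g["gene"] for g in region_genes]

# Strategy 2: keep only genes passing the heritability filter.
filtered_genes = [g["gene"] for g in filter_by_heritability(region_genes, 0.01)]

print("All genes:     ", all_genes)
print("Filtered genes:", filtered_genes)
```

Either list can then be used to choose which weight files to load for the region; the filter only changes which genes enter the model, not how they are analyzed.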

Please let me know if you have any other questions.

Best, Zhongshang

1667857557 commented 4 months ago

Hi Yuan,

Thank you for your reply. Before running the two-stage version of GIFT, we used another method to generate candidate gene sets. In each genome region, should we include all available gene weights, or is it possible to include only some previously identified genes? The latter would reduce the computational burden and speed up the process.

Best, Yu-Feng Huang

yuanzhongshang commented 4 months ago

Hi Yu-Feng,

Perhaps the optimal approach would be to include all available gene weights in each region, so that the relationships among all genes are accounted for. Focusing only on the previously identified genes may be ad hoc. If you believe that the previously identified genes include the truly causal gene, you could proceed with those genes. However, intuitively, this approach may yield false positives.
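To see what restricting to a candidate set would leave out of each region, a small bookkeeping sketch like the following may help. It is not GIFT code: the gene names, region labels, candidate set, and the `genes_per_region` helper are hypothetical and only illustrate the comparison.

```python
from collections import defaultdict

# Hypothetical (gene, region) assignments and a previously identified candidate set.
gene_regions = [
    ("GENE_A", "region1"),
    ("GENE_B", "region1"),
    ("GENE_C", "region1"),
    ("GENE_D", "region2"),
]
candidates = {"GENE_A", "GENE_D"}


def genes_per_region(pairs, keep=None):
    """Group genes by region; if `keep` is given, retain only genes in that set."""
    by_region = defaultdict(list)
    for gene, region in pairs:
        if keep is None or gene in keep:
            by_region[region].append(gene)
    return dict(by_region)


full = genes_per_region(gene_regions)                        # all available genes
restricted = genes_per_region(gene_regions, keep=candidates)  # candidates only

# Genes that would no longer be modeled jointly with the candidates.
dropped = {
    region: [g for g in genes if g not in restricted.get(region, [])]
    for region, genes in full.items()
}

for region in full:
    print(region, "kept:", restricted.get(region, []), "dropped:", dropped[region])
```

The genes listed as dropped are the ones whose relationship with the candidates would no longer be accounted for in that region.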

Best, Zhongshang

1667857557 commented 4 months ago

Hi Zhongshang,

Thanks for the help! This has solved my long-standing problem.

Best, Yu-Feng