yuanzhongshang / GIFT

GNU General Public License v3.0

Input of GIFT for summary statistics #8

Open hxh0928 opened 1 month ago

hxh0928 commented 1 month ago

According to the PX-EM algorithm derived in the supplementary note of the GIFT paper, for the i-th gene in a given region with k genes in total, we only need the z-scores of its cis-SNPs. This implies that the z-scores of other genes' cis-SNPs on the i-th gene are not necessary.

Why does the GIFT software require as input the Z_X matrix of all cis-SNPs on the k genes in the region? In this matrix, the i-th column holds the z-scores of all cis-SNPs on the i-th gene, covering both the i-th gene's own cis-SNPs and the cis-SNPs of the other genes in the region.

yuanzhongshang commented 1 month ago

Thanks for your interest in the GIFT paper. In the PX-EM algorithm for the summary-statistics version, the information in the z-scores of other genes' cis-SNPs on the i-th gene is reflected in \mu_beta. In the software, to avoid inverting the density matrix (e.g. \Omega^*) and to reduce the computational burden, we directly replace the terms in the individual-level GIFT model with the corresponding summary statistics. Doing so also ensures that the results from summary data and individual-level data are the same whenever the summary statistics are indeed derived from the individual-level data. We suspect you may only have the z-scores of each gene's own cis-SNPs. If so, you can assume that the cis-SNPs of other genes have no genetic effect on the i-th gene and run GIFT with the z-scores of other genes' cis-SNPs on the i-th gene set to zero.

Best, Zhongshang
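
For anyone landing here with only per-gene cis z-scores, here is a minimal sketch of the zero-filling workaround suggested above. It is not part of the GIFT package and does not use its API. The function name `build_zero_filled_zx`, the gene names, and the SNP IDs are hypothetical, and the sketch assumes that each SNP in the region appears once as a row and each gene once as a column. Check the GIFT documentation for the exact input layout it expects, especially when a SNP is cis to more than one gene.

```python
def build_zero_filled_zx(cis_z_by_gene):
    """Assemble a (SNPs x genes) z-score matrix from per-gene cis z-scores.

    cis_z_by_gene maps a gene name to {snp_id: z-score} for that gene's own
    cis-SNPs only. Entries for other genes' cis-SNPs are unknown, so they are
    set to 0.0, i.e. assumed to have no effect on that gene.
    """
    genes = list(cis_z_by_gene)  # column order follows insertion order

    # Collect every cis-SNP in the region once, preserving first appearance.
    snps = []
    seen = set()
    for gene in genes:
        for snp in cis_z_by_gene[gene]:
            if snp not in seen:
                seen.add(snp)
                snps.append(snp)

    # Row = SNP, column = gene; missing (SNP, gene) pairs become 0.0.
    matrix = [
        [cis_z_by_gene[gene].get(snp, 0.0) for gene in genes]
        for snp in snps
    ]
    return snps, genes, matrix


# Hypothetical region with two genes and three cis-SNPs (made-up values).
cis_z_by_gene = {
    "geneA": {"rs1": 2.1, "rs2": -0.7},
    "geneB": {"rs3": 1.4},
}
snps, genes, zx = build_zero_filled_zx(cis_z_by_gene)
for snp, row in zip(snps, zx):
    print(snp, row)
```

The resulting rows and columns can then be written out in whatever format the GIFT summary-statistics interface expects. Note that the zeros encode an assumption (no effect of other genes' cis-SNPs on the i-th gene), not an observed value.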