yuanzhongshang / GIFT

GNU General Public License v3.0

How to get other input data when using summary statistics as input? #4

Open HackerLZH opened 5 months ago

HackerLZH commented 5 months ago

Thanks for sharing such an excellent tool! However, I have a few problems. First, where or how can I get genome-wide eQTL summary statistics? (The example only includes eQTL summary statistics for each gene.) Second, how should I specify snplist and pindex? I found duplicated SNPs in the example snplist, and its length equals both the sum of pindex and the number of rows of the LD matrix. After removing the duplicates, the remaining SNPs are the same as the SNPs in the GWAS summary statistics and the eQTL summary statistics. I don't understand why. Finally, where or how can I get the estimated correlation matrix of gene expressions? Looking forward to your reply.

yuanzhongshang commented 5 months ago

Hi,

Thank you for your attention. I didn't realize the availability issue with genome-wide eQTL summary statistics until you pointed it out. Although TWAS analyses usually assume that individual-level eQTL data are available, I will provide genome-wide eQTL summary statistics from the GEUVADIS data as soon as possible and let you know.

snplist is the list of all the SNPs for each gene, concatenated gene by gene, and pindex gives the number of cis-SNPs for each gene. For example, if there are two genes in the focal region, each with 10 SNPs and 5 of those SNPs shared between them, snplist would be a 20-vector rather than a 15-vector of unique SNPs, and pindex would be a 2-vector with each element equal to 10. The corresponding LD matrix would likewise be a 20×20 matrix.
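A minimal Python sketch of how these three inputs line up for the two-gene example, assuming LD is computed from a reference-panel genotype matrix. The helper `build_inputs` and the genotype input are hypothetical illustrations, not part of GIFT's interface, and GIFT's own preparation steps may differ.

```python
import numpy as np


def build_inputs(gene_snps, genotypes, snp_ids):
    """Stack per-gene cis-SNP lists without removing duplicates.

    gene_snps: list of SNP-ID lists, one per gene in the region (hypothetical input).
    genotypes: (n_samples, n_snps) reference-panel genotype matrix (hypothetical input).
    snp_ids:   SNP IDs labelling the columns of `genotypes`.
    """
    # Concatenate every gene's SNPs in order; a shared SNP appears once per gene.
    snplist = [snp for snps in gene_snps for snp in snps]
    # pindex[g] is the number of cis-SNPs belonging to gene g.
    pindex = np.array([len(snps) for snps in gene_snps])
    # Take genotype columns in snplist order, repeating shared SNPs.
    col = {s: i for i, s in enumerate(snp_ids)}
    x = genotypes[:, [col[s] for s in snplist]]
    # LD as SNP-by-SNP Pearson correlation; duplicated SNPs give duplicated rows/columns.
    ld = np.corrcoef(x, rowvar=False)
    return snplist, pindex, ld


# Two genes with 10 SNPs each, 5 of them shared (simulated genotypes).
rng = np.random.default_rng(0)
snp_ids = [f"rs{i}" for i in range(1, 16)]
genotypes = rng.integers(0, 3, size=(200, 15)).astype(float)
gene_snps = [snp_ids[0:10], snp_ids[5:15]]
snplist, pindex, ld = build_inputs(gene_snps, genotypes, snp_ids)
print(len(snplist), len(set(snplist)), pindex.tolist(), ld.shape)
# 20 15 [10, 10] (20, 20)
```

This also explains the observation in the question: after deduplicating snplist, only the 15 unique SNPs remain, which are the SNPs that carry GWAS and eQTL summary statistics.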

The correlation matrix of gene expressions is estimated by the sample correlation among the genes' marginal z-score vectors, computed over the SNPs that are null for all the genes in a specific region. Briefly, extract all the SNPs for the genes in the region from the genome-wide eQTL summary statistics, remove any SNP with p-value < 1e-5 for at least one gene so that only completely null SNPs remain, and calculate the correlation of the z-scores among genes.
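The recipe above can be sketched in Python as follows. This is an illustration rather than GIFT's code: `expression_correlation` is a hypothetical helper, and it assumes the region's summary statistics have already been arranged as a SNP-by-gene z-score matrix with no missing entries.

```python
import math

import numpy as np


def expression_correlation(z, p=None, threshold=1e-5):
    """Estimate the gene-expression correlation matrix from eQTL z-scores.

    z: (n_snps, n_genes) marginal eQTL z-scores for all SNPs of the region's genes.
    p: matching p-values; if None, two-sided normal p-values are derived from z.
    Returns the (n_genes, n_genes) correlation matrix and the number of null SNPs used.
    """
    z = np.asarray(z, dtype=float)
    if p is None:
        # Two-sided p-value under N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
        p = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2))
    p = np.asarray(p, dtype=float)
    # Drop any SNP with p < threshold for at least one gene; keep SNPs null for all genes.
    null = np.all(p >= threshold, axis=1)
    # Sample correlation between gene columns, computed over the null SNPs only.
    return np.corrcoef(z[null], rowvar=False), int(null.sum())


# Toy example: two genes whose null z-scores are correlated at 0.5 (simulated data).
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=5000)
z[0] = [8.0, 0.3]  # a strong eQTL signal for gene 1; the filter should remove it
r, n_null = expression_correlation(z)
print(n_null, np.round(r, 2))
```

Filtering on every gene at once matters: a SNP that is a strong eQTL for just one gene would otherwise inflate or distort the estimated correlation.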

Please let me know if you have any questions and how it goes!

Best, Zhongshang

yuanzhongshang commented 4 months ago

Hi,

I tried to upload the genome-wide eQTL summary statistics, but the files are large, so uploading them completely will take a relatively long time. In the meantime, please use this link to download the files I have uploaded so far.

Best, Zhongshang

HackerLZH commented 4 months ago

Thanks, Zhongshang

HackerLZH commented 4 months ago

The data on Dropbox can't be downloaded directly to a local server with wget, and there is a size limit. Is there an easier way to download all the data at once?

yuanzhongshang commented 4 months ago

Hi,

I apologize for missing your message. You might try copying the link for each chromosome separately to get around the size limit. I hope it works; if it doesn't, I'll explore alternative methods.
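One way to follow this suggestion without downloading each file by hand is to collect the per-chromosome share links and fetch them with a short script. The sketch below assumes standard Dropbox shared links, for which setting the query parameter `dl=1` asks Dropbox to return the file itself rather than a preview page. The function names, output filenames, and the link in the usage comment are hypothetical placeholders, not the actual links from this thread.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def force_download_url(url):
    """Return the shared link with dl=1 so Dropbox serves the file instead of a preview page."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # replaces dl=0 if present, otherwise appends it
    return urlunparse(parts._replace(query=urlencode(query)))


def download_all(links, out_dir):
    """Fetch each file; `links` maps an output filename to its shared link (hypothetical input)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, url in links.items():
        with urllib.request.urlopen(force_download_url(url)) as resp, open(out / name, "wb") as fh:
            # Stream to disk in chunks so large per-chromosome files are not held in memory.
            shutil.copyfileobj(resp, fh)


# Usage (placeholder link; paste the real per-chromosome links here):
# download_all({"chr1.txt.gz": "https://www.dropbox.com/s/<id>/chr1.txt.gz?dl=0"}, "eqtl_sumstats")
```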

Best, Zhongshang