yuanzhongshang / PMR

22 stars 7 forks source link

Input file specifications #3

Open NLlonga opened 3 years ago

NLlonga commented 3 years ago

Hi Dr. Yuan,

I would like to use the PMR method in my analysis. Trying to use the following function with my data I got an error: "Error Chol () decomposition failed"

PMR_summary_Egger(Zscore_1, Zscore_2, LDmatrix_1, LDmatrix_2, samplen1=n1, samplen2=n2, lambda = 0, max_iterin = 1000, epsin = 1e-5, Heritability_geneexpression_threshold = 1e-04)

I have added the zscore_1 and zscore_2 from eQTL results and our GWAS results respectively. I have estimated the LD matrix using the 1000G, phase 3 reference panel.

I manage to avoid this error by removing highly correlated SNPs through clumping, the issue is that I need to use different r2 thresholds depending on the gene (XX-XX). I was wondering whether you may have any recommendation on how to proceed.

Thanks!

Natalia

yuanzhongshang commented 3 years ago

Hi Natalia, Thanks for your interest in our PMR method. The possible solutions for your issue are listed as follows:

  1. If you have individual level data, you’d better use the LDmatrix_1 and LDmatrix_2 calculated from the individual level data.
  2. For each specific gene, you can check the cis-SNP genotype matrix. For example, remove those SNPs that have the same values across all individuals from 1000 genome reference panel.
  3. You can increase the value of lambda, e.g. lambda=0.1,0.15,0.25,or 0.5
  4. The shrinkage method for LDmatrix is from PDSCE package in PMR, you can try other shrinkage methods to obtain the LDmatrix as inputs. Thanks again for your interest, please do not hesitate to contact me if you have any further questions. Bests, Zhongshang Yuan
NLlonga commented 3 years ago

Hello Dr. Yuan,

Thanks for your advice on possible solutions. Regarding the data, I only have summary data. Despite trying different datasets and strategies the method does not seem to work in most instances, with the error: "Error chol() decomposition failed".

I removed those SNP with MAF<0.05 and those highly correlated SNPs with PLINK software by clumping and using different values of r2 (0.2, 0.5, 0.7, and 0.9). Also, I have tried the different lambda values you suggested to me (0, 0.1, 0.2, 0.25). In addition, I’ve tried to perform the same procedure but without clump step.

To check if the problem is about zscore, I have tried Z scores from two different eQTL databases (eQTLgen consortium and GTEx), but when running PMR I received the same error.

For GTEx data I have tested 50 genes. With r2 = 0.2 (clump step) 32% of the genes have not given me the error. However with r2 = 0.9, for all genes have given me the error. For the eQTLgen consortium data I have tested 110 genes. With r2 = 0.2, 10% of the genes have not given me the error, however with r2 = 0.9, only 0.91% of the genes have not given me the error.

I wonder whether the method is better suited for individual level data, and despite all its theoretical advantages, in practice it does not work very well with summary data, may that be the case? I would really like to use this method and would really appreciate your insight.

Thank you for your time,

Natalia

yuanzhongshang commented 3 years ago

Hello Natalia,

Thanks for the email, it is probably the issue regarding the LDmatrix estimation, PMR-Egger relies on the likelihood framework to improve the TWAS power.

Thus, it requires the estimations of the two LDmatrix (one for eQTL data, another for GWAS data) are accurate, thus it should be unbiased using individual level data.

For the summary data, we have to input these two LDmatrix from the same reference panel data.

Given that I cannot have access to your data, I suggest you may try larger lambda values (e.g. 0.5,0.75) and try LD clumping with smaller r2 to be 0.1.

If possible, we can using the network meeting to discuss your data and how to implement PMR-Egger, you can directly contact me using email yuanzhongshang@sdu.edu.cn, then

we can dicuss more.

Thanks again for your interest in PMR-Egger.

Bests,

Zhongshang Yuan

0731Karan commented 2 years ago

Hello Dr. Yuan, I had the same problem, that is, I got an error:"Error Chol () decomposition failed". I tried a larger lambda value and a smaller r2 value as you suggested. But I still got it wrong. If I ignore these mistakes and set the corresponding p value as missing, the subsequent QQ plot of p value is far away from the diagonal, but the calculated genomic inflation factor is far less than 1. Could you give me some advice? In addition, I encountered some difficulties in using summary data for analysis.

  1. I have some confusion on how to select instruments. Your article mentioned using all the cis-SNPs that are within either 100 kb upstream as instruments. However, according to the Mendelian randomization, the SNP whose p value of Zscore_1 is less than 5e-8 should be selected as instruments.

  2. Considering that LDmatrix_1and LDmatrix_2 need to refer to the same panel data, I directly use the individual data to calculate LD, so LDmatrix_1and LDmatrix_2 are the same. I don't know if it is appropriate.

  3. How to select an appropriate lambda value? I found that by changing the lambda value, I get a different p value.

I really want to use this method for data analysis, and I would appreciate it if you could give me some suggestions. Thank you for your time, Karan

yuanzhongshang commented 2 years ago

Hello Karan, Thanks for your interest in the PMR-Egger method. PMR-Egger is more efficient than other TWAS methods due to its ability in adjusting the horizontal pleiotropy and the likelihood-based inference. Thus, PMR-Egger requires the input of LDmatrix_1 in eQTL data as well as LDmatrix_2 in GWAS data. The accuracy of these two matrix is quite important for the performance of PMR-Egger. Unfortunately, these two matrices are often missed in summary data, and we have to use the reference panel to estimate them first, then treat the matrix estimator as the inputs. The error "Error Chol () decomposition failed" may be likely due to the estimate of LD matrix is not so accurate. We partly alleviate this issue using the PDSCE package with shrinkage parameter lambda. In real data analysis, you can choose lambda from small to large, to ensure the software can go smoothly while keep the information as much as possible. If you have many lambda values that the software can run without any error, you may choose the smallest one. The responses to your detailed 3 questions are listed below. (1)I have some confusion on how to select instruments. Your article mentioned using all the cis-SNPs that are within either 100 kb upstream as instruments. However, according to the Mendelian randomization, the SNP whose p value of Zscore_1 is less than 5e-8 should be selected as instruments. TWAS preferred the cis-SNPs for each specific gene so that we can investigate the regulatory mechanism by integrating the eQTL data and GWAS data. The inference procedure is essentially the two stage MR analysis per its algorithm. However, given the cis-SNPs within one gene are often in high LD and the sample size in current eQTL data is often small, it is hard to select enough instrumental SNPs under the p value threshold 5e-08. In the traditional MR analysis where the exposure variable is some traits with large sample size rather than gene expression in TWAS, the instrument selection is quite important, we recently developed another MR method, MRAID (https://www.science.org/doi/10.1126/sciadv.abl5744), to conduct the automatic instrumental selection, you can follow MRAID if you are performing a traditional MR analysis.
(2)Considering that LDmatrix_1and LDmatrix_2 need to refer to the same panel data, I directly use the individual data to calculate LD, so LDmatrix_1and LDmatrix_2 are the same. I don't know if it is appropriate. Again, if both the individual level data of eQTL and GWAS are missed, you need to estimate the two LD matrices, theoretically, you can use the same calculation from one reference panel data. (3)How to select an appropriate lambda value? I found that by changing the lambda value, I get a different p value. Please see our responses above regarding the choice of lambda. Thanks again for your interest. Please do not hesitate to contact me if you have any further questions.

Bests, Zhongshang Yuan

0731Karan commented 2 years ago

Hello Dr. Yuan, Thank you very much for your advice on TWAS last time, which helped me a lot. I saw another MR method , MRAID, you recommended before, but I have some doubts that urgently need your help to solve. If you are willing to help solve this problem, I would be very grateful. Does TWAS need to consider correlated horizontal pleiotropic effects? Can MRAID be applied in the context of TWAS?

Thank you for your time, Karan

yuanzhongshang commented 2 years ago

Hello Karan,  Thanks for your email. Correlated pleiotropy should be also considered in TWAS if we put TWAS into the MR framework. The main feature of MRAID is that we have incorporated variable selection for the candidate correlated instrumental SNPs and the adjustment of two types of pleiotropic effects. Indeed,we have once tried the MRAID procedure to the TWAS setting, unfortunately, the statistical properties can not be ensured, e.g. the qqplot for type I error are quiite deflated, which, we think, may be due to that the small sample size of the eQTL study in TWAS as well as the high LD correlation between cis-SNPs within each gene.In the traditional MR framwork, the sample size of the exposure GWAS are often quite large.  Thanks again for your email and interest in our method. please do not hesitate to contact us if you have further questions.

Bests, Zhongshang