jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/

Different results in MRPRESSO using CAUSE pipeline #7

Closed vackground closed 4 years ago

vackground commented 4 years ago

Hello Jean, I am concerned that the results of the MRPRESSO test vary greatly depending on whether I run it from TwoSampleMR or get them from the CAUSE pipeline. The TwoSampleMR analyses were done directly using the MRCIEU repository, and there the raw MRPRESSO estimate coincides with IVW. Here are the results with TwoSampleMR:

[[1]]$Main MR results
       Exposure       MR Analysis Causal Estimate        Sd   T-stat     P-value
1 beta.exposure               Raw        1.366811 1.2328202 1.108687 0.287655440
2 beta.exposure Outlier-corrected        2.785822 0.7592963 3.668952 0.003696378

[[1]]$MR-PRESSO results
[[1]]$MR-PRESSO results$Global Test
[[1]]$MR-PRESSO results$Global Test$RSSobs
[1] 61.3204

[[1]]$MR-PRESSO results$Global Test$Pvalue
[1] "<0.001"

[[1]]$MR-PRESSO results$Outlier Test
         RSSobs Pvalue
1  8.753039e-04  0.098
2  3.345331e-04      1
3  4.670621e-05      1
4  4.391925e-04  0.826
5  3.459541e-03 <0.014
6  2.879363e-04      1
7  2.850324e-04      1
8  3.004769e-06      1
10 5.570340e-05      1
12 2.902729e-05      1
13 2.174724e-04      1
14 1.105334e-03  0.014
15 6.131958e-04      1
16 4.044130e-04  0.966

[[1]]$MR-PRESSO results$Distortion Test
[[1]]$MR-PRESSO results$Distortion Test$Outliers Indices
[1] 5 12

[[1]]$MR-PRESSO results$Distortion Test$Distortion Coefficient
beta.exposure
    -50.93688

[[1]]$MR-PRESSO results$Distortion Test$Pvalue
[1] 0.406

                     method nsnp        b        se        pval
1                  MR Egger   14 3.797087 6.0002472 0.538722443
2           Weighted median   14 2.640671 0.9316723 0.004592115
3 Inverse variance weighted   14 1.366811 1.2328202 0.267565424
4               Simple mode   14 3.624970 1.4340917 0.025233802
5             Weighted mode   14 3.394672 1.4395643 0.034696453

With the CAUSE pipeline, however, the raw MRPRESSO results do not exactly match the IVW results from MendelianRandomization, although the two are closer to each other than to the TwoSampleMR results above:

$Main MR results
    Exposure       MR Analysis Causal Estimate        Sd   T-stat      P-value
1 beta_hat_1               Raw        3.045014 1.0477750 2.906171 0.0084458003
2 beta_hat_1 Outlier-corrected        2.827610 0.6106855 4.630223 0.0002779343

$MR-PRESSO results
$MR-PRESSO results$Global Test
$MR-PRESSO results$Global Test$RSSobs
[1] 108.3815

$MR-PRESSO results$Global Test$Pvalue
[1] "<0.001"

$MR-PRESSO results$Outlier Test
         RSSobs Pvalue
1  4.250303e-03 <0.022
2  2.374254e-04      1
3  2.736274e-05      1
4  1.652793e-03  0.022
5  5.958690e-05      1
6  1.408030e-04      1
7  1.202757e-03  0.044
8  1.906410e-03 <0.022
9  2.453591e-07      1
10 1.227881e-04      1
11 1.685381e-06      1
12 1.417235e-03      1
13 1.083542e-04      1
14 6.598389e-04      1
15 6.291629e-06      1
16 6.867198e-04      1
17 7.681963e-05      1
18 5.324216e-05      1
19 2.903083e-04      1
20 1.297428e-03  0.022
21 7.956443e-05      1
22 4.118400e-04      1

$MR-PRESSO results$Distortion Test
$MR-PRESSO results$Distortion Test$Outliers Indices
[1] 1 4 7 8 20

$MR-PRESSO results$Distortion Test$Distortion Coefficient
beta_hat_1
  7.688607

$MR-PRESSO results$Distortion Test$Pvalue
[1] 0.648

1 IVW_RE_noNOME          2.78 0.945 0.00329
2 Egger_RE_NOME          2.88 5.82  0.620
3 Median_Wtd             4.01 0.814 0.000000825
4 MBE_Wtd_noNOME_phi1    3.89 1.37  0.00459
5 MBE_Wtd_noNOME_phi0.5  4.51 2.02  0.0258
6 MBE_Wtd_noNOME_phi0.25 4.59 2.22  0.0385

I had already noticed differences in the IVW, Egger, etc. results between MendelianRandomization and TwoSampleMR, but the large differences in this analysis made me worry about the results in CAUSE. I have checked, and these MRPRESSO differences also appear for other traits, in both the raw and outlier-corrected estimates, with the CAUSE pipeline.

My concern is whether there might be a problem with the CAUSE results, or whether this could simply be variation due to differences in the harmonization in TwoSampleMR and in how the data are manipulated in the repository.

Thank you in advance, Alvaro

jean997 commented 4 years ago

Hi Alvaro, the CAUSE pipeline code just provides a wrapper that directly uses the MRPRESSO R package. It is very short, so it might be helpful to take a look. You can find it here:

https://github.com/jean997/cause/blob/master/pipeline_code/R/mrpresso.R

I haven't done comparisons with TwoSampleMR, which also uses a wrapper around the MRPRESSO R package, but looking at that code I see a few small differences. There may be a difference in the NbDistribution argument, which defaults to 1000 in TwoSampleMR. The CAUSE pipeline will use the larger of 1000 and 10 times the number of data points. There could also be differences in the data you are using (for example, if you are using different p-value thresholds). There may also be a difference in harmonization, depending on the options you used with TwoSampleMR. CAUSE drops all palindromic SNPs and doesn't try to guess the strand using allele frequencies, which, based on the documentation, should be like using strictness level 3 with TwoSampleMR. Otherwise the two wrappers appear to be the same to me. Perhaps you can check by comparing the data that is going into each one.
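To make the two wrapper differences above concrete, here is a minimal sketch in Python. It is not the code of either wrapper: the function names and the example variants are hypothetical, and it only illustrates the "larger of 1000 and 10 times the number of data points" rule and the dropping of palindromic (A/T or C/G) SNPs as described in the comment. The "strictness level 3" mentioned there appears to correspond to the `action = 3` option of TwoSampleMR's `harmonise_data`.

```python
# Illustrative sketch only: names are hypothetical and this is not the code
# of the CAUSE pipeline or of TwoSampleMR.

# Allele pairs that read the same on both strands.
PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


def nb_distribution(n_variants, minimum=1000, per_variant=10):
    """Return the larger of `minimum` and `per_variant` * number of variants."""
    return max(minimum, per_variant * n_variants)


def is_palindromic(allele_1, allele_2):
    """True for A/T or C/G SNPs, whose strand cannot be read off the alleles."""
    return frozenset((allele_1.upper(), allele_2.upper())) in PALINDROMIC_PAIRS


def drop_palindromic(variants):
    """Keep only (snp_id, allele_1, allele_2) tuples that are not palindromic."""
    return [v for v in variants if not is_palindromic(v[1], v[2])]


# Made-up example variants, not taken from any real data set.
variants = [
    ("rs_example_1", "A", "G"),
    ("rs_example_2", "A", "T"),  # palindromic, dropped
    ("rs_example_3", "C", "G"),  # palindromic, dropped
    ("rs_example_4", "T", "C"),
]
kept = drop_palindromic(variants)
print([v[0] for v in kept])
print(nb_distribution(22), nb_distribution(250))
```

With 22 variants the rule gives the same 1000 simulations as the TwoSampleMR default, so under this reading NbDistribution alone would only differ once more than 100 variants go into the analysis.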

vackground commented 4 years ago

Hello again Jean, thanks for the quick answer. In principle, I'm using the same 5e-8 threshold. I didn't make any modifications to the config.yaml for the analysis. I understand that the difference may be in the pre-processing of some of the analyzed traits. For these CAUSE analyses, I used summary statistics for one of the traits downloaded directly from the PGC consortium. I am going to try running CAUSE with both traits downloaded from MRCIEU, converting them from the VCF format, to see whether the difference has to do with the data processing in the MRCIEU repository.

Thank you very much!

Alvaro

jean997 commented 4 years ago

Ok, sounds good. We are actually working on adding functions for going from MRCIEU VCF to CAUSE format now. The other thing I would suggest is to compare the data used in the CAUSE pipeline vs. in TwoSampleMR: do they have the same SNPs? Another thing that occurred to me is that the CAUSE pipeline uses an LD-pruned set of SNPs. TwoSampleMR also does this through the extract_instruments function, but it may be that they choose different instruments. Looking at your output, it seems clear that more variants are going into the analysis in the CAUSE pipeline (22 vs 16). The default LD pruning threshold for TwoSampleMR is 0.001 while the CAUSE pipeline defaults to 0.1, so that could be the difference.
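The check suggested above, and the effect of the pruning threshold, can be sketched as follows. This is a simplified illustration under stated assumptions, not the clumping procedure of either pipeline: the greedy rule ignores the physical distance window that real clumping uses, and the SNP names, p-values, and r2 values are invented.

```python
# Illustrative sketch only: a simplified greedy LD pruning, with hypothetical
# function names and made-up data.


def greedy_ld_prune(p_values, r2, threshold):
    """Visit SNPs by increasing p-value; keep a SNP only if its r2 with every
    SNP kept so far is at or below `threshold`. Missing pairs count as r2 = 0."""
    kept = []
    for snp in sorted(p_values, key=p_values.get):
        if all(r2.get(frozenset((snp, k)), 0.0) <= threshold for k in kept):
            kept.append(snp)
    return kept


def compare_instruments(snps_a, snps_b):
    """Report which SNPs are shared and which appear in only one set."""
    a, b = set(snps_a), set(snps_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Made-up genome-wide significant SNPs and pairwise r2 values.
p_values = {"snp1": 1e-12, "snp2": 1e-10, "snp3": 1e-9, "snp4": 1e-8}
r2 = {
    frozenset(("snp1", "snp2")): 0.05,
    frozenset(("snp3", "snp4")): 0.5,
}

loose = greedy_ld_prune(p_values, r2, threshold=0.1)     # threshold quoted for CAUSE
strict = greedy_ld_prune(p_values, r2, threshold=0.001)  # threshold quoted for TwoSampleMR
print(loose)
print(strict)
print(compare_instruments(loose, strict))
```

In this toy example the looser threshold retains one extra variant (`snp2`, in weak LD with `snp1`), which is the kind of difference that would make the two pipelines analyse different numbers of instruments from the same input data.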

vackground commented 4 years ago

I just checked and the data have the same SNPs, so the most plausible explanation is the one you point out. It must be the difference in the LD pruning threshold. Thanks again for your time. Kudos for the new MR method btw, amazing job! A