jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/

LD pruning by using Plink #17

Closed mingyang0202 closed 3 years ago

mingyang0202 commented 3 years ago

Hi Jean,

Thanks for providing an amazing MR method!

Following the example you provided, I am trying to analyze my own data, but I have a question about LD pruning with Plink.

Step 3:

    X_clump <- X %>%
      rename(rsid = snp, pval = p_value) %>%
      ieugwasr::ld_clump(dat = .,
                         clump_r2 = r2_thresh,
                         clump_p = pval_thresh,
                         plink_bin = genetics.binaRies::get_plink_binary(),
                         bfile = ref_path)

How should I choose the values of clump_r2 (r2_thresh) and clump_p (pval_thresh)? Should I run it with the default values (clump_r2=0.001, clump_p=1), or with clump_r2=0.1 and clump_p=1e-3?

jean997 commented 3 years ago

Hi -- there is no set rule for what those thresholds should be, but I can explain what each parameter means.

r2_thresh: The lower this is, the more conservative the pruning, so you will end up with fewer SNPs. If the threshold is too high, some of the retained SNPs will be in LD with each other, which could create problems. The default that I use in the pipeline (https://jean997.github.io/cause/pipeline.html) is 0.01, which is also what we used throughout our paper. Ideally your results should not be too sensitive to this value (though 0.1 might be too high).

pval_thresh: You should use whatever threshold you are going to use later for fitting the CAUSE posteriors (we use 1e-3 by default). Looking at the tutorial, I think I either left out a detail or did not make it clear: the p-value used for LD pruning should be the trait 1 p-value. I will update this.
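To make the effect of the two thresholds concrete, here is a minimal base R sketch of greedy clumping on made-up data. It is only an illustration: `toy_clump`, the SNP names, the p-values, and the r2 matrix are all hypothetical, and Plink's real clumping also uses a distance window and a reference panel, which this toy ignores.

```r
# Toy greedy clumping (NOT Plink's implementation; no kb window, no reference panel).
# pval: named numeric vector of trait 1 p-values (names are SNP ids)
# r2:   symmetric matrix of pairwise r^2 with matching row/column names
toy_clump <- function(pval, r2, r2_thresh = 0.01, pval_thresh = 1e-3) {
  # Only variants passing the p-value threshold are candidates,
  # ordered from most to least significant
  candidates <- names(sort(pval[pval < pval_thresh]))
  kept <- character(0)
  while (length(candidates) > 0) {
    index_snp <- candidates[1]  # most significant remaining variant
    kept <- c(kept, index_snp)
    # Drop the index variant and every candidate whose r^2 with it
    # exceeds the threshold
    in_ld <- r2[index_snp, candidates] > r2_thresh
    candidates <- candidates[!in_ld]
  }
  kept
}

# Made-up example: rs2 is in weak LD with rs1, rs3 in strong LD with rs1,
# and rs4 does not pass the p-value threshold.
snps <- c("rs1", "rs2", "rs3", "rs4")
pval <- c(rs1 = 1e-8, rs2 = 1e-6, rs3 = 1e-4, rs4 = 0.2)
r2 <- diag(4)
dimnames(r2) <- list(snps, snps)
r2["rs1", "rs2"] <- r2["rs2", "rs1"] <- 0.05
r2["rs1", "rs3"] <- r2["rs3", "rs1"] <- 0.5

strict <- toy_clump(pval, r2, r2_thresh = 0.01)  # "rs1"
loose  <- toy_clump(pval, r2, r2_thresh = 0.1)   # "rs1" "rs2"
```

With the stricter threshold only rs1 survives. With the looser one rs2 is also kept even though it is in weak LD with rs1, which is the situation the r2_thresh explanation warns about.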

jean997 commented 3 years ago

OK, FYI: I updated the LD pruning section of the tutorial a little bit. I also updated the gwas_merge function so that it includes p-values for you, which smooths out the workflow a little bit. If you want that, you can reinstall the latest version from GitHub.
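For readers landing on this thread, the advice above can be combined with the Step 3 call from the question roughly as follows. This is a sketch, not code from the package: `clump_for_cause` and its `pval_col` argument are hypothetical names, the defaults are the values suggested in this thread (0.01 and 1e-3), and the column passed as `pval_col` is assumed to hold the trait 1 p-value.

```r
# Hypothetical helper (not part of cause or ieugwasr): wraps the Step 3 call
# from the question, with the thresholds suggested in this thread as defaults.
clump_for_cause <- function(X, ref_path, pval_col = "p_value",
                            r2_thresh = 0.01, pval_thresh = 1e-3) {
  # Rename columns to the rsid and pval names used in the Step 3 call;
  # pval_col is assumed to hold the trait 1 p-value.
  dat <- X
  names(dat)[names(dat) == "snp"] <- "rsid"
  names(dat)[names(dat) == pval_col] <- "pval"
  ieugwasr::ld_clump(dat = dat,
                     clump_r2 = r2_thresh,
                     clump_p = pval_thresh,
                     plink_bin = genetics.binaRies::get_plink_binary(),
                     bfile = ref_path)
}
```

A call such as `X_clump <- clump_for_cause(X, ref_path)` would then stand in for Step 3. Whatever value is used for `pval_thresh` here should match the threshold used later when fitting the CAUSE posteriors.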