jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/

LD pruning question #6

Closed gnarw closed 4 years ago

gnarw commented 4 years ago

Hi. I'm following the example analysis for LDL -> CAD and have run the ld_prune function:

..pruned <- ld_prune(....)

The pruned variable that ld_prune returns is a list of numbers, which confuses me. I thought it should be a list of rsIDs, i.e. SNP names. Or is it a list of row numbers for the LD data frame (i.e. chr22_AF0.05_0.1.RDS), and if so, which column would then contain the pruned variants?

jean997 commented 4 years ago

Hi there. You are correct: ld_prune returns a vector of SNP names. The variants argument of ld_prune can be either a data frame or a vector of names. If variants is a vector, ld_prune will simply return a subset of that vector, pruned according to the other arguments. If variants is a data frame, you need to tell the function which column contains the SNP names and (optionally) which columns contain p-values to prune on, using the variant_name and pval_cols arguments. So if the rs-number column in your data frame is called rsID, you should set variant_name="rsID". Let me know if this helps. If not, maybe you can post a small data set that replicates your issue.

gnarw commented 4 years ago

I'm running this for chr22. variants is a data frame where the rs-numbers are in the "snp" column and the p-values are in the "pval1" column.

ld = readRDS("~/Documents/Work/CAUSE/chr22_AF0.05_0.1.RDS")
snp_info = readRDS("~/Documents/Work/CAUSE/chr22_AF0.05_snpdata.RDS")
pruned_chr22 = ld_prune(variants = variants, variant_name = c("snp"), ld = ld, total_ld_variants = snp_info$SNP, pval_cols = c("pval1"), pval_thresh = c(1e-3))

When I run this command I get:

You have suppplied information for 1039761 variants. Of these, 14640 have LD information.

pruned_chr22
[1] 328464 569370 569278 1115085 413676 327461 522071 1179652 856219 586252 855429

I don't get the rs-numbers. Do you see what the problem could be?

jean997 commented 4 years ago

Could you be using an old package version? It did use to return indexes but I think I modified that over a year ago. Can you check sessionInfo()? I am assuming this is the data from the vignette?

gnarw commented 4 years ago

I installed CAUSE using devtools::install_github("jean997/cause"). I'm following the example at https://jean997.github.io/cause/ldl_cad.html, but I'm using a different set of GWAS data.

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cause_1.0.0.0267 dplyr_1.0.0      readr_1.3.1     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6        pillar_1.4.4        compiler_3.6.3      tools_3.6.3         lifecycle_0.2.0     tibble_3.0.1        gtable_0.3.0       
 [8] lattice_0.20-38     pkgconfig_2.0.3     rlang_0.4.6         Matrix_1.2-18       rstudioapi_0.11     parallel_3.6.3      loo_2.2.0          
[15] gridExtra_2.3       invgamma_1.1        generics_0.0.2      vctrs_0.3.1         hms_0.5.3           grid_3.6.3          tidyselect_1.1.0   
[22] glue_1.4.1          R6_2.4.1            mixsqp_0.3-43       irlba_2.3.3         purrr_0.3.4         ggplot2_3.3.2       tidyr_1.1.0        
[29] ashr_2.2-47         magrittr_1.5        matrixStats_0.56.0  scales_1.1.1        ellipsis_0.3.1      intervals_0.15.2    colorspace_1.4-1   
[36] numDeriv_2016.8-1.1 RcppParallel_5.0.2  munsell_0.5.0       truncnorm_1.0-8     SQUAREM_2020.3      crayon_1.3.4       

jean997 commented 4 years ago

Hmm. I can only think that it could be an issue with the data. Double-check the variants data frame, maybe with head, and then if you can post a small example that replicates the problem, I can look into it.
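
A minimal sketch of such a check, using an invented toy data frame in place of the real GWAS summary statistics. The column names snp and pval1 follow the ones mentioned earlier in this thread; the values are made up. Besides head, str reports each column's class, which the printed rows alone do not make obvious:

```r
# Toy stand-in for the `variants` data frame; all values are invented.
variants <- data.frame(
  snp   = c("rs3", "rs1", "rs2"),
  pval1 = c(0.5, 1e-4, 2e-5),
  stringsAsFactors = TRUE  # mimics the default of data.frame() in R < 4.0.0
)

head(variants)  # first rows of the data frame
str(variants)   # structure, including the class of each column

# Class of every column as a named character vector
col_classes <- sapply(variants, class)
print(col_classes)
```

If the SNP name column shows up as a factor rather than character, that is worth noting when reporting the problem.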

gnarw commented 4 years ago

The "snp" column of the variants data frame had the rs-numbers stored as factors. I changed the column class to character, and now the ld_prune function returns the rs-numbers. Thank you for the help.
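
For anyone hitting the same symptom, here is a minimal sketch of that conversion on an invented toy data frame. It is an assumption, not something confirmed in this thread, that the integers returned by ld_prune were related to the factor's internal codes. The sketch only shows how a factor column differs from a character column and how to convert it before calling ld_prune:

```r
# Toy stand-in for the `variants` data frame; all values are invented.
variants <- data.frame(
  snp   = c("rs3", "rs1", "rs2"),
  pval1 = c(0.5, 1e-4, 2e-5),
  stringsAsFactors = TRUE  # mimics the default of data.frame() in R < 4.0.0
)

is.factor(variants$snp)  # TRUE: the names are stored as a factor

# A factor is stored as integer codes plus a set of levels, so coercing it
# to integer yields the codes rather than the rs-numbers.
codes <- as.integer(variants$snp)

# Convert the column to character so the names are plain strings.
variants$snp <- as.character(variants$snp)
is.character(variants$snp)  # TRUE
```

Since R 4.0.0, data.frame() and read.table() default to stringsAsFactors = FALSE, so this mainly affects older installations such as the R 3.6.3 session shown above, or data read in with stringsAsFactors = TRUE.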