junyangq / snpnet

snpnet: Fast and scalable lasso/elastic-net solver for large SNP data

UKB pipeline #26

Open mxcai opened 5 years ago

mxcai commented 5 years ago

I am using snpnet on the UK Biobank BMI data. It appears to be stuck at the step "Extracting number of variants and colnames from train.bim" for a long time. The function also has a large number of parameters to specify. Could you provide the code used to run the UKB BMI analysis in the manuscript, so that it can be replicated? Thanks a lot.

yihchii commented 4 years ago

Hi @junyangq and team! We have a similar question to this one. Could you comment on whether we are using the snpnet function correctly?

Here are the parameters we used when running snpnet with UKB data. They are also here for @mxcai, in case you want to compare:

configs <- list(bufferSize = 800, nCores = 96, meta.dir = 'meta.dir')

results <- snpnet(genotype.dir = PATH_TO_PLINK_FILES,
                  covariates = c('age','sex', paste0('pc', 1:10)), 
                  configs = configs,
                  niter = 101,
                  num.snps.batch = 8000,
                  results.dir = 'results',
                  phenotype.file = PATH_TO_PHENOTYPE_TABLE,
                  use.glmnetPlus = TRUE, KKT.verbose = TRUE, verbose = TRUE,
                  phenotype = 'bmi', save = TRUE)

CPU usage on the machine shows that the job is not hanging, and we set the verbose options to TRUE so that we can read every status update.
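
In case it helps anyone comparing setups, here is a rough sketch of a pre-flight check that could be run before a long snpnet job. It is not part of snpnet: `check_snpnet_inputs` is a hypothetical helper written in base R only. It assumes the genotype directory contains `train.bim` (as the log message above suggests) and that the phenotype table is whitespace-delimited with a header row. The temporary files at the bottom are toy stand-ins, not UKB data.

```r
# Hypothetical helper (not part of snpnet): check the inputs before a long run.
check_snpnet_inputs <- function(genotype.dir, phenotype.file, phenotype, covariates) {
  bim <- file.path(genotype.dir, "train.bim")
  stopifnot(file.exists(bim), file.exists(phenotype.file))

  # A .bim file holds one variant per line, so the number of
  # non-blank lines is the number of variants.
  n.variants <- length(count.fields(bim, quote = "", comment.char = ""))

  # Read only the header line of the phenotype table
  # (assumption: whitespace-delimited with column names on the first line).
  header <- scan(phenotype.file, what = "", nlines = 1, quiet = TRUE)

  # Columns requested in the call that are absent from the table.
  missing.columns <- setdiff(c(phenotype, covariates), header)

  list(n.variants = n.variants, missing.columns = missing.columns)
}

# Toy example with temporary files.
demo.dir <- tempfile("snpnet_demo")
dir.create(demo.dir)
writeLines(c("1\trs1\t0\t1000\tA\tG",
             "1\trs2\t0\t2000\tC\tT"),
           file.path(demo.dir, "train.bim"))
phe.file <- file.path(demo.dir, "phenotypes.tsv")
writeLines(c("FID\tIID\tage\tsex\tbmi",
             "1\t1\t50\t1\t24.3"),
           phe.file)

check <- check_snpnet_inputs(demo.dir, phe.file,
                             phenotype = "bmi",
                             covariates = c("age", "sex", paste0("pc", 1:10)))
print(check$n.variants)
print(check$missing.columns)
```

If `missing.columns` is non-empty, the covariate or phenotype names passed to snpnet do not match the table header. That is cheap to rule out before waiting on the full run.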