Estimation of model performance

privefl / bigstatsr

R package for statistical tools with big matrices stored on disk.

Hi, I am using big_spLogReg to fit a logistic model on a small set of 85 SNPs. Each time I fit the model and evaluate it on the test data with predict() and then AUC(), I get a noticeably different value. I also checked which variables are selected in each run by fitting the model 10 times and keeping those with an OR above 1.01 or below 0.99. A few SNPs are selected in all 10 runs, while the others are selected in only some of them. What is the recommended strategy here? Can I set a criterion based on the repeatedly selected SNPs? If I want a rough estimate of the classification performance, should I just run the model several times and average the AUC, reporting its SE?
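A minimal sketch of the repeated-run procedure described above, on simulated data rather than the real SNPs. The sample size, the train/test split, `K = 5`, and the names `G`, `y`, `n_runs`, `auc` and `n_selected` are all assumptions made for illustration. It also assumes no covariates are used, so that the coefficient vector lines up with `attr(mod, "ind.col")`.

```r
library(bigstatsr)

# Simulated stand-in for the real data (hypothetical): 500 samples x 85 SNPs
set.seed(1)
n <- 500
p <- 85
G <- as_FBM(matrix(rbinom(n * p, size = 2, prob = 0.3), n, p))
# Only the first 5 SNPs carry signal in this toy example
y <- rbinom(n, 1, plogis(drop(G[, 1:5] %*% rep(0.5, 5)) - 1.5))

# Fixed train/test split, reused across all runs
ind.train <- sort(sample(n, 350))
ind.test <- setdiff(seq_len(n), ind.train)

n_runs <- 10
auc <- numeric(n_runs)
n_selected <- integer(p)  # how often each SNP is retained

for (r in seq_len(n_runs)) {
  # Each fit draws new random cross-validation folds, so results vary by run
  mod <- big_spLogReg(G, y[ind.train], ind.train = ind.train, K = 5)

  pred <- predict(mod, G, ind.row = ind.test)
  auc[r] <- AUC(pred, y[ind.test])

  # Keep SNPs whose OR is above 1.01 or below 0.99
  beta <- summary(mod, best.only = TRUE)$beta[[1]]
  ind.col <- attr(mod, "ind.col")
  sel <- ind.col[abs(beta) > log(1.01)]
  n_selected[sel] <- n_selected[sel] + 1L
}

mean_auc <- mean(auc)
se_auc <- sd(auc) / sqrt(n_runs)  # spread across runs only
table(n_selected)                 # distribution of selection counts
```

Note that `se_auc` here only reflects run-to-run variation in the fitted model on one fixed test set; it does not account for the sampling variability of the test set itself.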

Thank you for your help.

Estimation of model performance #163