Hi
I am using big_spLogReg to fit a logistic model on a small set of 85 SNPs. In every run of the model, when I test it on the test data with predict() and then AUC(), I get quite a different value. I also checked which variables are chosen in each run by running the model 10 times (keeping those with an OR above 1.01 or below 0.99). A few SNPs are selected repeatedly (in all 10 runs), while the others show a spread of selection frequencies. What is the best strategy to go ahead here? Can I set a criterion based on the repeatedly selected SNPs? And if I want a rough estimate of the classification performance, should I just run the model several times and average the AUC, reporting its SE?
The implementation provided is not suitable for running with only a few variables (SNPs); it is intended for hundreds of thousands of variables. You should probably run {glmnet} directly instead, or even just standard logistic regression (without penalization).
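A minimal sketch of the two suggested alternatives, using a simulated genotype matrix as a stand-in for your data (the matrix `X`, outcome `y`, and their dimensions are placeholders, not from your dataset):

```r
library(glmnet)

set.seed(1)
# Toy stand-ins: 200 individuals, 85 SNP columns, binary phenotype
X <- matrix(rnorm(200 * 85), nrow = 200, ncol = 85)
y <- rbinom(200, 1, plogis(X[, 1] - X[, 2]))

# Cross-validated LASSO logistic regression; cv.glmnet picks lambda internally
fit <- cv.glmnet(X, y, family = "binomial", type.measure = "auc")
pred <- predict(fit, newx = X, s = "lambda.min", type = "response")

# With only 85 variables, plain unpenalized logistic regression is also viable
glm_fit <- glm(y ~ X, family = binomial())
```

With a real dataset you would of course predict on held-out test data rather than on the training matrix as above.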
LASSO is known not to be particularly stable in its variable selection.
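One way to quantify that instability, sketched here with the same kind of simulated placeholder data (not your dataset), is to refit the model on bootstrap resamples and count how often each SNP receives a nonzero coefficient:

```r
library(glmnet)

set.seed(2)
# Toy stand-ins for a genotype matrix and binary outcome
X <- matrix(rnorm(200 * 85), nrow = 200, ncol = 85)
y <- rbinom(200, 1, plogis(X[, 1] - X[, 2]))

n_rep <- 10
selected <- matrix(0, nrow = n_rep, ncol = ncol(X))
for (i in seq_len(n_rep)) {
  idx <- sample(nrow(X), replace = TRUE)  # bootstrap resample of individuals
  fit <- cv.glmnet(X[idx, ], y[idx], family = "binomial")
  beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]  # drop the intercept
  selected[i, ] <- beta != 0               # record which SNPs were kept
}
sel_freq <- colMeans(selected)  # per-SNP selection frequency across resamples
```

SNPs with a high `sel_freq` are the ones selected consistently; this is the idea behind stability selection, though choosing a frequency threshold is a modeling decision in its own right.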
Have a look at the {glmnet} vignette to find more suitable models; but discussing variable selection is out of scope for the help I provide here, sorry.
Thank you for your help.