rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

New feature (?) -- variable importance #31

Open tdelhomme opened 3 years ago

tdelhomme commented 3 years ago

Hi all,

First: very cool method, and it seems highly efficient! What I want to ask is not really an issue but more of an enhancement request, or just a question. Do you provide a method for computing the feature importance of each input SNP in your stacking model? I did not find anything about this in the preprint, and I was wondering whether you have any idea how this could be done.

Thanks in advance,

Tiffany

joellembatchou commented 3 years ago

Hi Tiffany,

Thank you for your interest in the software. We do not currently compute a feature importance measure for the variants going into step 1 of Regenie. Since the level 0 models of Regenie compute linear combinations of variants under different shrinkage factors, the set of weights assigned to each variant in those linear combinations could be used to build a measure of importance for each variant within a block. This is not an enhancement we have planned for upcoming releases, but we will add it to the list of features to add to Regenie.

Kind regards, Joelle
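
To make the suggestion above concrete, here is a minimal sketch of one way the level 0 weights of a single block could be turned into a per-variant score. It is not part of Regenie: the function name `block_importance`, the input layout `weights[r][j]` (weight of variant j under shrinkage factor r), and the choice to average each variant's share of the absolute weight across shrinkage factors are all assumptions made for illustration. It also assumes the variants were standardized, so that weights are comparable across variants.

```python
def block_importance(weights):
    """Per-variant importance within one block (hypothetical helper).

    weights[r][j] is the level 0 weight of variant j under shrinkage
    factor r. For each shrinkage factor, a variant's share of the total
    absolute weight is computed; shares are then averaged over factors.
    """
    n_variants = len(weights[0])
    importance = [0.0] * n_variants
    n_used = 0
    for row in weights:
        total = sum(abs(w) for w in row)
        if total == 0.0:
            continue  # every weight shrunk to zero: nothing to rank
        n_used += 1
        for j, w in enumerate(row):
            importance[j] += abs(w) / total
    if n_used == 0:
        return importance
    return [v / n_used for v in importance]


# Toy block: 2 shrinkage factors x 3 variants (made-up numbers).
weights = [
    [0.40, -0.10, 0.00],
    [0.20, -0.05, 0.00],
]
scores = block_importance(weights)
```

Because each row is normalized before averaging, the scores of a block sum to 1 and a heavily shrunk model does not count for less than a lightly shrunk one. Whether that is desirable is a design choice; weighting each shrinkage factor by its contribution to the final model is an alternative.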

tdelhomme commented 3 years ago

Thanks for your answer, Joelle. I think the difficulty here is that you will have J variable importance values for each block in the level-0 model, and then one variable importance value for each new predictor in the superlearner. I don't have any idea how one could aggregate the values from these two levels... If you manage to implement it, please let me know!

Best regards,

Tiffany
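
One possible way to aggregate the two levels, sketched here under the assumption that the superlearner is itself a linear combination of the level 0 predictors: a linear combination of linear combinations is again linear in the variants, so the two sets of weights collapse into a single effective weight per variant. The names `beta` (level 0 weights, indexed block, shrinkage factor, variant), `alpha` (level 1 weights, indexed block, shrinkage factor), and both functions are hypothetical and not part of Regenie; cross-validation folds and any non-linear link are ignored.

```python
def effective_weights(beta, alpha):
    """Collapse two linear levels into one weight per variant.

    beta[b][r][j]: level 0 weight of variant j in block b under
                   shrinkage factor r.
    alpha[b][r]:   level 1 weight of the predictor built from
                   block b and shrinkage factor r.
    Returns eff[b][j] = sum over r of alpha[b][r] * beta[b][r][j].
    """
    eff = []
    for beta_b, alpha_b in zip(beta, alpha):
        n_variants = len(beta_b[0])
        eff.append([
            sum(a * beta_r[j] for a, beta_r in zip(alpha_b, beta_b))
            for j in range(n_variants)
        ])
    return eff


def stacked_prediction(genotypes, beta, alpha):
    """Two-level prediction for one sample; genotypes[b][j]."""
    total = 0.0
    for g_b, beta_b, alpha_b in zip(genotypes, beta, alpha):
        for a, beta_r in zip(alpha_b, beta_b):
            level0 = sum(g * w for g, w in zip(g_b, beta_r))
            total += a * level0
    return total


# Toy example: 2 blocks, 2 shrinkage factors each (made-up numbers).
beta = [
    [[0.40, -0.10], [0.20, -0.05]],
    [[0.30, 0.00, 0.10], [0.10, 0.00, 0.05]],
]
alpha = [[0.5, 1.0], [2.0, -1.0]]
genotypes = [[1.0, 2.0], [0.0, 1.0, 2.0]]

eff = effective_weights(beta, alpha)
two_level = stacked_prediction(genotypes, beta, alpha)
one_level = sum(
    g * w for g_b, eff_b in zip(genotypes, eff) for g, w in zip(g_b, eff_b)
)
```

In this toy example `two_level` and `one_level` agree up to floating-point rounding, which is the point of the sketch: the absolute value of each effective weight (for standardized variants) could then serve as a single importance score spanning both levels.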