nfj1380 / mrIML

Multivariate (multi-response) ensemble learning
https://nfj1380.github.io/mrIML/

Top variable importance (VI) and partial dependence are not always the same. #5

Open dosshra opened 1 year ago

dosshra commented 1 year ago

Hello, I was following your vignette "Regression working example" with my own data. I was looking at one of my partial dependence (PD) plots and wanted to find the order of importance of the selected SNPs. However, looking at the VI object, some SNPs with the highest importance values are not represented in the PD plot, while other SNPs with slightly lower importance are. Is this expected? Which of the two estimates better points to SNPs involved in adaptation? Thank you, Hanan

nfj1380 commented 1 year ago

Hi Hanan,

These are stochastic algorithms, so you can get some variability across runs (that's why setting the starting seed is important). You tend to get more variability with smaller datasets like this one. In practice, you can run the analysis multiple times and average the results. For example, I've written some new code that summarizes variable importance across runs with different seeds. It will come out in the next version of MrIML, but I can provide it to you early if you are interested. I'll work on ALE plots that do the same thing in the future too.

Cheers,

Nick
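
The idea of averaging variable importance over several seeds can be sketched roughly as follows. This is a minimal illustration in plain Python, not mrIML's code or API: `toy_importance` is a hypothetical stand-in for one stochastic model fit, and the SNP names and signal values are invented toy inputs, not results from any dataset.

```python
import random
import statistics


def toy_importance(seed, signal):
    """Hypothetical stand-in for one model fit: importance = signal + seed-dependent noise."""
    rng = random.Random(seed)  # the seed fixes the noise, so a run is reproducible
    return {name: max(0.0, value + rng.gauss(0, 0.05)) for name, value in signal.items()}


def summarize_importance(runs):
    """Return (feature, mean, sd) rows across runs, sorted by mean importance, descending."""
    rows = []
    for name in runs[0]:
        values = [run[name] for run in runs]
        rows.append((name, statistics.mean(values), statistics.stdev(values)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented toy signal: two features with similar importance, one clearly weaker.
signal = {"snp_a": 0.30, "snp_b": 0.28, "snp_c": 0.10}

# One "fit" per seed; a single run may rank snp_a and snp_b in either order.
runs = [toy_importance(seed, signal) for seed in range(20)]
summary = summarize_importance(runs)

for name, mean, sd in summary:
    print(f"{name}: mean={mean:.3f} sd={sd:.3f}")
```

Features whose mean importances differ by less than their run-to-run spread (here `snp_a` and `snp_b`) can swap ranks between single runs, which is one way a top-ranked feature in one output can be missing from a plot built on another run.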