Closed: schumaki closed this issue 10 months ago
I've had the same problem: no importances were calculated for (nonbinary) factors. Will there be an update? Thanks!!
For a (possible) fix, please check my fork: https://github.com/schumaki/pre/commit/d0027e6316f6d91c28b84b3e6b1b44956ac5f92f
If I am not mistaken, it is sufficient to move a bracket to another line.
Thanks for noticing and contributing! Package pre has been updated here on GitHub, and is on its way to CRAN.
Importances for factors are now correctly computed (factors tend to be picked up by the tree structure; to enforce picking up factors as linear terms, I needed to use type = "linear"):
library("devtools")
install_github("marjoleinF/pre")
library("pre")
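# airq is assumed to be the complete cases of the built-in airquality data
airq <- airquality[complete.cases(airquality), ]
airq$Month <- factor(airq$Month)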
set.seed(42)
airq.ens <- pre(Ozone ~ Month, data = airq, type = "linear")
imp <- importance(airq.ens)
imp$baseimps
##     rule description      imp coefficient        sd
## 1 Month7      Month7 3.031066    7.124545 0.4254400
## 2 Month8      Month8 2.577298    6.330198 0.4071434
imp$varimps
##   varname      imp
## 1   Month 5.608365
Thank you very much for fixing the issue!
In my dataset, the variable importances are now calculated correctly (even picking up some factors as linear terms without type = "linear").
The update is on CRAN now, too. Thanks very much, @schumaki, for reporting the problem and its source, and double checking the update!
I have identified a potential issue in the calculation of variable importances within the importance.pre() function in the source code here.
Description: It appears that the section responsible for obtaining importances for factors references varimps$varname[i] outside the for loop, thus only capturing the last variable name when calculating factor importances. This leads to missing importances for factor variables in the returned imps$varimps. Because otherwise only variables with zero importance are missing in imps$varimps, this might create the misleading impression that all factor variables are omitted from the final model, while, in fact, their importances are just not captured.
Here is the relevant code snippet:
Expected Behavior: Variable importances should include all variables of the final model, including factors.
Proposed Solution: Move the factor importance calculation section inside the loop to properly scope factor variables.
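To make the scoping problem concrete, here is a minimal toy sketch of the pattern described above. It is not the actual source of importance.pre(): the data frames baseimps and varimps, the vector factor_vars, and the prefix matching via startsWith() are all assumptions made purely for illustration. It only shows why using the loop index i after the loop has finished touches just the last variable, and how moving the factor handling inside the loop avoids that.

```r
# Toy illustration only; NOT the real importance.pre() code.
# Hypothetical base-learner importances: factor levels show up as "Month7", "Month8".
baseimps <- data.frame(
  rule = c("Month7", "Month8", "Wind", "Temp"),
  imp  = c(3.0, 2.5, 1.0, 0.5),
  stringsAsFactors = FALSE
)
factor_vars <- c("Month")  # hypothetical: names of the factor predictors
varimps <- data.frame(
  varname = c("Month", "Wind", "Temp"),
  imp     = 0,
  stringsAsFactors = FALSE
)

# Buggy pattern: the factor handling sits after the loop.
buggy <- varimps
for (i in seq_len(nrow(buggy))) {
  v <- buggy$varname[i]
  # An exact name match misses dummy-coded factor levels such as "Month7".
  buggy$imp[i] <- sum(baseimps$imp[baseimps$rule == v])
}
# In R, i keeps its last value after the loop, so only "Temp" is checked here.
if (buggy$varname[i] %in% factor_vars) {
  hits <- startsWith(baseimps$rule, buggy$varname[i])
  buggy$imp[i] <- sum(baseimps$imp[hits])
}

# Fixed pattern: the factor handling runs inside the loop, once per variable.
fixed <- varimps
for (i in seq_len(nrow(fixed))) {
  v <- fixed$varname[i]
  if (v %in% factor_vars) {
    hits <- startsWith(baseimps$rule, v)  # collect all levels of the factor
  } else {
    hits <- baseimps$rule == v
  }
  fixed$imp[i] <- sum(baseimps$imp[hits])
}

print(buggy)  # Month ends up with zero importance
print(fixed)  # Month aggregates the importances of Month7 and Month8
```

In the buggy variant, Month would be dropped if zero-importance variables are filtered out afterwards, which matches the symptom reported in this thread.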