Closed jpaillard closed 1 month ago
Attention: Patch coverage is 99.06250%
with 3 lines
in your changes missing coverage. Please review.
Project coverage is 76.35%. Comparing base (
b42572e
) to head (3ffa1b5
). Report is 27 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
hidimstat/cpi.py | 97.50% | 2 Missing :warning: |
hidimstat/loco.py | 98.36% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I tried to address the different comments:
Also linked to #17
After discussing with @AngelReyero we agreed that the API suggested in the current PR would be more convenient for the reasons mentioned above but also to facilitate the implementation of the recent contribution of Angel's work.
Separating the .predict
and .score
for example allows to modify CPI for averaging either at the prediction or loss level with minimal code changes.
The coverage decrease is related to #18, we are no longer testing Dnn and ModifiedRF which were part of the previous integrated in the previous BBI. Not sure if I should address it in this PR or a subsequent one Otherwise, this PR is ready for review @bthirion
Leave me a bit of time, it is a big one...
@achamma723 your opinion is welcome.
I tried to address all your comments @bthirion For me it is ready to merge.
Hello @bthirion and @jpaillard, sorry I weren't able to see all the comments in the past week. I'll try this weekend to have a look if you didn't decide to merge yet
Hello @jpaillard and @bthirion, I highlighted minor comments when passing, overall I think it is a great stable step for the future benchmarks. As for the cpi, now it is limited to the idea of reconstructing the variable or group of interest by the mean of the residuals, thus maybe the idea of the sampling (that existed via the Modified RF) is also interesting to push in the next steps?
Thx for the comments @achamma723. I made the modifications you suggested. Indeed it could be worth adding the sampling from nodes of the RF. Maybe we could tackle this in a following PR ?
Ready to merge on my side @bthirion
I added the additional tests and docstring @bthirion
I suggest a refactoring of the CPI functionality. It is inspired by the current implementation of permutation importance in scikit-learn. It aims at:
I also updated the example using the diabetes dataset.
Linked to issues #12 and #13