Closed adam2392 closed 4 months ago
Interestingly, this is not an issue with RandomForestClassifier, so I suspect it is related to the empty leaves, or to the fact that we use a separate dataset to estimate the posteriors.
Attention: Patch coverage is 82.14286% with 10 lines in your changes missing coverage. Please review.
Project coverage is 78.55%. Comparing base (b8da7b0) to head (290c5f6). Report is 1 commits behind head on main.
Changes proposed in this pull request:
- build_oob_forest will work with any sklearn Forest that has estimators_samples_ (in-bag sample indices).
- Stratification should occur every time we sample the dataset, whether it is subsampling or bootstrapping.
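The two points above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this pull request: the helper names `stratified_sample` and `oob_indices` are hypothetical, and the proportional per-class quota is an assumption about what "stratification" means here. It only shows how class proportions can be preserved under both bootstrapping and subsampling, and how out-of-bag indices follow from a list of in-bag indices such as the ones `estimators_samples_` is described as holding.

```python
import random
from collections import defaultdict


def stratified_sample(y, n_samples, replace=True, seed=0):
    """Draw sample indices so each class keeps roughly its share of the data.

    Hypothetical helper for illustration only. Because per-class quotas are
    rounded, the total can differ slightly from n_samples in general.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    sampled = []
    for label in sorted(by_class):
        members = by_class[label]
        # Quota proportional to the class frequency in y.
        k = round(n_samples * len(members) / len(y))
        if replace:
            # Bootstrapping: draw with replacement within the class.
            sampled.extend(rng.choices(members, k=k))
        else:
            # Subsampling: draw without replacement within the class.
            sampled.extend(rng.sample(members, k=min(k, len(members))))
    return sampled


def oob_indices(n_total, in_bag):
    """Out-of-bag indices are the ones that never appear in the bag."""
    in_bag_set = set(in_bag)
    return [i for i in range(n_total) if i not in in_bag_set]


# Imbalanced toy labels: 80 samples of class 0, 20 of class 1.
y = [0] * 80 + [1] * 20

in_bag = stratified_sample(y, n_samples=50, replace=True, seed=0)
oob = oob_indices(len(y), in_bag)
n_minority = sum(y[i] == 1 for i in in_bag)

subsample = stratified_sample(y, n_samples=50, replace=False, seed=0)
```

With these toy labels, every draw contains the same number of minority-class samples regardless of the seed, which is the property an unstratified bootstrap does not guarantee.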
Summary
On the main branch, using the following test:

we get the error:
However, if we run it on this branch, we get 0.50498046875 [0.484375, 0.53076171875, 0.513671875, 0.46533203125, 0.53076171875], which shows that the stratification fixes the bias.
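As a quick arithmetic check on the numbers reported above, the short sketch below confirms that the leading value equals the mean of the five bracketed values and measures how far each one sits from 0.5. Treating 0.5 as the unbiased target is an assumption; the text does not say which metric these values are.

```python
from statistics import fmean

# Values copied from the output reported for this branch.
reported_mean = 0.50498046875
values = [0.484375, 0.53076171875, 0.513671875, 0.46533203125, 0.53076171875]

# Mean of the five bracketed values.
mean_of_values = fmean(values)

# Largest absolute deviation from 0.5 (assumed here to be the unbiased target).
largest_gap = max(abs(v - 0.5) for v in values)

print(mean_of_values, largest_gap)
```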