Closed e10e3 closed 2 weeks ago
Thanks for reporting that @e10e3! I fixed this error in #1561.
Thank you for your quick response @smastelini!
When looking through the code, I found the test check_disappearing_features, which seems to ensure this behaviour does not happen. Do you know why the tests did not catch this issue?
Hi @e10e3, I think the tests were designed to check the robustness of ARF when a few features disappear, but not when the number of remaining features drops below the bare minimum. It is indeed an interesting corner case to catch :)
River version: 0.21.1
Python version: 3.12.4
Operating system: macOS 14.5
Describe the bug
When using an adaptive random forest (ARF), if the number of features in the input dictionary drops below the per-leaf maximum, the model crashes with a sampling error.
This can happen when feature selection is used, or simply when the set of features varies over the data stream.
This crash happens because the maximum number of features to consider is fixed when a leaf is created. If the effective number of features later drops below this value, the leaf calls random.sample() with a sample size larger than the population, which raises a ValueError.

Code to reproduce
Output:
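(The original reproduction snippet and output are not shown above. For reference, the root cause can be demonstrated with the standard library alone: the sketch below mimics a leaf whose sample size is fixed at creation time and that later receives fewer features. The Leaf class and method names are illustrative stand-ins, not River's actual implementation.)

```python
import random


class Leaf:
    """Toy stand-in for an ARF leaf: the number of features to
    sample is fixed once, at creation time (hypothetical class,
    not River's actual code)."""

    def __init__(self, max_features):
        self.max_features = max_features

    def split_candidates(self, features):
        # random.sample raises ValueError when the requested
        # sample size exceeds the population size.
        return random.sample(sorted(features), self.max_features)


leaf = Leaf(max_features=3)
# Four features available, three sampled: works fine.
print(leaf.split_candidates({"a": 1, "b": 2, "c": 3, "d": 4}))

try:
    # The stream now carries only two features, so the fixed
    # sample size (3) exceeds the population (2) and sampling fails.
    leaf.split_candidates({"a": 1, "b": 2})
except ValueError as e:
    print("ValueError:", e)
```

This mirrors the report: the failure is not in the sampling itself but in reusing a sample size chosen before the feature set shrank.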