Features, or documentation
Part of sklearn handles monotonic constraints and n_constant_features tracking. From first glance it is also not clear that these are actually tracked. I.e. is the monotonic constraint and n_constant_features if we do fit and then partial_fit for two subsets of the data different from if we just did fit for the entire dataset? If they are different, what does this imply?
Some segfaults arose in scikit-tree during the implementation of this PR: https://github.com/neurodata/scikit-tree/pull/249. Some actionable items came to mind to improve what we have:
Documentation: https://github.com/neurodata/scikit-learn/blob/f3607494ce509f1ff63255b0cb3bc5562de981a1/sklearn/tree/_tree.pyx#L211-L273 should ideally get rewritten, or at least more comments. Rn it is hard to parse what comprises
initial_roots
. Since this is part of the Cython codebase, it is thus a critical piece as segfaults are time-consuming and difficult to chase down.Next, we prolly want to include a clear description for developers on what the differences are here: https://github.com/neurodata/scikit-learn/blob/f3607494ce509f1ff63255b0cb3bc5562de981a1/sklearn/tree/_tree.pyx#L302-L320
Features, or documentation Part of sklearn handles monotonic constraints and n_constant_features tracking. From first glance it is also not clear that these are actually tracked. I.e. is the monotonic constraint and n_constant_features if we do
fit
and thenpartial_fit
for two subsets of the data different from if we just didfit
for the entire dataset? If they are different, what does this imply?In an ideal world the state is the same.