When the target y is a pd.Series rather than a numpy array, and the
pd.Series has a numeric index that isn't consecutive, GroupedPipeline
previously selected targets by index label rather than by position. In
newer versions of pandas, any missing index labels will also raise an
Exception.
This is because we selected from y assuming it's a numpy array:
y[indices]. If y has a consecutive index starting at 0, it makes no
difference. If y has a non-numeric index, pandas falls back to
selecting via integer location, like numpy. However, if the index for
y is numeric and non-consecutive, the index labels are incorrectly
used rather than the positions, so the wrong targets can be selected
and an Exception can be raised if labels are missing.
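
A minimal sketch of the behavior described above, using made-up data rather than anything from the GroupedPipeline test suite. It assumes a recent pandas version, where list selection with a missing label raises KeyError:

```python
import pandas as pd

# Hypothetical target: numeric, non-consecutive index.
y = pd.Series([10, 20, 30, 40], index=[7, 0, 3, 1])
indices = [0, 1]  # positions we intend to select

# Plain [] on an integer index looks up the labels 0 and 1.
by_label = y[indices].tolist()
# .iloc always selects by position, like a numpy array.
by_position = y.iloc[indices].tolist()

# A label that is not in the index raises KeyError in recent pandas.
try:
    y[[0, 2]]
    missing_raises = False
except KeyError:
    missing_raises = True
```

Here by_label and by_position differ, which is the silent form of the bug; the KeyError is the loud form.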
This change adds a test for the bug and updates the _iter_groups
method to always select the correct targets regardless of their type
(pd.Series or np.array) or their index.
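
One way to select by position regardless of type is sketched below. The helper name select_targets is hypothetical and this is not the actual _iter_groups code, only an illustration of the idea under that assumption:

```python
import numpy as np
import pandas as pd

def select_targets(y, indices):
    """Select targets by integer position (hypothetical helper)."""
    if isinstance(y, pd.Series):
        # .iloc ignores the index labels and uses positions only.
        return y.iloc[indices]
    # Anything else is treated as array-like and indexed like numpy.
    return np.asarray(y)[indices]

y_series = pd.Series([10, 20, 30, 40], index=[7, 0, 3, 1])
y_array = np.array([10, 20, 30, 40])

from_series = select_targets(y_series, [0, 2]).tolist()
from_array = select_targets(y_array, [0, 2]).tolist()
```

With this approach both calls return the same values, whatever index the pd.Series carries.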