Closed aazuspan closed 1 year ago
Not sure HOW you came across this bug, but great explanation of the issue. I like your solution because we really only care about `X`'s index rather than checking whether `X` is a dataframe. Checking that it's array-like makes sense to me.
You haven't read every scikit-learn issue?! Just kidding, I got lucky when their latest changelog showed up in my GitHub feed 😉
Sounds like a plan!
Resolved by #56
I just came across scikit-learn/scikit-learn#27037 that points out that using `hasattr` or `getattr` to retrieve a DataFrame `index` will unintentionally retrieve the `list.index` method. For us, this means that an estimator fitted with a `list` will store that method as `dataframe_index_in_`. Then, if you used `return_dataframe_index=True` with `kneighbors`, you'd get a weird `IndexError`.

This is the offending line:
https://github.com/lemma-osu/scikit-learn-knn-regression/blob/96a4230cbbd113136aa619f869afabe1d648e054/src/sknnr/_base.py#L14-L15
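
To make the failure mode concrete, here is a minimal sketch of the pattern described above. `naive_get_index` is a hypothetical stand-in written for illustration, not the actual `sknnr` code at that link:

```python
def naive_get_index(X):
    """Hypothetical stand-in for the getattr-based lookup described above."""
    return getattr(X, "index", None)


X_list = [[1.0, 2.0], [3.0, 4.0]]
stored = naive_get_index(X_list)

# Lists define an `index` method, so the lookup succeeds and returns a
# bound method rather than falling back to None.
print(callable(stored))
print(stored is None)
```

Anything that later treats the stored value as an array of row labels will fail, because it is actually a method.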
They solve this in scikit-learn/scikit-learn#27044 by using a new `_is_pandas_df` helper, but as far as I can tell that's not released yet, so I think it'll be a while before we want to rely on that.

We could add a similar function to `sknnr`, but I wonder if we might break some implicit duck-typed support for `DataFrame`-like objects by adding an explicit class check here. An alternative would just be to check that `index` is an array using `_is_arraylike`, which would prevent this bug and should cover us for just about any other case.
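
A rough sketch of that alternative, under some assumptions: `is_arraylike` below is a local duck-typed check that I believe approximates what scikit-learn's private `_is_arraylike` does, and `get_dataframe_index` and `FrameLike` are hypothetical names used only for this example.

```python
def is_arraylike(x):
    """Local duck-typed check (an assumption, not scikit-learn's own helper)."""
    return hasattr(x, "__len__") or hasattr(x, "shape") or hasattr(x, "__array__")


def get_dataframe_index(X):
    """Hypothetical helper: return X.index only if it looks like an array."""
    index = getattr(X, "index", None)
    return index if is_arraylike(index) else None


class FrameLike:
    """Minimal stand-in for a DataFrame-like object with an array-like index."""

    def __init__(self, index):
        self.index = index


# A plain list exposes `index` as a method, which is not array-like.
list_result = get_dataframe_index([[1, 2], [3, 4]])

# A DataFrame-like object keeps working without an explicit class check.
frame_result = get_dataframe_index(FrameLike(["a", "b"]))

print(list_result)
print(frame_result)
```

The point of the design is that the check inspects the retrieved attribute rather than the class of `X`, so duck-typed inputs are still supported.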