Open julian-fong opened 4 months ago
@fkiraly I've run into a problem with the current implementation of polars support in skpro.
If an estimator specifies
"X_inner_mtype": "polars_eager_table",
"y_inner_mtype": "polars_eager_table",
then during the tests, pandas DataFrames get converted into polars DataFrames via check_X in the boilerplate code in regression.base, but they lose their index.
Since the index is already lost in check_X, it is not retrievable when calling the private methods (the input is already a polars DataFrame without the index). This then fails subsequent index asserts in the test files, after the DataFrame is converted back into a pandas DataFrame via the convert function.
Interesting - I thought it saved the index as a variable __index__ if it was not a range index.
Or is that only in the sktime implementation by @pranavvp16?
I think that would be in the sktime implementation. In skpro, we currently do not save the index anywhere in the boilerplate if the incoming mtype is in polars format.
May I suggest trying to sync the two implementations? I think the sktime type by @pranavvp16 stores a non-range index as a reserved variable.
Implement the DummyProbaRegressor, but with complete end-to-end support in skpro.
Some current limitations:
- fit inside DummyProbaRegressor uses skpro.distributions, which only supports pandas DataFrames; this needs a workaround
- predict_proba also uses skpro.distributions, leading to the same issue; it will need a workaround as well

@fkiraly any suggestions on how to implement?