sktime / sktime

A unified framework for machine learning with time series
https://www.sktime.net
BSD 3-Clause "New" or "Revised" License
7.84k stars 1.35k forks source link

[ENH] Inconsistent handling of _steps_attr in HeterogenousMetaEstimator inheritors #7222

Open mateuszkasprowicz opened 1 week ago

mateuszkasprowicz commented 1 week ago

Describe the bug

The bug involves inconsistent handling of the _steps_attr attribute in several classes inheriting from _HeterogenousMetaEstimator. Specifically, the _steps_attr attribute is either missing or incorrectly formatted in these classes, causing issues with parameter management, particularly for visualizing (html repr). There are two main cases:

  1. Pipelines with a single estimator and transformer(s) (e.g., ParamFitterPipeline, PwTrafoPanelPipeline). These classes inherit _steps_attr, but it points to _steps, which does not exist. They require a new _steps attribute that combines transformers and the estimator.

  2. Classes where _steps_attr points to an incorrectly formatted list of estimators (e.g., GroupbyCategoryForecaster, FeatureUnion). These need _steps_attr to point to an iterable of (name: str, estimator, ...) tuples instead of the current format.

Additional context

The issue was identified while investigating: https://github.com/sktime/sktime/issues/7152

For a detailed list of impacted classes refer to the following comment: https://github.com/sktime/sktime/issues/7152#issuecomment-2392006316

Potential solution: https://github.com/sktime/sktime/issues/7152#issuecomment-2392289976

fkiraly commented 1 week ago

I think it's less of a bug than a potential enhancment to the base class interfaces - thanks for opening the issue and summarizing!

mateuszkasprowicz commented 1 week ago

Agreed @fkiraly. Just another note, probably for the future it would make sense to add a check for all estimators with a _steps_attr attribute to see if a corresponding property exists.

fkiraly commented 1 week ago

yes, agreed - we should make the extension API more strict. Perhaps extra checks for all heterogeneous meta-estimator descendants?