The `shapelets_` attribute is indeed a nested `numpy.ndarray`, because the lengths of the extracted shapelets are not fixed (several window sizes are considered). So it's an array of arrays, but it could have been a list of arrays.

I had a look at the documentation of `ShapeletTransformClassifier`, and I didn't find a public attribute for the extracted shapelets. I guess that the shapelets have to be saved somehow (or one can just save the indices to avoid making copies) to make predictions on unseen data. I know that the data is stored in another format in `sktime` (`pandas.DataFrame`), which may be better suited to handle variable-length data.
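As a minimal illustration (plain `numpy`, not `pyts` internals) of why variable-length shapelets can only be stored as an array of arrays, while shapelets that happen to share one length collapse into a regular 2D array:

```python
import numpy as np

short = np.array([1.0, 2.0, 3.0])
long_ = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Shapelets of different lengths: the only ndarray representation is a
# 1D object array whose elements are the individual shapelet arrays.
ragged = np.asarray([short, long_], dtype=object)
print(ragged.dtype, ragged.shape)   # object (2,)

# Shapelets that all share one length: the same kind of coercion
# produces an ordinary 2D float array instead.
equal = np.asarray([short, short + 1.0])
print(equal.dtype, equal.shape)     # float64 (2, 3)
```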
One important current limitation of `pyts` is that it does not support variable-length time series (except for DTW metrics and kNN with such metrics). What kind of changes (either in `pyts` or in `sktime`) would make this estimator more compatible with the `sktime` API?
Thanks for the clarification. I'll try to answer the different points above.

The `shapelets_` attribute issue is not an `sktime` compatibility issue, as it is not violating contract assumptions. The attribute belongs to `ShapeletTransform` (not `ShapeletTransformClassifier`) and is named `shapelets_`. Attributes ending in (as opposed to starting with) an underscore are considered public in the `scikit-learn` interface, as they store fitted parameters.
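As a minimal sketch of that convention (a hypothetical toy transformer, not `pyts` or `sktime` code): attributes set during `fit` and ending in an underscore are the public fitted state, while leading-underscore attributes are private implementation details.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class DemeanTransformer(TransformerMixin, BaseEstimator):
    """Toy transformer illustrating the scikit-learn attribute convention."""

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Trailing underscore: public fitted parameter (like `shapelets_`).
        self.mean_ = X.mean(axis=0)
        # Leading underscore: private detail, not part of the public API.
        self._n_fit_calls = getattr(self, "_n_fit_calls", 0) + 1
        return self

    def transform(self, X):
        return np.asarray(X) - self.mean_
```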
I see, so you are saying the nested array (i.e., an array with arrays inside) is intended. Then I'm surprised that not all test scenarios fail, only some. In that case, the bug would be that in some cases the attribute ends up as a nested array and in some cases it does not?
**`sktime` compatibility**

> What kind of changes (either in `pyts` or in `sktime`) would make this estimator more compatible with the `sktime` API?
There are no major changes necessary: `pyts` has a consistent interface that can easily be adapted in `sktime`, and it is internally consistent. Let me know if I understand something wrong below:

- `pyts` is not entirely `sktime` compliant, but it is almost `sklearn` compliant, because you inherit from the `sklearn` `BaseEstimator` and represent your time series as 2D `numpy` arrays, which is an `sklearn` container.
- `sklearn` compliance imo means systematic checking via `sklearn`'s `check_estimator` (this should be part of the tests; a minimal invocation is sketched after this list). This is compliance with the strict interface assumptions that `sklearn` lays out for tabular (not time series) classification etc. This way of compliance is possible as long as your time series are always univariate and of equal length.
- `sktime` compliance is different: `sktime` follows its own extension pattern (it manages its own boilerplate and is more flexible in the data representations), which `pyts` does not follow. The reliance on the `sklearn` interface, though, prevents things like unequal-length time series. The `sktime` interface and the `sklearn` interface are composition compatible, but not identical (because they are meant to support different learning tasks).

Happy to have a larger discussion thread about this, and potential actions, if you are interested. I'd need to understand the goals, of course.
As discussed above: not a problem in `pyts` but in the `sktime` checks. Fixed there.
We have started to interface `pyts` at `sktime` due to popular demand.

When running the test framework, we've detected inconsistencies in the fitted parameters of `ShapeletTransform`, namely `shapelets_` sporadically being a nested `numpy` array on our test cases. We haven't narrowed it down to "`sktime`-less" code, but the issue is probably in `pyts`, because the fitted parameters are mirrored 1:1.

Minimal reproducing code (which could be reduced further by removing the `pyts` adapter): see the sketch below.

The key observation is that the test instance of `ShapeletTransformPyts`, which contains a `ShapeletTransform` instance, has a `shapelets_` attribute of unexpected type (a nested `numpy` array). I have not debugged this, but I suspect this is due to ambiguous array coercion, e.g., assigning an array to a scalar somewhere, with the coercion behaviour breaking when one of the array dimensions ends up being 1.
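A reduced sketch of this kind of reproduction, using `pyts` directly rather than the adapter. This is not the exact snippet from the issue; the `window_sizes` and `random_state` arguments are assumptions about the `ShapeletTransform` constructor, and the observed shapes may vary with the `pyts` and `numpy` versions:

```python
import numpy as np
from pyts.transformation import ShapeletTransform

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 30))   # 20 univariate series of length 30
y = rng.integers(0, 2, size=20)     # labels; shapelet extraction is supervised

# A single window size: all extracted shapelets share one length, so
# `shapelets_` is expected to coerce into a regular 2D float array.
st_flat = ShapeletTransform(window_sizes=[8], random_state=0).fit(X, y)
print(st_flat.shapelets_.dtype, st_flat.shapelets_.shape)

# Several window sizes: shapelet lengths differ, so `shapelets_` can only
# be a 1D object array of arrays, i.e. the "nested numpy" seen in the tests.
st_nested = ShapeletTransform(window_sizes=[5, 8, 12], random_state=0).fit(X, y)
print(st_nested.shapelets_.dtype, st_nested.shapelets_.shape)
```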
`sktime` issue: https://github.com/sktime/sktime/issues/6171