Open fkiraly opened 1 year ago
Hello @fkiraly, I have discussed this issue with @rtavenar: it is not a bug and it corresponds to the documentation (which we will make clearer).
The functions taking 3D arrays as inputs are the functions: cdist_*
For example in the docstring of the function cdist_dtw
, there is the input parameter:
dataset1 : array-like
A dataset of time series
Functions of the form cdist_*
can also accept 1D or 2D arrays when they use:
dataset1 = to_time_series_dataset(dataset1)
The other metric functions usually take 2D arrays as inputs.
For example, in the docstring of the function lcss
, there is the input parameter:
s1
A time series.
By default time series should be 2D arrays, some metric functions also accept 1D arrays when they use:
s1 = to_time_series(s1)
To make it cleared, I will specify in the docstrings of the functions the possible shapes of the input arrays in the PR https://github.com/tslearn-team/tslearn/pull/486.
For example:
s1 : array-like, shape=(sz1, d) or (sz1,)
A time series. If shape is (sz1,), the time series is assumed to be univariate.
Or:
dataset1 : array-like, shape=(n_ts, sz, d) or (n_ts, sz) or (sz,)
A dataset of time series. If shape is (n_ts, sz), the dataset is composed of univariate time series.
If shape is (sz,), the dataset is a composed of a unique univariate time series.
I see, thanks for clarifying. The lack of specificity in the docstrings did cause some headache and trial-and-error on our side...
I would suggest in the docstrings to state clearly, in all 3 cases (1D, 2D, 3D), which index corresponds to instance index, time index, variable index.
Feel free to close this issue then.
PS: if I were to design the interface, I would add a default upcast to 3D to every distance function, and have only one single interface point. For deprecation period, you can add 3D to the 2D functions and a warning. Why: simpler for the user to learn just one generic interface, simpler to maintain and test (just loop over the entire list of distances)
It seems that
metrics.lcss
is non-conformant with generic interface expectations towards time series distances when passing 3Dnp.ndarrays
.It would seem that
lcss
fails on some, or all 3Dnp.ndarrays
. This behaviour was observed when running ansktime
conformant adapter through thesktime
test suite, here: https://github.com/sktime/sktime/issues/5367#issuecomment-1750378133Update: the issue seems to be that, unlike all other metrics,
lcss
does not accept 3D numpy as input.A secondary issue is the lack of easily findable documentation on the input format of the "time series" in the various metrics. Unfortunately, metrics with the same docstring can have different input conventions (supports 3D or not).