tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

Why doesn't the tslearn definition of lcss mention the delta parameter? #526

Open camilo-unipd opened 1 month ago

camilo-unipd commented 1 month ago

Although the delta parameter from the original LCSS paper (Discovering Similar Multidimensional Trajectories) is mentioned in the user guide page and in the body of the function documentation, it is not exposed as an actual function parameter. I have looked through all the tslearn resources I could find, but I could not find an explicit explanation for this.

I also checked the source code (a snippet is shown below), and the delta parameter does not appear there either, in contrast to the algorithm shown in the user guide. What am I missing in this case? Is delta controlled by the other parameters?

def lcss_accumulated_matrix(s1, s2, eps, mask, be=None):
    """Compute the longest common subsequence similarity score between
    two time series.

    Parameters
    ----------
    s1 : array-like, shape=(sz1, d) or (sz1,)
        First time series. If shape is (sz1,), the time series is assumed to be univariate.
    s2 : array-like, shape=(sz2, d) or (sz2,)
        Second time series. If shape is (sz2,), the time series is assumed to be univariate.
    eps : float
        Matching threshold.
    mask : array-like, shape=(sz1, sz2)
        Mask. Unconsidered cells must have infinite values.
    be : Backend object or string or None
        Backend. If `be` is an instance of the class `NumPyBackend` or the string `"numpy"`,
        the NumPy backend is used.
        If `be` is an instance of the class `PyTorchBackend` or the string `"pytorch"`,
        the PyTorch backend is used.
        If `be` is `None`, the backend is determined by the input arrays.
        See our :ref:`dedicated user-guide page <backend>` for more information.

    Returns
    -------
    acc_cost_mat : array-like, shape=(sz1 + 1, sz2 + 1)
        Accumulated cost matrix.
    """
    be = instantiate_backend(be, s1, s2)
    s1 = be.array(s1)
    s2 = be.array(s2)
    s1 = to_time_series(s1, remove_nans=True, be=be)
    s2 = to_time_series(s2, remove_nans=True, be=be)
    l1 = be.shape(s1)[0]
    l2 = be.shape(s2)[0]
    acc_cost_mat = be.full((l1 + 1, l2 + 1), 0)

    for i in range(1, l1 + 1):
        for j in range(1, l2 + 1):
            if be.isfinite(mask[i - 1, j - 1]):
                if be.is_numpy:
                    squared_dist = _njit_local_squared_dist(s1[i - 1], s2[j - 1])
                else:
                    squared_dist = _local_squared_dist(s1[i - 1], s2[j - 1], be=be)
                if be.sqrt(squared_dist) <= eps:
                    acc_cost_mat[i][j] = 1 + acc_cost_mat[i - 1][j - 1]
                else:
                    acc_cost_mat[i][j] = max(
                        acc_cost_mat[i][j - 1], acc_cost_mat[i - 1][j]
                    )

    return acc_cost_mat

It would be nice if the documentation explained how the delta parameter is handled.
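To make the question concrete, here is a minimal sketch of my guess, which I have not confirmed against the tslearn source: the `mask` argument might be what encodes delta, with a cell `(i, j)` left finite only when `|i - j| <= delta`. The helper `band_mask` is hypothetical (it is not a tslearn function), and `lcss_matrix_with_mask` is only a plain-NumPy, univariate rendition of the loop quoted above, not the library's implementation.

```python
import numpy as np


def band_mask(sz1, sz2, delta):
    """Hypothetical helper: 0 where |i - j| <= delta, inf elsewhere."""
    i = np.arange(sz1)[:, None]
    j = np.arange(sz2)[None, :]
    return np.where(np.abs(i - j) <= delta, 0.0, np.inf)


def lcss_matrix_with_mask(s1, s2, eps, mask):
    """Plain-NumPy rendition of the quoted loop, for univariate series."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    l1, l2 = len(s1), len(s2)
    acc = np.zeros((l1 + 1, l2 + 1), dtype=int)
    for i in range(1, l1 + 1):
        for j in range(1, l2 + 1):
            # Cells outside the band (infinite mask) are skipped and stay 0.
            if np.isfinite(mask[i - 1, j - 1]):
                if abs(s1[i - 1] - s2[j - 1]) <= eps:
                    acc[i, j] = 1 + acc[i - 1, j - 1]
                else:
                    acc[i, j] = max(acc[i, j - 1], acc[i - 1, j])
    return acc


s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [4.0, 1.0, 2.0, 3.0]  # s1 delayed by one time step

# Final cell of the accumulated matrix for two choices of delta.
scores = {}
for delta in (0, 1):
    acc = lcss_matrix_with_mask(s1, s2, eps=0.1, mask=band_mask(4, 4, delta))
    scores[delta] = int(acc[-1, -1])

print(scores)  # {0: 0, 1: 3}
```

With `delta = 0` only same-index points may match, so nothing matches. With `delta = 1` the one-step shift is tolerated and three points match. If this reading is right, delta would be set indirectly through whatever parameter builds the mask, and it would help if the documentation said so explicitly.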