tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

lcss similarity returns unity for all timeseries #509

Closed NAThompson closed 8 months ago

NAThompson commented 9 months ago

Describe the bug

In the documentation of LCSS, it says that the pseudo-metric obeys the property ∀x LCSS(x, x) = 0.

However, I have found an x such that LCSS(x, x) = 1.0. In addition, I have built a family of randomly perturbed waveforms such that they all have the same score.

To Reproduce

from math import pi as π
import random
import numpy
from tslearn.metrics import lcss  # missing from the original snippet

def test_reproduce():
    f0 = 20e9
    period = 1/f0
    ω0 = 2*π*f0
    waveforms = numpy.empty(shape=(101, 512))
    times = numpy.linspace(-period/2, period/2, waveforms.shape[1])
    for i in range(waveforms.shape[0]):
        φ = random.gauss(0.0, 0.5)
        k = 3 + random.uniform(-0.5, 0.5)
        values = 0.5*(numpy.tanh(k*ω0*times + φ) + 1) + random.uniform(-0.05, 0.05)
        waveforms[i, :] = values

    typical_values = 0.5*(numpy.tanh(3*ω0*times) + 1)
    scores = numpy.empty(waveforms.shape[0])
    for i in range(waveforms.shape[0]):
        scores[i] = lcss(typical_values, waveforms[i, :])

    # Prints an array of 1s:
    print(scores)
    # prints 1:
    print(lcss(waveforms[0, :], waveforms[0, :]))

The output is:

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1.]
1.0

Expected behavior

I expect a randomly perturbed family of waveforms to all have different scores, and the property LCSS(x,x) = 0 to hold.


YannCabanes commented 8 months ago

Hello @NAThompson, thanks a lot for opening this issue. LCSS is indeed a similarity measure, so lcss(x, x) = 1 for all x. There is therefore no bug in the code, but the documentation of the LCSS function was wrong. I have fixed the documentation errors in PR https://github.com/tslearn-team/tslearn/pull/513.
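
To illustrate why a similarity of 1 is expected for identical series, here is a minimal pure-Python sketch of the usual LCSS similarity: the length of the longest common subsequence under a matching threshold, divided by the length of the shorter series. It is not tslearn's implementation. The function name `lcss_similarity` and the sample values are hypothetical, and the `eps=1.0` default is only assumed to mirror the documented default of `tslearn.metrics.lcss`. If that assumption holds, it may also explain the array of 1s in the report: the waveforms all lie roughly in [-0.05, 1.05], so nearly every pair of samples is within the threshold.

```python
def lcss_similarity(s1, s2, eps=1.0):
    """Sketch of LCSS similarity for two univariate sequences.

    Two samples "match" when their absolute difference is <= eps.
    Returns (longest common subsequence length) / min(len(s1), len(s2)).
    """
    n, m = len(s1), len(s2)
    # acc[i][j] holds the LCSS length for the prefixes s1[:i] and s2[:j].
    acc = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(s1[i - 1] - s2[j - 1]) <= eps:
                acc[i][j] = acc[i - 1][j - 1] + 1
            else:
                acc[i][j] = max(acc[i - 1][j], acc[i][j - 1])
    return acc[n][m] / min(n, m)


# Hypothetical sample data, values in roughly the same range as the report.
x = [0.0, 0.5, 1.0]
y = [0.04, 0.45, 1.03]

same = lcss_similarity(x, x)               # identical series
loose = lcss_similarity(x, y)              # threshold wider than the data range
tight = lcss_similarity(x, y, eps=0.045)   # threshold near the perturbation size
print(same, loose, tight)
```

With a threshold that is wide compared to the data range, every sample matches and the score saturates at 1. A threshold closer to the size of the perturbations lets the score separate the series.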