Global alignment kernel returns NaN for all timeseries

tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python

BSD 2-Clause "Simplified" License

2.91k stars 342 forks source link

import random import numpy from math import pi as π from tslearn.metrics import sigma_gak, gak def test_reproduce(): f0 = 20e9 period = 1/f0 ω0 = 2*π*f0 # works at length 405; dies at 410: waveforms = numpy.empty(shape=(5, 410)) times = numpy.linspace(-period/2, period/2, waveforms.shape[1]) for i in range(waveforms.shape[0]): φ = random.gauss(0.0, 0.5) k = 3 + random.uniform(-0.5, 0.5) waveforms[i, :] = 0.5*(numpy.tanh(k*ω0*times + φ) + 1) + random.uniform(-0.05, 0.05) typical_values = 0.5*(numpy.tanh(3*ω0*times) + 1) σ = sigma_gak(typical_values) scores = numpy.empty(waveforms.shape[0]) for i in range(waveforms.shape[0]): scores[i] = gak(typical_values, waveforms[i, :], sigma=σ) print(scores) print(gak(waveforms[0, :], waveforms[0, :], sigma=σ))

Hello @NAThompson, Thank you for your issue. Indeed, there is an overflow which can occur if the function gak is run on long time series which have very few variations. We have: gak(s1, s2) = unnormalized_gak(s1, s2) / sqrt(unnormalized_gak(s1, s1) * unnormalized_gak(s2, s2)) For example, if s1 is a constant time series (worst possible scenario) of size n, the function unnormalized_gak(s1, s1) will return the number of possible warpings to match s1 with itself, each warping being optimal. This number is equal to the number of paths in a grid of n rows and n columns to go from the top left cell to the bottom right cell when the only possible moves are going right, down or diagonally (right and down). When s1 is a constant time series, unnormalized_gak(s1, s1) returns a float when n <= 405 an returns inf when n >= 406. If s1 and s2 are both constant time series of size n, the product unnormalized_gak(s1, s1) * unnormalized_gak(s2, s2) returns a float when n <= 204 and returns inf when n >= 205.

tslearn-team / tslearn

Global alignment kernel returns NaN for all timeseries #510