tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

Index of changes in PAA and SAX #441

Open yasirroni opened 1 year ago

yasirroni commented 1 year ago

Is your feature request related to a problem? Please describe.

I'm running the PAA and SAX tutorial from the docs:

import numpy
import matplotlib.pyplot as plt

from tslearn.generators import random_walks
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import PiecewiseAggregateApproximation
from tslearn.piecewise import SymbolicAggregateApproximation, \
    OneD_SymbolicAggregateApproximation

numpy.random.seed(0)
# Generate a random walk time series
n_ts, sz, d = 1, 100, 1
dataset = random_walks(n_ts=n_ts, sz=sz, d=d)
scaler = TimeSeriesScalerMeanVariance(mu=0., std=1.)  # Rescale time series
dataset = scaler.fit_transform(dataset)

n_paa_segments = 10
paa = PiecewiseAggregateApproximation(n_segments=n_paa_segments)
paa_data = paa.fit_transform(dataset)
paa_dataset_inv = paa.inverse_transform(paa_data)

After reading the docs, it seems there is no function that returns the indices at which the PAA value changes.

Describe the solution you'd like

paa.get_index(paa_data) should return the indices at which the value of paa_data changes. That would be nice.
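
For illustration, here is a minimal sketch of what such a method might return. It assumes PAA splits a series of length sz into n_segments equal-width segments of length sz // n_segments, which I have not verified against the tslearn source. paa_segment_starts is a hypothetical helper name, not part of tslearn.

```python
import numpy


def paa_segment_starts(sz, n_segments):
    """Hypothetical helper (not a tslearn function).

    Returns the start index of each PAA segment, assuming equal-width
    segments of length sz // n_segments.
    """
    if n_segments <= 0 or n_segments > sz:
        raise ValueError("n_segments must be between 1 and sz")
    seg_len = sz // n_segments  # assumed segment width
    return numpy.arange(n_segments) * seg_len


# Same sizes as in the tutorial snippet: sz=100, n_paa_segments=10
print(paa_segment_starts(100, 10).tolist())
# [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

If sz is not a multiple of n_segments, the trailing samples are not covered by this arithmetic, so the result should be checked against paa.inverse_transform in that case.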

yasirroni commented 1 year ago

My implementation:

from numba import njit

@njit
def find_first(array, item):
    for idx, val in enumerate(array):
        if val == item:
            return idx
    return None

def get_paa_index(paa_dataset_inv, paa_data):
    paa_dataset_inv = paa_dataset_inv.ravel()
    paa_data = paa_data.ravel()

    idxs = []
    idx = 0
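    for val in paa_data:
        # Search only from the previous match onward. Exact equality assumes
        # inverse_transform copies each PAA value unchanged.
        idx_ = find_first(paa_dataset_inv[idx:], val)
        if idx_ is None:
            raise ValueError("PAA value not found in the reconstructed series")
        idx += idx_
        idxs.append(idx)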

    return idxs

get_paa_index(paa_dataset_inv, paa_data)
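
A possible alternative that avoids numba and the explicit loop: since the reconstructed series is piecewise constant, the change points are the positions where consecutive samples differ. The sketch below uses only numpy.diff and numpy.flatnonzero. change_indices is a hypothetical name, and the function expects a single series (for example paa_dataset_inv[0]).

```python
import numpy


def change_indices(series):
    """Hypothetical helper: indices where a piecewise-constant series
    starts a new value. Index 0 is always included for non-empty input."""
    x = numpy.asarray(series).ravel()
    if x.size == 0:
        return numpy.array([], dtype=int)
    # diff[i] = x[i + 1] - x[i]; a nonzero diff means a new value starts at i + 1
    jumps = numpy.flatnonzero(numpy.diff(x) != 0) + 1
    return numpy.concatenate(([0], jumps))


print(change_indices([1.0, 1.0, 2.0, 2.0, 2.0, 3.0]).tolist())
# [0, 2, 5]
```

One caveat: if two adjacent segments happen to have exactly the same mean, this reports them as a single segment, whereas index arithmetic on the segment width would not.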