predict-idlab / tsflex

Flexible time series feature extraction & processing
https://predict-idlab.github.io/tsflex/
MIT License

Feature extraction issue when series name is a number #97

Open windischbauer opened 1 year ago

windischbauer commented 1 year ago

I have a data set in which each series is keyed by a number, and I would like to extract features from it. I adapted the code from the tutorial to reproduce the issue; I am using v0.3.0:

import numpy as np
import pandas as pd
import scipy.stats as ss
from tsflex.features import FeatureDescriptor, FeatureCollection, FuncWrapper

# 1. -------- Get your time-indexed data --------
# Data contains 1 column; ["TMP"]
url = "https://github.com/predict-idlab/tsflex/raw/main/examples/data/empatica/"
data = pd.read_parquet(url + "tmp.parquet").set_index("timestamp")

# I renamed the column name to showcase the issue:
data.rename(columns={'TMP': 1234}, inplace=True)

# 2 -------- Construct your feature collection --------
fc = FeatureCollection(
    feature_descriptors=[
        FeatureDescriptor(
            function=FuncWrapper(func=ss.skew, output_names="skew"),
            series_name="1234",
            window="5min", stride="2.5min",
        )
    ]
)
# -- 2.1. Add features to your feature collection
# NOTE: tsflex allows features to have different windows and strides
# fc.add(FeatureDescriptor(np.min, "TMP", '0.5min', '2.5min'))

# 3 -------- Calculate features --------
fc.calculate(data=data, return_df=True)  # which outputs:

IndexError: list index out of range when the series name is in quotation marks (series_name="1234"), and TypeError: argument of type 'int' is not iterable when the series name is a number (series_name=1234).
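
A possible workaround, sketched under the assumption that tsflex matches series_name against string column labels (the report does not confirm this), is to cast the column labels to strings before calling fc.calculate. The snippet below shows only the pandas side on a small synthetic frame that stands in for the Empatica data; the names idx and data_str are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the frame in the report: one column whose label is the int 1234.
idx = pd.date_range("2021-01-01", periods=6, freq="1min", name="timestamp")
data = pd.DataFrame({1234: np.arange(6, dtype=float)}, index=idx)

# Cast every column label to str, so the label becomes "1234" instead of 1234.
# rename returns a new frame; the original keeps its integer label.
data_str = data.rename(columns=str)

print(data.columns.tolist())      # integer label
print(data_str.columns.tolist())  # string label
```

With string labels in place, data_str could be passed to fc.calculate together with series_name="1234". Whether that avoids both errors in v0.3.0 is untested here.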