unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[BUG] Weird behaviour if casting TimeSeries to float32 #2277

Closed by nejox 5 months ago

nejox commented 8 months ago

Describe the bug I'm working with historical sales data from multiple stores and products. I convert a DataFrame indexed by id_store, id_product, and target_date into multiple TimeSeries objects, marking the IDs as static covariates. Although they start as integers, these IDs are automatically parsed to float64. Because of performance problems training the TemporalFusionTransformer on this many time series, I attempted mixed-precision training via TimeSeries.astype("float32"). This also casts the static covariates to float32, which silently alters the ID values: e.g., id_product 100100037 becomes 100100040.


To Reproduce

import pandas as pd

from darts.utils.timeseries_generation import linear_timeseries

sc1 = pd.DataFrame([100100037], columns=["id_product"])

ts = linear_timeseries(start_value=0, end_value=10, length=10, freq="D")
new_ts = ts.with_static_covariates(sc1)
new_ts = new_ts.astype("float32")
print(new_ts.static_covariates)
print(new_ts.static_covariates)

output:

static_covariates   id_product
component                     
linear             100100040.0
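For context, the altered value is ordinary float32 rounding rather than anything darts-specific: float32 has a 24-bit significand, so not every integer above 2**24 (16,777,216) is representable, and near 1e8 the spacing between representable values is 8. A minimal numpy sketch (not part of the original report):

```python
import numpy as np

# float32 stores a 24-bit significand, so integers above 2**24 = 16_777_216
# cannot all be represented exactly. Near 1e8 the gap between adjacent
# representable values is 8, so 100100037 rounds to the nearest multiple of 8.
print(int(np.float32(100100037)))         # 100100040
print(np.spacing(np.float32(100100037)))  # 8.0 (the local ULP)
```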

Expected behavior Static covariates should keep their original dtype (e.g., integer) instead of being cast to float.

System:

Python version: 3.9
darts version: 0.27.2
lightning version: 2.1.3
torch version: 2.1.0
OS: macOS 14.2.1 (23C71)

hrzn commented 8 months ago

I think this may not be a bug: not casting the static covariates along with the values would lead to issues when using the PyTorch models.

BohdanBilonoh commented 8 months ago

I faced the same issue. Mapping the IDs to a smaller range works for me:

import numpy as np

def map_large_ids(ids):
    # Map each unique ID to a small consecutive integer,
    # which float32 can represent exactly.
    unique_ids = np.unique(ids)
    id_dict = {id_: i for i, id_ in enumerate(unique_ids)}
    return id_dict
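A quick sketch of how such a mapping could be applied before attaching the IDs as static covariates (plain numpy/pandas only; the remapped small integers survive a float32 round trip exactly, unlike the raw IDs):

```python
import numpy as np
import pandas as pd

def map_large_ids(ids):
    """Map arbitrary (possibly huge) IDs to a small contiguous range."""
    unique_ids = np.unique(ids)
    return {id_: i for i, id_ in enumerate(unique_ids)}

raw_ids = [100100037, 100100038, 200200099]
id_map = map_large_ids(raw_ids)

# Use the remapped IDs as the static covariate column.
sc = pd.DataFrame({"id_product": [id_map[i] for i in raw_ids]})

# Small integers round-trip through float32 without loss:
assert all(int(np.float32(v)) == v for v in sc["id_product"])
```

Keep the inverse mapping around if you need to recover the original IDs after forecasting.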