zalandoresearch / pytorch-ts

PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend
MIT License

TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day' #23

Closed: AlexMRuch closed this issue 3 years ago

AlexMRuch commented 3 years ago

Despite confirming that I am on pandas 1.0.5 (cf. https://github.com/awslabs/gluon-ts/issues/958), I am still getting a TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day' error when running the following:

# Define DL Time Series Model
estimator = DeepAREstimator(
    freq = FREQ,
    prediction_length = 1,  # predict 1 day ahead
    input_size = 32,
    trainer = Trainer(
        epochs = 100,
        device = DEVICE
    )
)  # closes DeepAREstimator(

# Train the model on the training split
predictor = estimator.train(training_data=training_data)

This returned:

0it [00:00, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-b7c68ebabaa3> in <module>
----> 1 predictor = estimator.train(training_data=training_data)

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train(self, training_data)
    146 
    147     def train(self, training_data: Dataset) -> Predictor:
--> 148         return self.train_model(training_data).predictor

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train_model(self, training_data)
    131         trained_net = self.create_training_network(self.trainer.device)
    132 
--> 133         self.trainer(
    134             net=trained_net,
    135             input_names=get_module_forward_input_names(trained_net),

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/trainer.py in __call__(self, net, input_names, data_loader)
     46 
     47             with tqdm(data_loader) as it:
---> 48                 for batch_no, data_entry in enumerate(it, start=1):
     49                     optimizer.zero_grad()
     50                     inputs = [data_entry[k].to(self.device) for k in input_names]

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/tqdm/std.py in __iter__(self)
   1128 
   1129         try:
-> 1130             for obj in iterable:
   1131                 yield obj
   1132                 # Update and possibly print the progressbar.

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1
    365         if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    987             else:
    988                 del self._task_info[idx]
--> 989                 return self._process_data(data)
    990 
    991     def _try_put_index(self):

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1012         self._try_put_index()
   1013         if isinstance(data, ExceptionWrapper):
-> 1014             data.reraise()
   1015         return data
   1016 

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
    393             # (https://bugs.python.org/issue2651), so we work around it.
    394             msg = KeyErrorMessage(msg)
--> 395         raise self.exc_type(msg)

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/dataset/transformed_iterable_dataset.py", line 39, in __iter__
    data_entry = next(self._cur_iter)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 128, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 85, in __call__
    raise e
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 83, in __call__
    yield self.map_transform(data_entry.copy(), is_train)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 195, in map_transform
    self._update_cache(start, length)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 169, in _update_cache
    end = shift_timestamp(start, length)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/split.py", line 33, in shift_timestamp
    return _shift_timestamp_helper(ts, ts.freq, offset)
TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day'

The code above was preceded by the following setup:

# Print Timestamp Statistics
earliest_time = min(example_ny_df.index)
latest_time = max(example_ny_df.index)
time_range_full = (max(example_ny_df.index) - min(example_ny_df.index)).days

# Determine Cut-point for 80/20 Training/Testing Splits
TRAININGSPLIT = 0.8
time_range_split = int(time_range_full * TRAININGSPLIT)
time_split = min(example_ny_df.index) + datetime.timedelta(days=time_range_split)

# Create Training Split / Predictor Object
FREQ = "1D"
training_data = ListDataset(
    [{"start": earliest_time, "target": example_ny_df.positiveIncrease[:time_split]}],
    freq = FREQ
)

# Setup GPU, if Exists
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Processing device:", DEVICE)

I'm going to try redoing this from a 100% clean install, without even trying the GPU version of torch, as mentioned in #22.
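For readers trying to interpret the traceback: its last frame calls _shift_timestamp_helper(ts, ts.freq, offset). The thread does not show that helper's source, so the following is only a sketch of one plausible mechanism, assuming the helper is memoized with functools.lru_cache. A cached function hashes every argument, so an unhashable frequency object raises this kind of TypeError. FakeOffset and shift are hypothetical stand-ins, not pandas or pytorch-ts code.

```python
from functools import lru_cache


class FakeOffset:
    """Hypothetical stand-in for a frequency object (not a pandas class)."""

    def __init__(self, n: int) -> None:
        self.n = n

    # Defining __eq__ without __hash__ makes instances unhashable in Python 3.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, FakeOffset) and self.n == other.n


@lru_cache(maxsize=None)
def shift(start: int, freq: FakeOffset, offset: int) -> int:
    # lru_cache builds its cache key by hashing all arguments,
    # so the call fails before this body ever runs.
    return start + freq.n * offset


def try_shift() -> str:
    """Call the cached function and return the error message, if any."""
    try:
        shift(0, FakeOffset(1), 5)
    except TypeError as err:
        return str(err)
    return "ok"


print(try_shift())
```

If the real helper works this way, the error depends on whether the installed pandas build provides hashable offset objects, and not on the model settings.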

AlexMRuch commented 3 years ago

Gosh, this was another install issue. I'm really sorry. 😬
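Since the cause here turned out to be the install, one hedged way to check for a mismatch is to ask the running interpreter which package versions it actually sees, instead of relying on what pip or conda reports in a shell. This is a minimal sketch using only the standard library (Python 3.8+). The distribution names listed are assumptions about what is installed in the environment, and installed_version is a hypothetical helper name.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which environment the notebook kernel is using.
print("interpreter:", sys.executable)

# Assumed distribution names; adjust to match your environment.
for name in ("pandas", "torch", "pytorchts", "gluonts"):
    print(name, "->", installed_version(name))
```

Running this inside the notebook kernel, not a separate terminal, shows whether the kernel's pandas is really the version you expected.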

NielsRogge commented 3 years ago

How did you solve this? I'm also using pandas 1.0.5 and having the same issue.

kashif commented 3 years ago

@NielsRogge I believe the issue was that prediction_length has to be > 1...

NielsRogge commented 3 years ago

@kashif I'm using a prediction_length of 31. Notebook to reproduce: https://www.kaggle.com/nielsrogge/middle-out-approach-with-pytorch-ts-zalando

kashif commented 3 years ago

@NielsRogge nice! having a look

kashif commented 3 years ago

@NielsRogge I get a 404 on the notebook...

NielsRogge commented 3 years ago

Sorry, should be fixed now.