sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License
4.02k stars, 639 forks

'nth' is not a valid function name for transform(name) #1310

Open haopengcu opened 1 year ago

haopengcu commented 1 year ago

Expected behavior

I tried to reproduce the simple N-Beats notebook from the tutorial:

https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/ar.ipynb

Actual behavior

However, an error was raised in TimeSeriesDataSet(). It seems to be caused by pandas. Which pandas version was used to develop pytorch-forecasting?

ValueError                                Traceback (most recent call last)
Cell In[5], line 10
      7 context_length = max_encoder_length
      8 prediction_length = max_prediction_length
---> 10 training = TimeSeriesDataSet(
     11     data[lambda x: x.time_idx <= training_cutoff],
     12     time_idx="time_idx",
     13     target="value",
     14     categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
     15     group_ids=["series"],
     16     # only unknown variable is "value" - and N-Beats can also not take any additional variables
     17     time_varying_unknown_reals=["value"],
     18     max_encoder_length=context_length,
     19     max_prediction_length=prediction_length,
     20 )
     22 validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
     23 batch_size = 128

File ~\anaconda3\envs\py310tft\lib\site-packages\pytorch_forecasting\data\timeseries.py:481, in TimeSeriesDataSet.__init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, constant_fill_strategy, allow_missing_timesteps, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    478 assert target not in self.scalers, "Target normalizer is separate and not in scalers."
    480 # create index
--> 481 self.index = self._construct_index(data, predict_mode=self.predict_mode)
    483 # convert to torch tensor for high performance data loading later
    484 self.data = self._data_to_tensors(data)

File ~\anaconda3\envs\py310tft\lib\site-packages\pytorch_forecasting\data\timeseries.py:1218, in TimeSeriesDataSet._construct_index(self, data, predict_mode)
   1205 """
   1206 Create index of samples.
   1207 (...)
   1214     It contains a list of all possible subsequences.
   1215 """
   1216 g = data.groupby(self._group_ids, observed=True)
-> 1218 df_index_first = g["time_idx"].transform("nth", 0).to_frame("time_first")
   1219 df_index_last = g["time_idx"].transform("nth", -1).to_frame("time_last")
   1220 df_index_diff_to_next = -g["__time_idx__"].diff(-1).fillna(-1).astype(int).to_frame("time_diff_to_next")

File ~\anaconda3\envs\py310tft\lib\site-packages\pandas\core\groupby\generic.py:469, in SeriesGroupBy.transform(self, func, engine, engine_kwargs, *args, **kwargs)
    466 @Substitution(klass="Series", example=__examples_series_doc)
    467 @Appender(_transform_template)
    468 def transform(self, func, *args, engine=None, engine_kwargs=None, **kwargs):
--> 469     return self._transform(
    470         func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    471     )

File ~\anaconda3\envs\py310tft\lib\site-packages\pandas\core\groupby\groupby.py:1534, in GroupBy._transform(self, func, engine, engine_kwargs, *args, **kwargs)
   1532 elif func not in base.transform_kernel_allowlist:
   1533     msg = f"'{func}' is not a valid function name for transform(name)"
-> 1534     raise ValueError(msg)
   1535 elif func in base.cythonized_kernels or func in base.transformation_kernels:
   1536     # cythonized transform or canned "agg+broadcast"
   1537     return getattr(self, func)(*args, **kwargs)

ValueError: 'nth' is not a valid function name for transform(name)
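The traceback shows that the failing calls are `transform("nth", 0)` and `transform("nth", -1)` on a grouped column. As a rough sketch only (not necessarily how the library handles this), the same per-group first and last values can be broadcast with the `"first"` and `"last"` kernels, which `transform` accepts. The toy frame below is made up for illustration; only the column names echo the traceback.

```python
import pandas as pd

# Made-up data: two series with integer time indices (hypothetical values).
df = pd.DataFrame(
    {
        "series": ["a", "a", "a", "b", "b"],
        "time_idx": [0, 1, 2, 5, 6],
    }
)
g = df.groupby("series", observed=True)

# "first"/"last" broadcast the first/last non-null value of each group
# back onto every row of that group. This matches nth(0)/nth(-1) only
# when the column has no missing values, which is assumed here.
time_first = g["time_idx"].transform("first")
time_last = g["time_idx"].transform("last")

print(time_first.tolist())
print(time_last.tolist())
```

Note that `"first"` and `"last"` skip missing values while `nth` is purely positional, so the two approaches can differ if the time index contains NaN.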


sairamtvv commented 1 year ago

@haopengcu I got the same error. It was resolved when I changed the version of pandas from 2.0 to 1.5, so the problem surely lies in your installation. Please recheck it.
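To check which case an environment falls into, something like the following could be used. It is a minimal sketch that assumes only what this thread reports: pandas 1.5 works and pandas 2.0 fails. The helper names (`major_version`, `affected_by_nth_change`, `installed_pandas_version`) are hypothetical and not part of pandas or pytorch-forecasting.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.3'."""
    return int(version.split(".")[0])


def affected_by_nth_change(pandas_version: str) -> bool:
    # Assumption taken from this thread: 1.5 works, 2.0 raises the ValueError.
    return major_version(pandas_version) >= 2


def installed_pandas_version():
    """Return the installed pandas version string, or None if it is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


found = installed_pandas_version()
if found is None:
    print("pandas is not installed in this environment")
elif affected_by_nth_change(found):
    print(f"pandas {found}: transform('nth', ...) is expected to fail")
else:
    print(f"pandas {found}: transform('nth', ...) is expected to work")
```

If the check reports an affected version, installing a 1.5 release of pandas, as described above, is the workaround reported in this thread.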