Thanks for the awesome library!! :-) I have 2 questions about TimeSeriesDataSet:
1. Apply StandardScaler on target
When creating a TimeSeriesDataSet, I would like to apply sklearn's StandardScaler to my training features X and labels y. I understand that the `scalers` parameter only applies to variables other than the target. Is using GroupNormalizer (scale_by_group=False) as the target_normalizer equivalent to StandardScaler, i.e. computing a single mean and std over the data and applying them to X and y?
2. A question about categorical_encoders
Why must I use NaNLabelEncoder in categorical_encoders? Otherwise I get the error below, even though my group_ids column "Shop" is numerical (0, 1, 2, 3, 4, ...) and contains no NaN values:
```python
training = TimeSeriesDataSet(
    ...
    group_ids=["Shop"],
    categorical_encoders={"ID": None, "__group_id__Shop": None},
    ...
)
```
```
AttributeError                            Traceback (most recent call last)
in
      3 batch_size=128
      4
----> 5 training = TimeSeriesDataSet(
      6     trainingset,
      7     time_idx="time_idx",

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    428
    429         # preprocess data
--> 430         data = self._preprocess_data(data)
    431         for target in self.target_names:
    432             assert target not in self.scalers, "Target normalizer is separate and not in scalers."

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py in _preprocess_data(self, data)
    635             # use existing encoder - but a copy of it not too loose current encodings
    636             encoder = deepcopy(self.categorical_encoders.get(group_name, NaNLabelEncoder()))
--> 637             self.categorical_encoders[group_name] = encoder.fit(data[name].to_numpy().reshape(-1), overwrite=False)
    638             data[group_name] = self.transform_values(name, data[name], inverse=False, group_id=True)
    639

AttributeError: 'NoneType' object has no attribute 'fit'
```
Use the TorchNormalizer if you want to scale all data the same way. See the docs for the GroupNormalizer for details.
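As a sketch of the equivalence asked about in question 1 (assuming the normalizer is fitted over the whole dataset with standard mean/std normalization and no extra transformation): scaling a target by a single mean and std is exactly the arithmetic StandardScaler performs.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy target series
y = np.array([3.0, 5.0, 7.0, 9.0]).reshape(-1, 1)

# StandardScaler: (y - mean) / std, with mean/std over the whole column
scaled_sklearn = StandardScaler().fit_transform(y)

# The same standard normalization done by hand, as a normalizer fitted
# across all groups at once (i.e. not scaling per group) would do
scaled_manual = (y - y.mean()) / y.std()

assert np.allclose(scaled_sklearn, scaled_manual)
```

Per-group scaling (scale_by_group=True) would instead compute a separate mean and std for each group, which is where the behavior diverges from a plain StandardScaler.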
Yes, use the NaNLabelEncoder for the categorical encoding. Do not overwrite the `__group_id__` encoders, as these are used internally to ensure groups can be decoded correctly even if new ones are added.
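To see why passing None triggers that AttributeError: the dataset calls .fit(...) on whatever object you supply as the encoder (line 637 of the traceback), so a None value means calling None.fit(...). A minimal stand-in illustrates the mechanism (NaNLabelEncoderSketch is a hypothetical simplification for illustration, not the real pytorch_forecasting class):

```python
class NaNLabelEncoderSketch:
    """Hypothetical stand-in for a label encoder with a .fit() method."""

    def fit(self, values):
        # Map each distinct value to an integer code, in order of appearance
        self.classes_ = {v: i for i, v in enumerate(dict.fromkeys(values))}
        return self

    def transform(self, values):
        return [self.classes_[v] for v in values]


shops = [0, 1, 2, 2, 3]

# Supplying an encoder object works: .fit() exists and returns the encoder
encoder = NaNLabelEncoderSketch().fit(shops)
assert encoder.transform(shops) == [0, 1, 2, 2, 3]

# Supplying None reproduces the reported error: NoneType has no .fit
try:
    None.fit(shops)
except AttributeError as exc:
    assert "fit" in str(exc)
```

So the requirement has nothing to do with whether the column contains NaN values; any object with a compatible fit/transform interface is needed, and NaNLabelEncoder is the one the library provides.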