sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

Apply StandardScaler on target & categorical_encoders #477

Closed ss20212 closed 3 years ago

ss20212 commented 3 years ago

Thanks for the awesome library!! :-) I have 2 questions about TimeSeriesDataSet:

1. Apply StandardScaler on the target
When creating a TimeSeriesDataSet, I would like to apply sklearn's StandardScaler to my training features X and labels y. I understand that the 'scalers' parameter can only be used for variables other than the target.

Can I regard applying GroupNormalizer(scale_by_group=False) as the target_normalizer as equivalent to a StandardScaler, i.e. using only the mean and std computed from the training data and then applying them to both X and y? (A small sklearn sketch of what I mean follows below.)
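For illustration, this is roughly the sklearn behaviour I have in mind (a minimal sketch only; the placeholder arrays below are not from my data):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder training data: 100 samples, 3 features, 1 target column.
X_train = np.random.rand(100, 3)
y_train = np.random.rand(100, 1)

# Fit the mean/std on the training data only ...
scaler_x = StandardScaler().fit(X_train)
scaler_y = StandardScaler().fit(y_train)

# ... and apply the same transform to features and labels.
X_scaled = scaler_x.transform(X_train)
y_scaled = scaler_y.transform(y_train)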

2. A question on categorical_encoders
Why must I use NaNLabelEncoder for categorical_encoders? If I don't, I get the error below, even though my group_ids column "Shop" (numerical data, like 0, 1, 2, 3, 4, ...) has no NaN values.

training = TimeSeriesDataSet(
    ....
    group_ids=["Shop"],
    categorical_encoders={"ID": None, '__group_id__Shop': None},
    ....
)

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      3 batch_size=128
      4
----> 5 training = TimeSeriesDataSet(
      6     trainingset,
      7     time_idx="time_idx",

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    428
    429         # preprocess data
--> 430         data = self._preprocess_data(data)
    431         for target in self.target_names:
    432             assert target not in self.scalers, "Target normalizer is separate and not in scalers."

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py in _preprocess_data(self, data)
    635             # use existing encoder - but a copy of it not too loose current encodings
    636             encoder = deepcopy(self.categorical_encoders.get(group_name, NaNLabelEncoder()))
--> 637             self.categorical_encoders[group_name] = encoder.fit(data[name].to_numpy().reshape(-1), overwrite=False)
    638             data[group_name] = self.transform_values(name, data[name], inverse=False, group_id=True)
    639

AttributeError: 'NoneType' object has no attribute 'fit'
jdb78 commented 3 years ago
  1. Use the TorchNormalizer if you want to scale all data the same way. See the GroupNormalizer docs for details.
  2. Yes. Use the NaNLabelEncoder for the categorical encoding. Do not overwrite the __group_id__ encoders, as they are used internally to ensure groups can be decoded correctly even if new ones are added. A short sketch of both points follows below.
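For illustration, the two points above roughly translate to the construction below. This is only a sketch: the synthetic frame, the "sales"/"ID" columns, and the encoder/prediction lengths are made-up placeholders, not from the original post.

import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder, TorchNormalizer

# Placeholder data: 5 shops, 60 time steps each.
df = pd.DataFrame(
    {
        "Shop": np.repeat(np.arange(5), 60),
        "ID": np.repeat([f"id_{i}" for i in range(5)], 60),
        "time_idx": np.tile(np.arange(60), 5),
        "sales": np.random.rand(300),
    }
)

training = TimeSeriesDataSet(
    df,
    time_idx="time_idx",
    target="sales",
    group_ids=["Shop"],
    max_encoder_length=24,
    max_prediction_length=6,
    static_categoricals=["ID"],
    time_varying_unknown_reals=["sales"],
    # Point 1: scale the target the same way for all series (standard
    # scaling, not per group) by using TorchNormalizer as target_normalizer.
    target_normalizer=TorchNormalizer(method="standard"),
    # Point 2: use NaNLabelEncoder for categoricals instead of None, and leave
    # the internally managed "__group_id__Shop" encoder untouched (the default
    # NaNLabelEncoder is applied to the group ids automatically).
    categorical_encoders={"ID": NaNLabelEncoder(add_nan=True)},
)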
ss20212 commented 3 years ago

Thank you for the answer!