LuigiDarkSimeone opened this issue 2 years ago
Have you tried the "weight" argument while creating the dataset? You can create a column with weights to be used in training:
ds = TimeSeriesDataSet(
data=data[train_data_filter],
time_idx=time_idx_col,
target=...,
weight='weight', # name of a weight column in your df (per-sample weights)
group_ids=group_ids,
...
)
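One common way to build such a weight column (a sketch of the inverse-frequency heuristic, mirroring scikit-learn's `class_weight="balanced"`; the function name here is illustrative, not part of pytorch-forecasting):

```python
from collections import Counter

def inverse_frequency_weights(group_ids):
    """Per-sample weights inversely proportional to group frequency,
    so under-represented groups contribute more to the loss."""
    counts = Counter(group_ids)
    total, n_groups = len(group_ids), len(counts)
    # Mirrors sklearn's "balanced" heuristic: total / (n_groups * count)
    return [total / (n_groups * counts[g]) for g in group_ids]

# 10 US samples vs 2 EU samples -> EU samples get 5x the weight of US ones
regions = ["US"] * 10 + ["EU"] * 2
weights = inverse_frequency_weights(regions)
# weights[0] == 0.6 (US), weights[-1] == 3.0 (EU)
```

You would then attach the result as a column (e.g. "Weight") in your dataframe and pass its name via the weight argument of TimeSeriesDataSet.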
Hi @LuigiDarkSimeone,
1) As suggested by @fnavruzov, one way to "rebalance" the dataset is to use the weight argument of TimeSeriesDataSet. This generates a weight tensor in addition to the target tensor used while fitting the model. Note that in this case the portion of the loss associated with each sample is weighted differently. This is similar to what is done in scikit-learn (the sample_weight argument of the .fit(...) method).
2) You could also use the weights to alter the probability that a given sample is part of a mini-batch (a sampling scheme). As indicated in the documentation, you can call the to_dataloader method with a custom sampler, for example an instance of torch's WeightedRandomSampler. You can find a small example here.
3) You can also combine both 1) and 2).
N.B.: The DeepAR paper empirically shows the benefit of method 2) compared to not using any weights. To the best of my knowledge, they do not present any results based on method 1). That said, in their setting the main problem is the size of the dataset: since the total number of samples is huge, it may not be possible to go over all samples several times during training, and they show that weighting the samples based on their "velocity" greatly improves performance.
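To make scheme 2) concrete, here is a dependency-free sketch of weighted sampling with replacement; torch.utils.data.WeightedRandomSampler implements the same idea, and random.choices stands in for it here. (As far as I recall, to_dataloader forwards extra keyword arguments to torch's DataLoader, so a sampler can be passed there, but check the documentation for your version.)

```python
import random

def weighted_sample_indices(weights, num_samples, seed=0):
    """Draw indices with replacement, with probability proportional to
    the weights -- the scheme WeightedRandomSampler implements."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)

# 2 rare samples weighted 10x vs 8 common samples: the rare samples'
# share of draws rises from 20% of the data to roughly 20/28 ~= 71%.
weights = [10.0] * 2 + [1.0] * 8
idx = weighted_sample_indices(weights, 1000)
rare_share = sum(i < 2 for i in idx) / len(idx)
```

This changes which samples the model sees rather than how much each seen sample contributes to the loss, which is the practical difference from scheme 1).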
First of all, thanks to @RonanFR and @fnavruzov for your replies. Lately it has been quite hard to get answers here. I will have a look at your options and test them to see whether they suit my case.
Since I am struggling to get answers, and since you look like experts, I would kindly ask you to have a look at this question I posted quite a few days ago (which I guess will never get an answer otherwise):
https://github.com/jdb78/pytorch-forecasting/issues/1032
I know it is not good practice to post another question in a different issue, so I really apologise in advance, but I cannot get past this problem even after looking at the source code. Hope to hear from you soon.
Many thanks, Luigi
Thanks @RonanFR, @fnavruzov.
I am trying to implement what you've suggested, using the "weight" argument of the TimeSeriesDataSet class in order to manage imbalances in my dataset.
training = TimeSeriesDataSet(
myData,
time_idx="Time_idx",
target="TVPI",
group_ids=["Fund"],
min_encoder_length=8,
max_encoder_length=80,
min_prediction_length=1,
max_prediction_length=30,
weight="Weight",
static_categoricals=...
Where the Weight column contains the weight associated with each sample.
Unfortunately, the described implementation raises the error below:
Would you know how to solve it? Thanks, Francesco
Hi @FrancescoFondaco ,
Can you provide a detailed minimal reproducible example that raises this error? (a small toy dataset of only a few lines)
Have you figured out this issue? I am having the same issue after adding the "weight" parameter. Thx!
Dear @FrancescoFondaco and @QijiaShao, I suspect the issue is related to the automatic forward-fill NaN mechanism. If your time index is not continuous, the missing steps are filled in, but the weights are missing for those filled samples. So you should disable automatic filling if you are using weights. This is just a guess.
Best wishes, Daniel
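Daniel's guess can be illustrated without pytorch-forecasting: a hypothetical forward-fill over a gapped time index carries the last value forward, but has no weight for the inserted steps. (The function below is purely illustrative, not library API.)

```python
def forward_fill_with_weights(times, values, weights):
    """Reindex a sparse integer time index onto a dense one.
    Values are forward-filled; the weight of an inserted step is
    unknown, so it comes out as None -- the suspected failure mode
    when combining gap-filling with a weight column."""
    lookup = dict(zip(times, zip(values, weights)))
    rows, last_val = [], None
    for t in range(min(times), max(times) + 1):
        if t in lookup:
            last_val, w = lookup[t]
            rows.append((t, last_val, w))
        else:
            rows.append((t, last_val, None))  # filled step: weight is missing
    return rows

# time index 0, 1, 3 has a gap at t=2: the value is filled, the weight is not
rows = forward_fill_with_weights([0, 1, 3], [1.0, 2.0, 4.0], [0.5, 0.5, 0.5])
# rows[2] == (2, 2.0, None)
```

If this is indeed the cause, either making the time index continuous before constructing the dataset, or filling the weight column for the inserted steps, should avoid the error.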
I have a dataset of several shops. For each shop I have a time series of sales. Shops are spread unequally across the world (1000 in the US, 100 in the EU), and I need to predict sales based on location and other variables. However, such a dataset is imbalanced. Is there a way to manage imbalance in TFT? (upsampling, downsampling, applying a sample weight similar to sklearn, or forcing each batch to select an equal number of examples)