There should be an easy way to find out: sample one batch from the time series dataset, fix a seed, and pass it directly through the forward function of the network. Now modify the target values of decoder_cont in the batch, set the same seed, and pass it again. Is the result different? If yes, we have a problem.
Thanks for the debugging idea!
import numpy as np
import pandas as pd
import torch
from pytorch_forecasting import TimeSeriesDataSet, DeepAR

if __name__ == "__main__":
    df = pd.DataFrame(dict(
        time_idx=range(10), target=np.array([i for i in range(10)]) / 9.0
    )).assign(id=1)

    ds = TimeSeriesDataSet(
        df,
        time_idx="time_idx",
        target="target",
        group_ids=["id"],
        min_encoder_length=8,
        max_encoder_length=8,
        min_prediction_length=2,
        max_prediction_length=2,
        time_varying_unknown_reals=["target"],
        predict_mode=True,
        randomize_length=False,
    )
    dl = ds.to_dataloader(train=False)

    model = DeepAR.from_dataset(ds, cell_type="GRU")
    model.eval()

    print("First forward pass")
    torch.manual_seed(1)
    x, y = next(iter(dl))
    y_pred = model.forward(x, n_samples=1)
    print(y_pred["prediction"])

    print("Another forward pass to check the seeding works")
    torch.manual_seed(1)
    x, y = next(iter(dl))
    y_pred = model.forward(x, n_samples=1)
    print(y_pred["prediction"])

    print("Another forward pass to check that updating decoder_target does not change output")
    torch.manual_seed(1)
    x, y = next(iter(dl))
    x["decoder_target"] = torch.tensor([0., 0.])
    y_pred = model.forward(x, n_samples=1)
    print(y_pred["prediction"])
Output:

First forward pass
tensor([[[0.6245],
         [0.7737]]])
Another forward pass to check the seeding works
tensor([[[0.6245],
         [0.7737]]])
Another forward pass to check that updating decoder_target does not change output
tensor([[[0.6245],
         [0.7737]]])
I've managed to recreate the case that's causing me concern, with fake data and some of my extra code removed.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from pytorch_forecasting import TimeSeriesDataSet, DeepAR, GroupNormalizer
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

if __name__ == "__main__":
    predict_steps = 4

    target = np.array([0.25] * 20 + [0.75] * 25 + [0.3] * 41)
    target[-2:] = 0.01
    print(target.shape)

    df = pd.DataFrame(dict(
        time_idx=range(86), target=target
    )).assign(id=1)

    max_time_idx = df.time_idx.max()
    train_cutoff = int(max_time_idx - (predict_steps * 2))
    val_cutoff = int(max_time_idx - (predict_steps * 1))
    train_df = df[df.time_idx <= train_cutoff].copy()
    val_df = df[df.time_idx <= val_cutoff].copy()
    test_df = df.copy()

    min_encoder_length = int((train_cutoff - predict_steps) * 0.25)
    max_encoder_length = train_cutoff - predict_steps
    print(f"encoder length: {min_encoder_length} -> {max_encoder_length}")

    train = TimeSeriesDataSet(
        train_df,
        time_idx="time_idx",
        target="target",
        group_ids=["id"],
        min_encoder_length=min_encoder_length,
        max_encoder_length=max_encoder_length,
        min_prediction_length=predict_steps,
        max_prediction_length=predict_steps,
        time_varying_unknown_reals=["target"],
        target_normalizer=GroupNormalizer(groups=["id"], center=True, coerce_positive=False),
        randomize_length=True,
    )
    val = TimeSeriesDataSet.from_dataset(
        train, val_df,
        predict=True,
        stop_randomization=True,
    )
    test = TimeSeriesDataSet.from_dataset(
        train, test_df,
        predict=True,
        stop_randomization=True,
    )

    batch_size = 64
    train_dataloader = train.to_dataloader(
        train=True, batch_size=batch_size, num_workers=8
    )
    val_dataloader = val.to_dataloader(
        train=False, batch_size=batch_size, num_workers=8
    )

    model = DeepAR.from_dataset(
        train,
        dropout=0.0,
        cell_type="GRU",
        hidden_size=30,
        rnn_layers=2,
        learning_rate=5e-3,
        log_val_interval=1,
        reduce_on_plateau_patience=10,
        reduce_on_plateau_min_lr=1e-5,
        weight_decay=0.001,
        optimizer="adamw",
    )

    early_stop_callback = EarlyStopping(
        monitor="val_loss",
        min_delta=1e-3,
        patience=30,
        verbose=True,
        mode="min",
    )
    trainer = pl.Trainer(
        max_epochs=1500,
        gpus=1,
        callbacks=[early_stop_callback],
    )
    trainer.fit(
        model,
        train_dataloader=train_dataloader,
        val_dataloaders=val_dataloader,
    )

    model = DeepAR.load_from_checkpoint(
        trainer.checkpoint_callback.best_model_path
    )

    y = torch.cat([y.unsqueeze(0) for x, y in iter(test)])
    raw_preds, x = model.predict(test, mode="raw", return_x=True)
    preds = model.predict(test)
    model.plot_prediction(x, raw_preds, idx=0)
    plt.savefig("test.png")
I've gone over the dataset configs and contents with a debugger and can't see anything I've botched. Can't sleep easy until I get to the bottom of this!
Edit: I appreciate this time series is not really suitable for forecasting, but it's one outlying time series of many.
I think you want to do x["decoder_cont"] = torch.zeros_like(x["decoder_cont"]) in the last step, but the result is the same for me. There is one bug, however, which is unlikely to cause issues: the network seems to modify x["decoder_cont"] in place. We might need to make an explicit copy.
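A minimal sketch of that corrected last step, reusing dl and model from the first script above (the clone() guard is my addition, to rule out the in-place modification skewing the comparison):

print("Another forward pass, zeroing decoder_cont on a copied batch")
torch.manual_seed(1)
x, y = next(iter(dl))
# copy the tensors first so forward() cannot mutate the originals in place
x = {name: value.clone() if torch.is_tensor(value) else value for name, value in x.items()}
x["decoder_cont"] = torch.zeros_like(x["decoder_cont"])
y_pred = model.forward(x, n_samples=1)
print(y_pred["prediction"])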
It does look a bit like leakage - agreed. But it must be some sophisticated form, because otherwise there would be 100% convergence.
Using the GroupNormalizer instead of the EncoderNormalizer could maybe cause a slight issue (it should normally not be big).
I figured the norm values for the target_normalizer would only be fit on the initial dataset? I checked this and indeed the norms are the same for the test and train target_normalizers:
debugger in:
test.target_normalizer.norm_ == train.target_normalizer.norm_
debugger out:
    center  scale
id
0     True   True
This dataset seems to break with the EncoderNormalizer; the loss goes through the roof.
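For reference, the swap is roughly this (a sketch; it matches the training dataset config from the script above except for the target_normalizer line):

from pytorch_forecasting import EncoderNormalizer

train = TimeSeriesDataSet(
    train_df,
    time_idx="time_idx",
    target="target",
    group_ids=["id"],
    min_encoder_length=min_encoder_length,
    max_encoder_length=max_encoder_length,
    min_prediction_length=predict_steps,
    max_prediction_length=predict_steps,
    time_varying_unknown_reals=["target"],
    # rescale each sample by its own encoder window instead of per-group statistics
    target_normalizer=EncoderNormalizer(),
    randomize_length=True,
)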
Makes sense if the normalization by the EncoderNormalizer is not stable. Could you check the distribution of predictions for the example above? I wonder if this is just a random one. Does it always happen?
That image was from the first example I ran. I will run a bunch more and attach in the morning. G'night, thanks for your time.
I've experimented with changing where the step change occurs in the held-out data, and the dip follows it precisely...

target = np.array([0.25] * 20 + [0.75] * 25 + [0.3] * 41)
target[-3:] = 0.01

target = np.array([0.25] * 20 + [0.75] * 25 + [0.3] * 41)
target[-1:] = 0.01
Hm. Can you trace the change through the code to find out where the leakage happens?
It's pretty hardcore! The scaled target is being fed straight in (see screenshot) :D Seems like the test that modified decoder_cont is not adequate somehow (even with decoder_target switched to decoder_cont).
I think one possible bug is that the block below does not override the target variable, maybe because target_pos is incorrect?
x = input_vector[:, [idx]]
x[:, 0, target_pos] = input_target
Another thing that looks possible to me is an off-by-one error: the prediction starts at the first value of the held-out data. That is, I can't see anything here that prevents the first value going into the model (even if target_pos were correct), as input_target is not yet updated at that point.
input_target = input_vector[:, 0, target_pos]
output = []
for idx in range(input_vector.size(1)):
    x = input_vector[:, [idx]]
    x[:, 0, target_pos] = input_target
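If that reading is right, the loop presumably ought to feed each step's prediction back in rather than a fixed input_target. A hedged sketch of that idea (decode_one, hidden_state, and encoder_hidden_state are hypothetical stand-ins for the library's internals, not its actual code):

hidden_state = encoder_hidden_state  # hypothetical: carried over from the encoder
normalized_prediction = input_vector[:, 0, target_pos]  # seed with the last encoder target only
output = []
for idx in range(input_vector.size(1)):
    x = input_vector[:, [idx]].clone()  # clone so we never write into the batch
    x[:, 0, target_pos] = normalized_prediction  # previous prediction, never ground truth
    prediction, hidden_state = decode_one(x, hidden_state)  # hypothetical one-step decoder
    normalized_prediction = prediction
    output.append(prediction)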
The 1st part is resolved by the linked MR; the second part is handled by construct_input_vector(... one_off_target), I think?
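If I read that right, the one_off_target handling presumably amounts to shifting the scaled target column one step, so each decoder position holds the previous value, and seeding position 0 with the last encoder target. A hedged sketch (illustrative, not the library's actual code):

input_vector[..., target_pos] = torch.roll(input_vector[..., target_pos], shifts=1, dims=1)
input_vector[:, 0, target_pos] = one_off_target  # last known (encoder) target value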
Agreed that DeepAR may have the information leakage. I increased the decoder length, and the loss of DeepAR actually decreased, while all the other models (TFT, NBeats, LSTM) increased their loss.
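A sketch of that probe for DeepAR, reusing names from the training script above (the horizons and epoch count are hypothetical; the same loop applies to the other model classes):

for horizon in (4, 8, 16):  # hypothetical decoder lengths
    ds_h = TimeSeriesDataSet.from_dataset(
        train, train_df,
        min_encoder_length=horizon,
        max_encoder_length=3 * horizon,  # keep encoder + decoder inside the series
        min_prediction_length=horizon,
        max_prediction_length=horizon,
    )
    model_h = DeepAR.from_dataset(ds_h, cell_type="GRU")
    trainer_h = pl.Trainer(max_epochs=50)
    trainer_h.fit(model_h, train_dataloader=ds_h.to_dataloader(train=True))
    # with leakage, a longer decoder hands the model more ground truth to copy,
    # so its loss can drop while honest models get worse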
My model (DeepAR) is performing way better than I think it should.
I've done some hunting, and it appears that even in test mode, input_vector at this point contains scaled target values. I tried following things further, and note that there is specific handling in place for when n_samples is not None (which is the case when using DeepAR.predict()), but I can't see where the scaled values are masked / excluded.
Am I just lost / paranoid, or is there leakage somewhere?
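One black-box probe that sidesteps the internals, reusing names from the training script above (my construction, not something from the thread): if altering the held-out target values changes the predictions for that same window, the decoder must be seeing them.

probe_df = test_df.copy()
# overwrite the held-out window's targets with a constant the model has never seen
probe_df.loc[probe_df.time_idx > val_cutoff, "target"] = 0.0
probe = TimeSeriesDataSet.from_dataset(train, probe_df, predict=True, stop_randomization=True)

torch.manual_seed(0)  # DeepAR predictions are sampled, so fix the seed for both calls
preds_original = model.predict(test)
torch.manual_seed(0)
preds_probe = model.predict(probe)
print(torch.allclose(preds_original, preds_probe))  # False would indicate leakage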