Failing to reproduce the results from the original paper for electricity dataset

sayanb-7c6 commented 3 years ago

PyTorch-Forecasting version: 0.9.0
PyTorch version: 1.9.1
Python version: 3.9.7
Operating System: CentOS 7

Expected behavior

We were trying to reproduce the results from the electricity dataset that the authors of TFT paper made use of. The dataset can be downloaded by following the instructions from here. The specific file we used is called hourly_electricity.csv, it has about 1 million rows.

Actual behavior

We tried 8-10 iterations of the model with different params/hyperparams but were not able to get close. We kept the all the params as close to the ones used in the params, but no luck so far. Some of the optimal params used by the authors can be found here. We also tried overfitting the model on 10% of the training data following the examples from here, but the program crashed because of memory shortage on the gpu. We'll try again with a smaller percentage, but even before crashing the validation loss was in par with our non-overfitting models, so we're not so sure that'll help us.

## optimal params used by the authors
model_params = {
        'dropout_rate': 0.1,
        'hidden_layer_size': 160,
        'learning_rate': 0.001,
        'minibatch_size': 64,
        'max_gradient_norm': 0.01,
        'num_heads': 4,
        'stack_size': 1
    }

Here's the relevant code to reproduce the results

max_prediction_length = 24
max_encoder_length = 24 * 7

min_encoder_length = max_encoder_length // 2
training_cutoff = df["t"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df[lambda x: x.t <= training_cutoff],
    time_idx="t",
    target="Target",
    group_ids=["categorical_id"],
    min_encoder_length=min_encoder_length,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["categorical_id"],
    static_reals=[],
    time_varying_known_categoricals=[],
    variable_groups={},  # group of categorical variables can be treated as one variable
    time_varying_known_reals=['day_of_week', 'hour', 't'],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "Target",
    ],
    target_normalizer=GroupNormalizer(
        groups=["categorical_id"], transformation="softplus"
    ), 
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)

validation = TimeSeriesDataSet.from_dataset(training, df, predict=True, stop_randomization=True)

batch_size = 64
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=1)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size , num_workers=1)

pl.seed_everything(42)

early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # logging results to a tensorboard

trainer = pl.Trainer(
    max_epochs=300,
    gpus=[0],
    weights_summary="top",
    gradient_clip_val=0.01,
    limit_train_batches=90,
    # fast_dev_run=True,
    callbacks=[lr_logger, early_stop_callback],
    logger=logger
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    lstm_layers=1,
    learning_rate=0.001,
    hidden_size=160,
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=8,
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    log_interval=5,  # uncomment for learning rate finder and otherwise, e.g. to 10 for logging every 10 batches
    reduce_on_plateau_patience=4,
)

trainer.fit(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)

The predictions on the validation data looks like this:

github issue

We're getting a MAPE of 0.9609 which is pretty bad. Wondering what's the root cause. We've used StandardScaler() from sklearn on the target variable to squish the range and normalise it, but it's not helping.

georgeblck commented 3 years ago

Are you sure about the option limit_train_batches=90 in your call to the pytorch lightning trainer? As far as I know it's only for testing purposes.

sayanb-7c6 commented 3 years ago

@georgeblck Thank you, I totally missed that. Will update here, if removing that line improves the accuracy of the model.

sktime / pytorch-forecasting