[BUG] RuntimeError when Running TransformerModel on Multiple GPUs with darts 0.25.0

PANXIONG-CN commented 1 year ago

Describe the bug

When I attempt to run the TransformerModel using multiple GPUs, I encounter the following error:

RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation.
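For context, PyTorch raises this message when an in-place operation reads from and writes to tensors that partially share memory. The sketch below triggers the same class of error with plain tensors; it only illustrates the message and is not a claim about where the overlap happens inside TransformerModel or DDP. The variable names are illustrative.

```python
import torch

x = torch.arange(6, dtype=torch.float32)

# x[1:] and x[:-1] are views of the same storage, shifted by one element,
# so an in-place add that reads one and writes the other partially overlaps.
try:
    x[1:].add_(x[:-1])
    overlap_error = None
except RuntimeError as err:
    overlap_error = str(err)

print(overlap_error)

# Cloning the input gives it its own storage, which removes the overlap.
y = torch.arange(6, dtype=torch.float32)
y[1:].add_(y[:-1].clone())
print(y)
```

The clone() call is the remedy the error message itself suggests; whether the same change applies inside the model code is an open question here.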

To Reproduce

Below is the code snippet to reproduce the issue:

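import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch

from darts.dataprocessing.transformers import Scaler
from darts.datasets import AirPassengersDataset
from darts.metrics import mape
from darts.models import TransformerModel

if __name__ == '__main__':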

    torch.multiprocessing.freeze_support()

    series = AirPassengersDataset().load().astype(np.float32)

    # Create training and validation sets:
    train, val = series.split_after(pd.Timestamp("19590101"))

    # Normalize the time series (note: we avoid fitting the transformer on the validation set)
    scaler = Scaler()
    train_scaled = scaler.fit_transform(train)
    val_scaled = scaler.transform(val)
    series_scaled = scaler.transform(series)

    my_model = TransformerModel(
        input_chunk_length=12,
        output_chunk_length=1,
        batch_size=32,
        n_epochs=200,
        model_name="air_transformer",
        nr_epochs_val_period=10,
        d_model=16,
        nhead=8,
        num_encoder_layers=2,
        num_decoder_layers=2,
        dim_feedforward=128,
        dropout=0.1,
        activation="relu",
        random_state=42,
        save_checkpoints=True,
        force_reset=True,
        pl_trainer_kwargs={"accelerator": "gpu", "devices": [0, 1], "strategy": "ddp"},
    )

    my_model.fit(series=train_scaled, val_series=val_scaled, verbose=True)

    def eval_model(model, n, series, val_series):
        pred_series = model.predict(n=n)
        plt.figure(figsize=(8, 5))
        series.plot(label="actual")
        pred_series.plot(label="forecast")
        plt.title("MAPE: {:.2f}%".format(mape(pred_series, val_series)))
        plt.legend()

    eval_model(my_model, 26, series_scaled, val_scaled)

    best_model = TransformerModel.load_from_checkpoint(
        model_name="air_transformer", best=True
    )
    eval_model(best_model, 26, series_scaled, val_scaled)

    backtest_series = my_model.historical_forecasts(
        series=series_scaled,
        start=pd.Timestamp("19590101"),
        forecast_horizon=6,
        retrain=False,
        verbose=True,
    )

Expected behavior

I expect the model to run using multiple GPUs without any issues.

System (please complete the following information):

Python version: 3.9
darts version: 0.25.0

Additional context

The model runs correctly when using a single GPU.

PANXIONG-CN commented 1 year ago

Updated Issue: Using `strategy: "auto"` with Multiple GPUs

I made an update to the GPU configuration part of the code, setting the strategy to "auto" as shown below:

pl_trainer_kwargs = {"accelerator": "gpu", "strategy": "auto"}

Unfortunately, I still encountered the same RuntimeError:

RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation.
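Since this configuration does not set "devices", the number of GPUs actually used depends on what the process can see and on the installed Lightning version's defaults. A quick check, assuming PyTorch is importable in the same environment, is to print the visible GPU count before fitting:

```python
import torch

# Number of CUDA devices visible to this process
# (this is affected by CUDA_VISIBLE_DEVICES).
visible_gpus = torch.cuda.device_count()
print(f"CUDA available: {torch.cuda.is_available()}, visible GPUs: {visible_gpus}")
```

If this reports more than one GPU, a run that does not pin "devices" may end up on several of them, which would be consistent with seeing the same error as with "ddp".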

Updated To Reproduce

The relevant code with the new configuration is:

...

my_model = TransformerModel(
    ...
    pl_trainer_kwargs={"accelerator": "gpu", "strategy": "auto"},
)

my_model.fit(series=train_scaled, val_series=val_scaled, verbose=True)

...

Expected behavior

I expected the "auto" strategy to adapt to the available GPUs and run without the RuntimeError.

System (please complete the following information):

Python version: 3.9
darts version: 0.25.0

Additional context

Again, a single GPU works without any problems. With multiple GPUs, the same error arises with both the "ddp" and the "auto" strategy.
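To switch between the working single-GPU setup and the failing multi-GPU setup while keeping everything else identical, the trainer arguments can be built in one place. This is a minimal sketch: `make_trainer_kwargs` is a hypothetical helper, not part of darts or Lightning, and the single-GPU value `devices=[0]` is an assumption, since the original single-GPU configuration is not shown.

```python
def make_trainer_kwargs(gpu_ids, strategy=None):
    """Build a pl_trainer_kwargs dict for the given GPU indices (hypothetical helper)."""
    if not gpu_ids:
        raise ValueError("gpu_ids must list at least one GPU index")
    kwargs = {"accelerator": "gpu", "devices": list(gpu_ids)}
    # Only set a strategy when one is requested, so the trainer default applies otherwise.
    if strategy is not None:
        kwargs["strategy"] = strategy
    return kwargs


single_gpu = make_trainer_kwargs([0])                    # assumed working setup
multi_gpu = make_trainer_kwargs([0, 1], strategy="ddp")  # setup that fails above
print(single_gpu)
print(multi_gpu)
```

Either dict can then be passed as `pl_trainer_kwargs=...` to the model constructor.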

PANXIONG-CN commented 1 year ago

Successful Multi-GPU Execution with `TFTModel`

I tried a different approach and used TFTModel from the darts library. Surprisingly, with this model and the configuration below, training ran on multiple GPUs without the previous RuntimeError.

Here's the code that worked:

import torch
from darts.models import TFTModel
from darts.datasets import AirPassengersDataset

if __name__ == "__main__":
    torch.multiprocessing.freeze_support()
    series = AirPassengersDataset().load()
    series = [series] * 100

    model = TFTModel(
        input_chunk_length=12,
        output_chunk_length=6,
        add_relative_index=True,
        pl_trainer_kwargs={"accelerator": "gpu", "devices": "auto"}
    )
    model.fit(series, epochs=10)
    preds = model.predict(n=6, series=series, num_samples=100)
    print("len(preds)", len(preds), "len(series)", len(series))

This leads me to believe that the issue might be specific to the TransformerModel when running on multiple GPUs, given that TFTModel works fine in a similar setup.

System (please complete the following information):

Python version: 3.9
darts version: 0.25.0

Additional context

It would be great to get some insights into why TransformerModel struggles with multi-GPU configurations while TFTModel operates without issues.
