jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

GPU Runs Out of Memory for TFT When Doing Inference #772

Open YojoNick opened 2 years ago

YojoNick commented 2 years ago

PyTorch-Forecasting version: 0.9.0
PyTorch version: 1.9.0+cu111
Python version: 3.6.9
Operating System: Ubuntu 18.04

Expected behavior

When executing predict() for the Temporal Fusion Transformer on a dataset, I expected GPU memory utilization to remain static (i.e., not grow over time).

Actual behavior

However, when doing inference on a dataset with the Temporal Fusion Transformer, GPU memory utilization grew over time until I received an out-of-GPU-memory error.

Code to reproduce the problem

from pytorch_forecasting import TemporalFusionTransformer

# load the trained model from its checkpoint and move it to the GPU
tft = TemporalFusionTransformer.load_from_checkpoint(checkPointFile)
tft.to('cuda')

# predict over the test dataloader; "raw" mode returns the full network output
raw_predictions, inputs = tft.predict(testSetDataLoader, mode="raw", return_x=True, show_progress_bar=True)

I tried setting the number of workers for the data loader to 4 and then to 0; both still give an out-of-memory error. I also tried decreasing the batch size.

If I reduce the size of the test TimeSeriesDataSet, I don't get an out-of-memory error. However, because I'm using a data loader with a small mini-batch size, I would expect not to run out of memory, given that inference is done one mini-batch at a time...

moeiniamir commented 2 years ago

@YojoNick the same thing happens to me after I override the create_log method of TemporalFusionTransformer.
Is your TemporalFusionTransformer modified too?

hschmiedt commented 2 years ago

Any suggestions for solving this issue? I get the same problem when predicting on a large dataset.

jdb78 commented 2 years ago

You are storing the entire output in memory, which can become very large, particularly in "raw" mode. Depending on your use case, you might want to write the predictions to disk instead.
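
For example, something along these lines could work as a starting point: a minimal sketch of batch-wise inference that moves each batch's raw output to the CPU and saves it to disk, so nothing accumulates on the GPU. It assumes the dataloader yields (x, y) batches where x is a dict of tensors, and that the network output supports .items() (it is an OutputMixIn named tuple); the file naming is illustrative, not part of the library API.

import torch

tft.eval()
with torch.no_grad():
    for i, (x, y) in enumerate(testSetDataLoader):
        # move the batch's inputs to the GPU to match the model
        x = {name: xi.to('cuda') if isinstance(xi, torch.Tensor) else xi for name, xi in x.items()}
        out = tft(x)
        # detach and copy every tensor field to the CPU so no GPU references
        # survive the iteration, then persist the batch to disk
        out_cpu = {name: xi.detach().cpu() for name, xi in out.items() if isinstance(xi, torch.Tensor)}
        torch.save(out_cpu, f"raw_predictions_batch_{i}.pt")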

oliverester commented 1 year ago

I think this is a small bug that occurs when predicting in raw mode.

In pytorch_forecasting/utils.py, the move_to_device function should be changed to:

elif isinstance(x, OutputMixIn):
    x = x.__class__(**{name: move_to_device(xi, device=device) for name, xi in x.items()})
    return x

Currently, the tensors moved to the CPU are not assigned back, so the originals remain referenced on the GPU.
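
To illustrate where the fix sits, here is a sketch of move_to_device with the corrected branch; the surrounding branches are paraphrased for context and may differ in detail from the actual source in pytorch_forecasting/utils.py.

import torch
from pytorch_forecasting.utils import OutputMixIn

def move_to_device(x, device):
    # recursively move tensors, dicts of tensors, and OutputMixIn
    # named tuples to the target device (sketch, paraphrased)
    if isinstance(device, str):
        device = torch.device(device)
    if isinstance(x, dict):
        return {name: move_to_device(xi, device=device) for name, xi in x.items()}
    elif isinstance(x, OutputMixIn):
        # the fix: rebuild the output object from the moved fields;
        # without this reassignment the recursion's results were discarded
        # and the original GPU tensors stayed referenced
        return x.__class__(**{name: move_to_device(xi, device=device) for name, xi in x.items()})
    elif isinstance(x, torch.Tensor):
        return x.to(device)
    return x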