lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0

"RuntimeError: CUDA out of memory" when attempting inference #41

Closed WouterBesse closed 2 years ago

WouterBesse commented 2 years ago

Hello, I was trying to train my own model with this algorithm, but I ran into a problem when running inference with the self-trained model:

    RuntimeError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 10.92 GiB total capacity; 8.45 GiB already allocated; 1.80 GiB free; 8.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
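For reference, the fragmentation hint at the end of the error refers to PyTorch's caching allocator, which can be configured through an environment variable before CUDA is first used; the 128 MiB split size below is just an illustrative value, not something from this repo:

    import os

    # Must be set before the first CUDA allocation in the process;
    # 128 MiB is an arbitrary example split size, not a repo default.
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'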

It seems to be a CUDA memory problem. Thinking at first that I might not have enough memory on my personal GPU, I tried my university's server, which hosts 8 GPUs (the log above is from that server), but it ran into the same problem. A Google Colab Pro GPU with 15 GB of VRAM also reported the same error.

I've tried setting different batch sizes in params.py to see if that would solve the problem, but I can't find any place in model.py or inference.py where batch_size is actually used, and changing it doesn't seem to affect memory usage.
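As far as I can tell, batch_size only feeds the training DataLoader, roughly like the paraphrased sketch below (make_training_loader is a made-up helper name, not the repo's actual dataset.py code):

    from torch.utils.data import DataLoader

    # Paraphrased sketch: batch_size only shapes training batches; inference runs on a
    # single full-length spectrogram, so this setting never enters the picture there.
    def make_training_loader(dataset, params):
        return DataLoader(dataset, batch_size=params.batch_size, shuffle=True)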

I've tried adding torch.cuda.empty_cache() in multiple places to see if that could help, but sadly it didn't.
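For illustration, a typical placement would be clearing the cache right before calling predict() (a sketch with placeholder arguments; predict() is the entry point in inference.py):

    import torch
    from diffwave.inference import predict

    # Releases cached-but-unused blocks back to the driver; it does not shrink the
    # memory the model actually needs during sampling, which likely explains why it
    # didn't help here.
    torch.cuda.empty_cache()
    # audio, sample_rate = predict(spectrogram, model_dir='...')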

So far in the code I can't find anything that would cause this problem. Does anyone else experience the same problem, or is there a solution or setting I'm not seeing?

Could this also be a training problem, meaning I should train with a lower batch size to make inference easier on memory?

I'll add my parameters as well, just in case that's of any help:

import numpy as np

# AttrDict is the attribute-style dict wrapper defined earlier in params.py.
params = AttrDict(
    # Training params
    batch_size=16,
    learning_rate=2e-4,
    max_grad_norm=None,

    # Data params
    sample_rate=22050,
    n_mels=80,
    n_fft=1024,
    hop_samples=256,
    crop_mel_frames=62,  # Probably an error in paper.

    # Model params
    residual_layers=30,
    residual_channels=64,
    dilation_cycle_length=10,
    unconditional=False,
    noise_schedule=np.linspace(1e-4, 0.05, 50).tolist(),
    inference_noise_schedule=[0.0001, 0.001, 0.01, 0.05, 0.2, 0.5],

    # Unconditional sample length (unconditional_synthesis_samples)
    audio_len=22050 * 5,
)
WouterBesse commented 2 years ago

Okay, I figured it out for now. I set the CPU as the inference device so it can use regular system RAM, and that seems to do the trick. For anybody who wants to do this as well, don't forget to pass map_location=device to both of the torch.load() calls in the predict() function in inference.py.
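A minimal sketch of that change, assuming the checkpoint loading inside predict() roughly follows this shape (load_checkpoint is a made-up helper; the real function handles this inline alongside the other prediction arguments):

    import os
    import torch

    def load_checkpoint(model_dir, device=torch.device('cpu')):
        # map_location remaps tensors that were saved on the GPU onto the chosen device,
        # so the checkpoint can be loaded even when there isn't enough free GPU memory.
        if os.path.isdir(model_dir):
            return torch.load(os.path.join(model_dir, 'weights.pt'), map_location=device)
        return torch.load(model_dir, map_location=device)

The model and the conditioning spectrogram then just need to be moved to the same device before sampling.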