lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0

"RuntimeError: CUDA out of memory" when attempting inference #41

Closed WouterBesse closed 2 years ago

WouterBesse commented 2 years ago

Hello, I was trying to train my own model with this algorithm, but I ran into a problem when running inference with the self-trained model:

    RuntimeError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 10.92 GiB total capacity; 8.45 GiB already allocated; 1.80 GiB free; 8.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
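For reference, the fragmentation hint at the end of the error refers to PyTorch's caching allocator, which can be configured through an environment variable before CUDA is first used; the 128 MiB split size below is just an illustrative value, not something from this repo:

    import os

    # Must be set before the first CUDA allocation in the process;
    # 128 MiB is an arbitrary example split size, not a repo default.
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'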

It seems to be a CUDA memory problem. Thinking at first that I might not have enough memory on my personal GPU, I tried my university's server, which hosts 8 GPUs (the log above is from that server), but it ran into the same problem. A Google Colab Pro GPU with 15 GB of VRAM also reported the same error.

I've tried setting different batch sizes in params.py to see if that would solve the problem, but I can't find any place in model.py or inference.py where batch_size is actually used, and changing it doesn't seem to affect memory usage.
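As far as I can tell, batch_size only feeds the training DataLoader, roughly like the paraphrased sketch below (make_training_loader is a made-up helper name, not the repo's actual dataset.py code):

    from torch.utils.data import DataLoader

    # Paraphrased sketch: batch_size only shapes training batches; inference runs on a
    # single full-length spectrogram, so this setting never enters the picture there.
    def make_training_loader(dataset, params):
        return DataLoader(dataset, batch_size=params.batch_size, shuffle=True)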

I've tried adding torch.cuda.empty_cache() in multiple places to see if that could help, but sadly it didn't.
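For illustration, a typical placement would be clearing the cache right before calling predict() (a sketch with placeholder arguments; predict() is the entry point in inference.py):

    import torch
    from diffwave.inference import predict

    # Releases cached-but-unused blocks back to the driver; it does not shrink the
    # memory the model actually needs during sampling, which likely explains why it
    # didn't help here.
    torch.cuda.empty_cache()
    # audio, sample_rate = predict(spectrogram, model_dir='...')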

So far in the code I can't find anything that would cause this problem. Does anyone else experience the same problem, or is there a solution or setting I'm not seeing?

Could this also be a training problem, meaning I should train with a lower batch size to make inference easier on memory?

I'll add my parameters as well, just in case that's of any help:

import numpy as np

# AttrDict is the attribute-style dict wrapper defined earlier in params.py.
params = AttrDict(
    # Training params
    batch_size=16,
    learning_rate=2e-4,
    max_grad_norm=None,

    # Data params
    sample_rate=22050,
    n_mels=80,
    n_fft=1024,
    hop_samples=256,
    crop_mel_frames=62,  # Probably an error in paper.

    # Model params
    residual_layers=30,
    residual_channels=64,
    dilation_cycle_length=10,
    unconditional=False,
    noise_schedule=np.linspace(1e-4, 0.05, 50).tolist(),
    inference_noise_schedule=[0.0001, 0.001, 0.01, 0.05, 0.2, 0.5],

    # Unconditional sample length (unconditional_synthesis_samples)
    audio_len=22050 * 5,
)
WouterBesse commented 2 years ago

Okay, I figured it out for now. I set the CPU as the inference device so it can use regular system RAM, and that seems to do the trick. For anybody who wants to do this as well, don't forget to pass map_location=device to both of the torch.load() calls in the predict() function in inference.py.
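A minimal sketch of that change, assuming the checkpoint loading inside predict() roughly follows this shape (load_checkpoint is a made-up helper; the real function handles this inline alongside the other prediction arguments):

    import os
    import torch

    def load_checkpoint(model_dir, device=torch.device('cpu')):
        # map_location remaps tensors that were saved on the GPU onto the chosen device,
        # so the checkpoint can be loaded even when there isn't enough free GPU memory.
        if os.path.isdir(model_dir):
            return torch.load(os.path.join(model_dir, 'weights.pt'), map_location=device)
        return torch.load(model_dir, map_location=device)

The model and the conditioning spectrogram then just need to be moved to the same device before sampling.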