jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

GPU memory allocation #87

Closed · rspiewak47 closed this issue 3 years ago

rspiewak47 commented 3 years ago

I'm getting this error on a relatively low-powered system I'm using for testing:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 2.80 GiB already allocated; 17.10 MiB free; 2.82 GiB reserved in total by PyTorch)

Where can I set this to a lower value for this purpose?

Also, if I try to use fine_tuning (with mel .npy files created by fastspeech2 pre-processing), I get errors indicating a tensor dimension mismatch. Are these files incompatible with HiFi-GAN? Thank you!

Miralan commented 3 years ago


Lower either the batch_size or the segment size in the config. I'd recommend reducing the batch_size.
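For reference, both values live in the JSON config passed to train.py (e.g. config_v1.json). A minimal sketch of the two fields to edit, assuming the v1 defaults of batch_size 16 and segment_size 8192; the reduced values shown are the ones that eventually worked on a 4 GiB card later in this thread, and all other fields are omitted:

```json
{
    "batch_size": 4,
    "segment_size": 4096
}
```

segment_size is the number of audio samples each training example is cropped to, so it's generally kept a multiple of hop_size (256 in the v1 config).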

rspiewak47 commented 3 years ago

Thanks! I reduced batch_size to 4, but that wasn't enough on its own. I also reduced segment_size to 4096, and that seems to have done it.

rspiewak47 commented 3 years ago

Well, this only worked for a small train.txt file. Once I tried using the target dataset (500K-plus entries), the memory allocation error came right back. I tried setting the pin_memory parameter in the DataLoader calls to False, but that had no effect. Any more ideas? Thanks! This seems to be sensitive to the size of the training.txt file; I'm running it successfully with 10,000 lines.
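For context, pin_memory only controls whether the DataLoader stages batches in page-locked host RAM to speed up host-to-GPU copies; it does not allocate GPU memory, so turning it off would not be expected to change the CUDA error. A rough sketch of how the training loader is built in train.py, where trainset is the MelDataset instance and h is the loaded config (treat the exact call as approximate):

```python
from torch.utils.data import DataLoader

# Approximate shape of the training-loader construction in train.py.
# pin_memory only affects host (CPU) memory staging; peak GPU memory is driven
# by batch_size, segment_size, and the model itself, not by the filelist
# length, since batches are loaded lazily one at a time.
train_loader = DataLoader(trainset,
                          num_workers=h.num_workers,
                          shuffle=True,
                          batch_size=h.batch_size,
                          pin_memory=True,   # setting False frees pinned host RAM only
                          drop_last=True)
```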

rspiewak47 commented 3 years ago

Well, now it's gone back downhill, running out of GPU memory pretty much no matter what.

Miralan commented 3 years ago

Try removing non_blocking=True in train.py.
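For anyone reading later, the calls in question are the host-to-device copies in the training loop of train.py, roughly as sketched below (variable names approximate; batch and device come from the surrounding loop). non_blocking=True only makes the copy asynchronous when the source tensor is pinned, so in practice it mainly affects copy overlap rather than how much GPU memory is allocated, which is consistent with the error persisting below:

```python
# Approximate sketch of the batch-to-GPU copies in the training loop.
# With non_blocking=True the copy can overlap with compute when the source
# tensor lives in pinned memory; dropping the flag (or passing False) makes
# it a plain synchronous copy.
x, y, _, y_mel = batch
x = x.to(device, non_blocking=True)       # change to x.to(device) to disable async copies
y = y.to(device, non_blocking=True).unsqueeze(1)
y_mel = y_mel.to(device, non_blocking=True)
```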

rspiewak47 commented 3 years ago

I changed this to False in all locations. Now the error is:

CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 2.78 GiB already allocated; 1.10 MiB free; 2.83 GiB reserved in total by PyTorch)

rspiewak47 commented 3 years ago

This appears to be a limitation of the low-resource configuration; it's running fine on an Azure VM with a bigger GPU.