sigsep / open-unmix-pytorch

Open-Unmix - Music Source Separation for PyTorch
https://sigsep.github.io/open-unmix/
MIT License

Cuda Out of Memory Error on Longer Files #13

Closed: RomanScott closed this issue 5 years ago

RomanScott commented 5 years ago

🐛 Bug

Hello,

I am trying to test out the torchfilters branch of this project. It works fine on shorter audio clips, but when the audio file is around 4 to 5 minutes long, the program crashes with a CUDA out-of-memory error.

To Reproduce

Steps to reproduce the behavior:

  1. Run test.py on a music file about 4 or 5 minutes in length.

Traceback (most recent call last):
  File "/home/user/unmix/test.py", line 74, in separate
    estimates, model_rate = separator(audio_torch, rate)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/unmix/unmix/filtering.py", line 833, in forward
    for sample in range(nb_samples)], dim=0)
  File "/home/user/unmix/filtering.py", line 833, in <listcomp>
    for sample in range(nb_samples)], dim=0)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torchaudio/functional.py", line 130, in istft
    onesided, signal_sizes=(n_fft,))  # size (channel, n_frames, n_fft)
RuntimeError: CUDA out of memory. Tried to allocate 454.00 MiB (GPU 0; 7.43 GiB total capacity; 6.02 GiB already allocated; 218.94 MiB free; 690.49 MiB cached)

Expected behavior

The program should finish execution on longer files as well. Is there a way to split the audio every minute or two, or to use an audio loader so that the entire song isn't loaded into CUDA memory at once and the separation doesn't crash?
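
Something along these lines is what I have in mind (a rough, untested sketch; the separator call signature and the time axis used for concatenation are just my guesses at what the model expects):

```python
import torch

def separate_in_chunks(separator, audio, rate, chunk_seconds=60):
    # Split the waveform into fixed-length segments, separate each segment
    # on the GPU, and move the result back to the CPU before the next one.
    chunk_len = int(chunk_seconds * rate)
    estimates = []
    for start in range(0, audio.shape[-1], chunk_len):
        chunk = audio[..., start:start + chunk_len].cuda()
        with torch.no_grad():
            est, _ = separator(chunk, rate)
        estimates.append(est.cpu())       # keep only CPU copies around
        del chunk, est
        torch.cuda.empty_cache()          # release cached GPU blocks between chunks
    return torch.cat(estimates, dim=-1)   # stitch the segments back along time
```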

Thank you!


faroit commented 5 years ago

dev answer: torchfilters is a work-in-progress branch. Please only report issues against the master branch.

researcher answer: chunking can be implemented, but it would decrease performance for the BLSTM model. Instead, you would be better off retraining a model with --unidirectional, which offers real-time capabilities.
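
To illustrate why a unidirectional model helps here (this is not the open-unmix code, just a toy sketch with made-up sizes): a causal LSTM can process the track chunk by chunk, carrying its hidden state forward, so only one short chunk ever needs to sit in GPU memory. A BLSTM, by contrast, needs the whole sequence at once.

```python
import torch
import torch.nn as nn

# toy dimensions, not the real model configuration
lstm = nn.LSTM(input_size=512, hidden_size=512, num_layers=3,
               bidirectional=False)

state = None  # (h, c); PyTorch initialises it to zeros on the first call
for chunk in torch.rand(10, 100, 1, 512).unbind(0):  # 10 chunks of 100 frames each
    out, state = lstm(chunk, state)  # carry the hidden state into the next chunk
    # ...apply the rest of the network to `out` and emit the separated frames
```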

aliutkus commented 5 years ago

Please note that I don't think there is any way it will work on full tracks with a 6GB GPU. I am testing with 12GB and it's fine (up to roughly 6-7 minutes). Although we are doing our best to optimize memory usage, these complex double-precision spectrograms simply take a lot of RAM, and there is not much we can do about it.

Unless you train an online model with a unidirectional LSTM, as suggested by @faroit.
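
To put rough numbers on it (my own back-of-envelope, assuming the usual n_fft=4096 / hop=1024 STFT settings):

```python
duration_s, sr = 6 * 60, 44100      # a 6-minute stereo track
n_fft, hop = 4096, 1024
frames = duration_s * sr // hop     # ~15,500 STFT frames
bins = n_fft // 2 + 1               # 2049 frequency bins
bytes_per_value = 16                # complex double precision
per_channel = frames * bins * bytes_per_value / 2**30
total = per_channel * 2 * 4         # 2 channels x 4 target sources
print(f"{total:.1f} GiB")           # ~3.8 GiB for the target spectrograms alone
```

And that is before the mixture spectrogram and the intermediate tensors in the filtering step, which is why a 6GB card cannot hold a full track.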

aadibajpai commented 5 years ago

Does a unidirectional model reduce performance?

faroit commented 5 years ago

> Does a unidirectional model reduce performance?

Yes, for vocals it might be up to 0.5 dB SDR. For drums or bass it's not that important, though.

aadibajpai commented 5 years ago

> > Does a unidirectional model reduce performance?
>
> Yes, for vocals it might be up to 0.5 dB SDR. For drums or bass it's not that important, though.

I see. I'm mainly working with vocals, and the master branch works even without CUDA, so no problem so far.