resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning

High GPU memory requirement #14

Closed rohithkodali closed 4 years ago

rohithkodali commented 4 years ago

I have tried to load the model on a GTX 1080 GPU and run it, but it asks for far more memory than is available. This is the error it throws:

```
Traceback (most recent call last):
  File "/home/server/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<string>", line 1, in <module>
    runfile('/media/server/b92be869-bd56-4ed4-9306-12a754f7065f/diarization-package/Resemblyzer/demo02_diarization.py', wdir='/media/server/b92be869-bd56-4ed4-9306-12a754f7065f/diarization-package/Resemblyzer')
  File "/home/server/pycharm-community-2019.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/server/pycharm-community-2019.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/media/server/b92be869-bd56-4ed4-9306-12a754f7065f/diarization-package/Resemblyzer/demo02_diarization.py", line 64, in <module>
    run()
  File "/media/server/b92be869-bd56-4ed4-9306-12a754f7065f/diarization-package/Resemblyzer/demo02_diarization.py", line 46, in run
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
  File "/media/server/b92be869-bd56-4ed4-9306-12a754f7065f/diarization-package/Resemblyzer/resemblyzer/voice_encoder.py", line 152, in embed_utterance
    partial_embeds = self(mels).cpu().numpy()
  File "/home/server/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/server/b92be869-bd56-4ed4-9306-12a754f7065f/diarization-package/Resemblyzer/resemblyzer/voice_encoder.py", line 57, in forward
    _, (hidden, _) = self.lstm(mels)
  File "/home/server/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/server/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 564, in forward
    return self.forward_tensor(input, hx)
  File "/home/server/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 543, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/server/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 526, in forward_impl
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 27.50 GiB (GPU 0; 7.93 GiB total capacity; 4.17 GiB already allocated; 3.24 GiB free; 22.08 MiB cached)
```

CorentinJ commented 4 years ago

Yes, that's why it runs on the CPU by default. There would need to be a function that batches the data for inference, and it doesn't exist yet.
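
For reference, a minimal sketch of the CPU workaround, assuming a placeholder `audio.wav` path; `VoiceEncoder` accepts a device argument, which is how the diarization demo keeps this model off the GPU:

```python
from pathlib import Path
from resemblyzer import VoiceEncoder, preprocess_wav

wav = preprocess_wav(Path("audio.wav"))  # placeholder: your own file

# Passing "cpu" keeps the LSTM off the GPU, trading speed for memory.
encoder = VoiceEncoder("cpu")
_, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
```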

rohithkodali commented 4 years ago

Okay, thanks for the info. When you mentioned in the README that you had tried it on a 1080 GPU, I thought it was supported.

CorentinJ commented 4 years ago

It is supported for the other demos, but not for this one, because there is a lot of data to process at once. It is technically possible to run it on the GPU, but I haven't written the code for it.

bharat-patidar commented 4 years ago

I was trying to run the diarization script on 50 minutes of audio and it consumes ~15 GB of my laptop's RAM. Is there any way to reduce the RAM usage, or some alternative that avoids OOM errors on long audio?

CorentinJ commented 4 years ago

Yes, you should batch your data. Proceed in chunks of, say, 30 seconds: compute the speaker embeddings for each chunk and retain only an array indicating which speaker is talking. Discard the speaker embeddings and repeat for the next chunk. You can do this on the GPU for a great speed gain. See the sketch below.
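
A hedged sketch of that chunking scheme, not code from the repo: the helper name `diarize_in_chunks`, the `speaker_embeds` argument (one reference embedding per known speaker), and the 30-second chunk size are assumptions, while the `VoiceEncoder` calls mirror the diarization demo.

```python
import numpy as np
from resemblyzer import VoiceEncoder, sampling_rate

def diarize_in_chunks(wav, speaker_embeds, chunk_s=30, rate=16):
    encoder = VoiceEncoder()  # picks the GPU when one is available
    chunk_len = chunk_s * sampling_rate
    labels = []
    for start in range(0, len(wav), chunk_len):
        # A trailing chunk shorter than one partial utterance (~1.6 s)
        # would need padding; omitted here for brevity.
        chunk = wav[start:start + chunk_len]
        _, cont_embeds, _ = encoder.embed_utterance(
            chunk, return_partials=True, rate=rate)
        # Embeddings are L2-normalized, so a dot product gives the cosine
        # similarity of each partial embedding to each reference speaker.
        sims = cont_embeds @ np.array(speaker_embeds).T
        labels.append(sims.argmax(axis=1))  # keep only who-is-talking labels
        del cont_embeds, sims  # discard the embeddings before the next chunk
    return np.concatenate(labels)
```

With this scheme, each step holds at most ~30 s × 16 partials/s ≈ 480 partial embeddings in memory at once, instead of the entire file's worth.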

milind-soni commented 2 years ago

Can you let me know how to batch the data?