karaokenerds / python-audio-separator

Easy to use vocal separation from CLI or as a python package, using a variety of amazing models (primarily trained by @Anjok07 as part of UVR)
MIT License

Randomly raising "librosa ParameterError: Audio buffer is not finite everywhere (cutting wav files)" 1 out of 5 generations #57

Open · smartinezbragado opened this issue 3 months ago

smartinezbragado commented 3 months ago

Hi, I deployed the package as part of a serverless RunPod endpoint, and it randomly returns silent vocals. When this happens it raises a "librosa ParameterError: Audio buffer is not finite everywhere (cutting wav files)" in vr_separator.py - spec_to_wav(), line 317.

It is completely random: sometimes the vocal separation is perfect, and other times the output is silent and the error is raised. Do you know how to fix this? I tried forking the repo and converting the infinite values in the spectrogram to 0, but it did not work.
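For reference, this is roughly the kind of patch I tried in my fork (a minimal sketch; `sanitize_spec` is my own hypothetical helper, not part of audio-separator):

```python
import numpy as np

def sanitize_spec(spec: np.ndarray) -> np.ndarray:
    # Replace NaN and +/-inf entries with 0 before the spectrogram is
    # converted back to a waveform (hypothetical workaround, not library code).
    return np.nan_to_num(spec, nan=0.0, posinf=0.0, neginf=0.0)
```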

beveradb commented 3 months ago

Are you able to share some more of the debug logs (add -d to your CLI usage, or pass log_level=logging.DEBUG to your Separator class instantiation) around the time when that error occurs?
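For example, something like this minimal sketch (using the `log_level` parameter mentioned above) should produce the full debug output:

```python
import logging
from audio_separator.separator import Separator

# Instantiate with debug logging enabled, equivalent to passing -d on the CLI
separator = Separator(log_level=logging.DEBUG)
```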

I have no idea what's going on off the top of my head but I'm willing to try and help you debug it!

Have you been able to reproduce in any other environment? Do you have a test file which this seems to happen more frequently with or anything? I'm not familiar with runpod's serverless setup but do you have any other system metrics which might be worth looking for potential correlations in? e.g. perhaps this is occurring when something runs out of memory or gets throttled or something

smartinezbragado commented 3 months ago

Sure, here they are:

"message": "Audio buffer is not finite everywhere",
"error_type": "<class 'librosa.util.exceptions.ParameterError'>",
"error_message": "Audio buffer is not finite everywhere",
"error_traceback": "Traceback (most recent call last):
File \"/usr/local/lib/python3.11/site-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job
handler_return = handler(job)
File \"/rvc/handler.py\", line 68, in handler
sep_vocals_path, sep_instruments_path = separate_vocals(separator, song_path)
File \"/rvc/runpod_utils.py\", line 171, in separate_vocals
files = separator.separate(song_path)
File \"/usr/local/lib/python3.11/site-packages/audio_separator/separator/separator.py\", line 660, in separate
output_files = self.model_instance.separate(audio_file_path)
File \"/usr/local/lib/python3.11/site-packages/audio_separator/separator/architectures/vr_separator.py\", line 188, in separate
self.secondary_source = self.spec_to_wav(v_spec).T
File \"/usr/local/lib/python3.11/site-packages/audio_separator/separator/architectures/vr_separator.py\", line 324, in spec_to_wav
wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, is_v51model=self.is_vr51model) File \"/usr/local/lib/python3.11/site-packages/audio_separator/separator/uvr_lib_v5/spec_utils.py\", line 380, in cmb_spectrogram_to_wave wave = librosa.resample(wave2, orig_sr=bp[\"sr\"], target_sr=sr, res_type=wav_resolution) File \"/usr/local/lib/python3.11/site-packages/librosa/core/audio.py\", line 627, in resample util.valid_audio(y, mono=False) File \"/usr/local/lib/python3.11/site-packages/librosa/util/utils.py\", line 314, in valid_audio raise ParameterError(\"Audio buffer is not finite everywhere\") ", "hostname": "0jef9ikahtlyrx-64410b30", "worker_id": "0jef9ikahtlyrx", "runpod_version": "1.6.2", "level": "ERROR"}

I forked your repo and replaced the infinite and NaN values in the spectrogram with 0, but it still fails, returning silent vocals.

It is very weird because this issue is raised randomly, so it is difficult to understand what is happening.

RunPod works with Docker images that are launched on GPU workers. Do you think the issue might be related to the environment / resources of the GPU? If I run multiple executions in a row, do I need to reset the GPU cache or something?
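What I have in mind is something like this minimal sketch, assuming a PyTorch/CUDA backend (I don't know whether audio-separator actually needs this or already handles it internally):

```python
import torch

# Hypothetical manual cache reset between consecutive separations
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```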

I just tried it in a Jupyter notebook and it worked properly, so I have not been able to reproduce the error in another environment.

Thanks!!

beveradb commented 3 months ago

I was hoping you'd share the full debug logs, not just the stack trace 😅

The reason I say that is because I have very little information to work with here to try and help you, so everything is potentially worth looking at! One of the reasons I want to see the debug log is to know what parameters you're running audio-separator with, in the hope of being able to reproduce myself...

Since you can't reproduce the issue consistently, I was going to look through the full debug logs and try to figure out what exactly the VR separator code is doing at the points when it gets into this situation.

Ideally, sharing the full debug logs from a successful separation, vs. the full debug logs from an unsuccessful run would be helpful - even more helpful if you manage to get a success and a failure on the same input file!

Also, I'm familiar with Runpod's GPU cloud (https://www.runpod.io/console/gpu-cloud) - I actually have an audio-separator template there, which I use myself when I need to test CUDA support! However I've never used their serverless environment.

Please could you try running on one of the regular Runpod pods (not serverless) and see if you're able to reproduce there? I've run a lot of separations on regular GPU pods on Runpod and haven't encountered this same issue.

If you're able to reproduce there on a regular Runpod instance, please give me all of the specifics, e.g.

If you aren't able to reproduce in a regular runpod, but you are able to reproduce in the runpod serverless endpoint, please give me details of how you set up that serverless endpoint so I can try to reproduce myself, e.g.

[two screenshots attached]

At least with that info I should be able to reproduce myself, which would make it a lot easier for me to try and help get to the bottom of the cause!

smartinezbragado commented 3 months ago

I was able to reproduce it on the RunPod cloud pods. I just ran it 20 times in a row, and it failed on the 10th separation with the same issue.

Template: RunPod Pytorch 2.1
GPU: A40

Code snippet:

```python
from audio_separator.separator import Separator

separator = Separator()
separator.load_model("2_HP-UVR.pth")

for _ in range(30):
    separator.separate('song.mp3')
```

The error raised is the same:

```
ParameterError                            Traceback (most recent call last)
Cell In[9], line 3
      1 # Perform the separation on specific audio files without reloading the model
      2 for _ in range(30):
----> 3     separator.separate('song.mp3')

File /usr/local/lib/python3.10/dist-packages/audio_separator/separator/separator.py:660, in Separator.separate(self, audio_file_path)
    657 self.logger.debug(f"Normalization threshold set to {self.normalization_threshold}, waveform will lowered to this max amplitude to avoid clipping.")
    659 # Run separation method for the loaded model
--> 660 output_files = self.model_instance.separate(audio_file_path)
    662 # Clear GPU cache to free up memory
    663 self.model_instance.clear_gpu_cache()

File /usr/local/lib/python3.10/dist-packages/audio_separator/separator/architectures/vr_separator.py:203, in VRSeparator.separate(self, audio_file_path)
    200 if not isinstance(self.secondary_source, np.ndarray):
    201     self.logger.debug(f"Preparing to convert spectrogram to waveform. Spec shape: {v_spec.shape}")
--> 203     self.secondary_source = self.spec_to_wav(v_spec).T
    204     self.logger.debug("Converting secondary source spectrogram to waveform.")
    205 if not self.model_samplerate == 44100:

File /usr/local/lib/python3.10/dist-packages/audio_separator/separator/architectures/vr_separator.py:339, in VRSeparator.spec_to_wav(self, spec)
    337     wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, self.input_high_end_h, input_highend, is_v51_model=self.is_vr_51_model)
    338 else:
--> 339     wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, is_v51_model=self.is_vr_51_model)
    341 return wav

File /usr/local/lib/python3.10/dist-packages/audio_separator/separator/uvr_lib_v5/spec_utils.py:388, in cmb_spectrogram_to_wave(spec_m, mp, extra_bins_h, extra_bins, is_v51_model)
    385     wave2 = np.add(wave, spectrogram_to_wave(spec_s, bp["hl"], mp, d, is_v51_model))
    387 try:
--> 388     wave = librosa.resample(wave2, orig_sr=bp["sr"], target_sr=sr, res_type=wav_resolution)
    389 except ValueError as e:
    390     print(f"Error during resampling: {e}")

File /usr/local/lib/python3.10/dist-packages/librosa/core/audio.py:627, in resample(y, orig_sr, target_sr, res_type, fix, scale, axis, **kwargs)
    524 """Resample a time series from orig_sr to target_sr
    525
    526 By default, this uses a high-quality method (soxr_hq) for band-limited sinc
    (...)
    624 ((117601,), (42668,))
    625 """
    626 # First, validate the audio buffer
--> 627 util.valid_audio(y, mono=False)
    629 if orig_sr == target_sr:
    630     return y

File /usr/local/lib/python3.10/dist-packages/librosa/util/utils.py:314, in valid_audio(y, mono)
    309     raise ParameterError(
    310         f"Invalid shape for monophonic audio: ndim={y.ndim:d}, shape={y.shape}"
    311     )
    313 if not np.isfinite(y).all():
--> 314     raise ParameterError("Audio buffer is not finite everywhere")
    316 return True

ParameterError: Audio buffer is not finite everywhere
```

beveradb commented 3 months ago

Thanks for confirming @smartinezbragado ! That's really helpful, I can test on a runpod with the same GPU and use the same 2_HP-UVR.pth model to try and reproduce myself.

However, I think I might actually have already implemented a workaround for the issue, even if I don't exactly understand the cause:

Please can you upgrade to v0.16.3 and try to reproduce again, or see if that's fixed the issue for you?

smartinezbragado commented 3 months ago

Thanks for the effort @beveradb!! However, I repeated the experiment with the new version and the error is still raised. Here is the link to the song I am using to test (GitHub issues does not allow attaching audio files): https://www.youtube.com/watch?v=dOD0Aa5KNGs&ab_channel=MonkTurner

ANDYVDL commented 3 months ago

I'm experiencing the same issue on RunPod with some audio files.

beveradb commented 3 months ago

Which model and parameters @ANDYVDL? And, are you able to share a test file for me to reproduce?

smartinezbragado commented 3 months ago

Here is a download link to my song, but I think it happens with all songs if you iterate the separation on the same machine: https://www.dropbox.com/scl/fi/3eguhe0rjy8pyukqijbvu/Birthday.mp3?rlkey=v0g4itglmty3qb7bb3xyn42kx&dl=0

ANDYVDL commented 3 months ago

> Which model and parameters @ANDYVDL? And, are you able to share a test file for me to reproduce?

Sorry for the late reply, as I was on a family holiday.

I never found a way to reproduce it reliably, but at least it now seems like it is related to the input file. I tested it with the latest version 0.16.4 and the "5_HP-UVR" model, which is the standard recommendation of the RVC project (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI). On RunPod I am using the 4090 24GB GPU Pro serverless environment.

The issue never seems to happen right after the RunPod worker is initialized, only when it stays on and multiple separations are done, either via subsequent scheduled jobs or via multiple separation calls in the same job.

With the following audio (removed link) I never ran into the issue after 30+ tries.

With the Birthday.mp3 from @smartinezbragado (Dropbox link above) and a few others I use, it happens frequently. I also found that sometimes the "Audio buffer is not finite everywhere" error doesn't get thrown but the output audio is just silent (removed link)!
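For what it's worth, this is roughly how I spot the silent outputs, a minimal sketch using soundfile and numpy (the threshold is an arbitrary choice of mine):

```python
import numpy as np
import soundfile as sf

def is_effectively_silent(path: str, threshold: float = 1e-4) -> bool:
    # Treat the file as silent if its peak amplitude is below the (arbitrary) threshold
    audio, _ = sf.read(path)
    return float(np.max(np.abs(audio))) < threshold
```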

ANDYVDL commented 3 months ago

Not a RunPod issue: I just had it happen for the first time using the UVR GUI. The issue is also reported in the UVR repo, so it is probably upstream of this codebase.

beveradb commented 1 month ago

I'm afraid I've never experienced this myself and still can't reproduce :/

If you can figure out what the cause is, I'd of course appreciate a PR with a fix 🙇

beveradb commented 1 month ago

I should ask - does it do this with any other model architecture?

I would encourage you to check out the new RoFormer models (e.g. model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt is what I've now set as the new default model as it's just so impressive in my experience)

I'd be surprised if they exhibit the same bug, and they probably provide better separation than what you've been getting! 👀
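Loading one should just be a matter of pointing `load_model` at the checkpoint name; here's a minimal sketch using the same API as the reproduction snippet above:

```python
from audio_separator.separator import Separator

separator = Separator()

# Load the suggested Mel-Band RoFormer model instead of 2_HP-UVR.pth
separator.load_model("model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt")

output_files = separator.separate("song.mp3")
```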