nomadkaraoke / python-audio-separator

Easy to use vocal separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License
391 stars, 64 forks

Error writing big Audio files #53

Open iampickle opened 5 months ago

iampickle commented 5 months ago

input file

format: mp4 size: 7.8G

error message

2024-03-16 15:31:02,587 - INFO - separator - Separator version 0.16.2 instantiating with output_dir: None, output_format: mp3
2024-03-16 15:31:02,587 - INFO - separator - Operating System: Linux #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-6 (2023-11-29T08:32Z)
2024-03-16 15:31:02,588 - INFO - separator - System: Linux Node: tbot Release: 6.5.11-6-pve Machine: x86_64 Proc:
2024-03-16 15:31:02,588 - INFO - separator - Python Version: 3.11.4
2024-03-16 15:31:02,588 - INFO - separator - PyTorch Version: 2.2.0+cu121
2024-03-16 15:31:02,680 - INFO - separator - FFmpeg installed: ffmpeg version 5.1.4-0+deb12u1 Copyright (c) 2000-2023 the FFmpeg developers
2024-03-16 15:31:02,681 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.1
2024-03-16 15:31:02,731 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-03-16 15:31:02,731 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-03-16 15:31:02,732 - INFO - separator - Loading model UVR-MDX-NET-Inst_HQ_3.onnx...
2024-03-16 15:31:04,463 - INFO - separator - Load model duration: 00:00:01
2024-03-16 15:31:04,463 - INFO - separator - Starting separation process for audio_file_path: /media/raid/twitch/papaplatte/papaplatte-stream-2024-02-01/16.50.mp4
 36%|███████▏            | 2356/6573 [19:33<33:02, 2.13it/s]
100%|████████████████████| 6573/6573 [52:38<00:00, 2.08it/s]
100%|████████████████████| 5030/5030 [06:56<00:00, 12.08it/s]
2024-03-16 16:34:56,926 - INFO - mdxseparator - Saving Vocals stem to 16.50(Vocals)_UVR-MDX-NET-Inst_HQ_3.mp3...
Traceback (most recent call last):
  File "/home/tbot/twitchbot/test.py", line 50, in <module>
    output_files = separator.separate('/media/raid/twitch/papaplatte/papaplatte-stream-2024-02-01/16.50.mp4')
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/separator.py", line 660, in separate
    output_files = self.model_instance.separate(audio_file_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/architectures/mdx_separator.py", line 181, in separate
    self.final_process(self.secondary_stem_output_path, self.secondary_source, self.secondary_stem_name)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/common_separator.py", line 118, in final_process
    self.write_audio(stem_path, source)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/common_separator.py", line 255, in write_audio
    audio_segment.export(stem_path, format=file_format)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/pydub/audio_segment.py", line 895, in export
    wave_data.writeframesraw(pcm_for_wav)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 547, in writeframesraw
    self._ensure_header_written(len(data))
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 588, in _ensure_header_written
    self._write_header(datasize)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 600, in _write_header
    self._file.write(struct.pack('<L4s4sLHHLLHH4s',
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'L' format requires 0 <= number <= 4294967295
Exception ignored in: <function Wave_write.__del__ at 0x7f7352a5af20>
Traceback (most recent call last):
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 447, in __del__
    self.close()
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 565, in close
    self._ensure_header_written(0)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 588, in _ensure_header_written
    self._write_header(datasize)
  File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 600, in _write_header
    self._file.write(struct.pack('<L4s4sLHHLLHH4s',
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'L' format requires 0 <= number <= 4294967295

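The traceback ends in the standard library's `wave.py`, which packs the WAV header size fields with struct's `'L'` format (an unsigned 32-bit integer). Since pydub's `export` writes an intermediate WAV through that module, any stem whose raw PCM data approaches 4 GiB will likely hit this error regardless of the requested output format. The sketch below reproduces the limit with the standard library only. The sample rate, channel count, and sample width are assumptions (the issue does not state them), and the helper names `max_wav_seconds` and `header_size_fits` are hypothetical.

```python
import struct

# Assumed stream format (not stated in the issue): 44.1 kHz, stereo, 16-bit PCM.
SAMPLE_RATE = 44_100
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample

UINT32_MAX = 2**32 - 1   # largest value struct's 'L' format accepts
RIFF_HEADER_EXTRA = 36   # wave.py stores 36 + data length in the RIFF size field


def max_wav_seconds(sample_rate=SAMPLE_RATE, channels=CHANNELS, sample_width=SAMPLE_WIDTH):
    """Longest PCM duration whose byte count still fits in the 32-bit WAV header."""
    max_data_bytes = UINT32_MAX - RIFF_HEADER_EXTRA
    return max_data_bytes / (sample_rate * channels * sample_width)


def header_size_fits(data_bytes):
    """Return True if the RIFF size field for `data_bytes` of PCM packs as uint32."""
    try:
        struct.pack("<L", RIFF_HEADER_EXTRA + data_bytes)
        return True
    except struct.error:
        return False


print(f"limit: {max_wav_seconds() / 3600:.2f} hours")
print(header_size_fits(4 * 1024**3))  # 4 GiB of PCM does not fit
```

Under these assumptions the limit works out to a little under seven hours of audio, so a long stream recording could plausibly exceed it. A check like `header_size_fits` could be run before export to fail early with a clearer message.
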
iampickle commented 5 months ago

After a quick chat with OpenAI, I tried using the soundfile library instead of pydub:

    def write_audio(self, stem_path: str, stem_source):
        """
        Writes the separated audio source to a file.
        """
        self.logger.debug(f"Entering write_audio with stem_path: {stem_path}")

        stem_source = spec_utils.normalize(wave=stem_source, max_peak=self.normalization_threshold)

        # Check if the numpy array is empty or contains very low values
        if np.max(np.abs(stem_source)) < 1e-6:
            self.logger.warning("Warning: stem_source array is near-silent or empty.")
            return

        # If output_dir is specified, create it and join it with stem_path
        if self.output_dir:
            os.makedirs(self.output_dir, exist_ok=True)
            stem_path = os.path.join(self.output_dir, stem_path)

        self.logger.debug(f"Audio data shape before processing: {stem_source.shape}")
        self.logger.debug(f"Data type before conversion: {stem_source.dtype}")

        # Ensure the audio data is in the correct format (e.g., int16)
        if stem_source.dtype != np.int16:
            stem_source = (stem_source * 32767).astype(np.int16)
            self.logger.debug("Converted stem_source to int16.")

        # Correctly interleave stereo channels if needed
        if stem_source.shape[1] == 2:
            # If the audio is already interleaved, ensure it's in the correct order
            if stem_source.flags['F_CONTIGUOUS']:  # Check if the array is Fortran contiguous (column-major)
                stem_source = np.ascontiguousarray(stem_source)  # Convert to C contiguous (row-major)
            # Otherwise, perform interleaving
            else:
                stereo_interleaved = np.empty((2 * stem_source.shape[0],), dtype=np.int16)
                stereo_interleaved[0::2] = stem_source[:, 0]  # Left channel
                stereo_interleaved[1::2] = stem_source[:, 1]  # Right channel
                stem_source = stereo_interleaved

        self.logger.debug(f"Interleaved audio data shape: {stem_source.shape}")

beveradb commented 5 months ago

Are you saying you've managed to resolve the issue?

If so, nice one! Please raise a PR with the fix! (and ideally a test case, if you're willing) 🙇