bohning closed this issue 10 months ago
Wow, I totally missed this! Sorry about that. I'll change it to use pydub for writing audio files later this week, which should make it work for anything ffmpeg supports (definitely includes M4A).
In the meantime you could output to WAV or FLAC then run ffmpeg -i <audio-separator output file> output.m4a
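The workaround above can be scripted. This is only a minimal sketch, assuming ffmpeg is on the PATH and that the stems were already written as FLAC or WAV; the helper names `build_ffmpeg_command` and `convert_stem` are made up for illustration and are not part of audio-separator.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg call that re-encodes `src` into `dst`.

    ffmpeg chooses the container and a default codec from the extension of `dst`;
    `-y` overwrites an existing output file without prompting.
    """
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_stem(src: Path, target_ext: str = "m4a") -> Path:
    """Convert one separated stem (hypothetical helper) and return the new path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = src.with_suffix(f".{target_ext}")
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Calling `convert_stem(Path("vocals.flac"))` would then produce `vocals.m4a` next to the input, with the FLAC file left in place.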
Thanks a lot for taking the time to respond and address the issue. I can work with mp3 for the time being, but I am looking forward to m4a support!
Hey @bohning, so sorry for the super slow response on this. Day job and life got the better of me and I didn't get back to working on audio-separator for a while.
Anyway, I've just made a bunch of improvements to it and I believe the latest version of audio-separator (version 0.9.3 or greater) should now support every format ffmpeg supports (including M4A, which I specifically tested).
Hopefully that works for you, but if you have any issues feel free to re-open!
Thanks so much, I will look into it these days. Happy holidays!
I finally got around to testing this and it fails for my m4a. soundfile.py doesn't seem to like the format:
soundfile.LibsndfileError: Error opening '/var/folders/nq/_mxrjgvd73d0_h92kqhcf3y80000gp/T/tmpcvlf8d3p/audio.m4a': Format not recognised.
I know this seems to be a soundfile/Libsndfile issue, but since you wrote that you explicitly tested m4a, maybe you know what could be wrong.
In any case, thanks a lot!
P.S. I opened an issue at the soundfile repository: https://github.com/bastibe/python-soundfile/issues/431
Please could you send me the file to test with @bohning, or share the output from ffprobe for that file to confirm what format it is?
I just tested again with an M4A file I created using ffmpeg -i test.flac test.m4a and it worked fine:
(separator) ➜ other ffmpeg -i test-separator/test.flac test.m4a
ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with clang version 16.0.6
configuration: --prefix=/Users/runner/miniforge3/conda-bld/ffmpeg_1712656609600/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl --cc=arm64-apple-darwin20.0.0-clang --cxx=arm64-apple-darwin20.0.0-clang++ --nm=arm64-apple-darwin20.0.0-nm --ar=arm64-apple-darwin20.0.0-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libharfbuzz --enable-libfontconfig --enable-libopenh264 --enable-libdav1d --enable-cross-compile --arch=arm64 --target-os=darwin --cross-prefix=arm64-apple-darwin20.0.0- --host-cc=/Users/runner/miniforge3/conda-bld/ffmpeg_1712656609600/_build_env/bin/x86_64-apple-darwin13.4.0-clang --enable-neon --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-libass --enable-pthreads --enable-libopenvino --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libopus --pkg-config=/Users/runner/miniforge3/conda-bld/ffmpeg_1712656609600/_build_env/bin/pkg-config
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
libpostproc 57. 3.100 / 57. 3.100
Input #0, flac, from 'test-separator/test.flac':
Metadata:
TITLE : Be Calm
ARTIST : Fun.
ALBUM : Aim and Ignite
GENRE : Alternative & Punk
COMPOSER : Fun.
track : 1
TRACKTOTAL : 10
TOTALTRACKS : 10
disc : 1
DISCTOTAL : 1
TOTALDISCS : 1
DATE : 2009
ISRC : CAN110900135
Duration: 00:04:09.81, start: 0.000000, bitrate: 916 kb/s
Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Stream mapping:
Stream #0:0 -> #0:0 (flac (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, ipod, to 'test.m4a':
Metadata:
TITLE : Be Calm
ARTIST : Fun.
ALBUM : Aim and Ignite
GENRE : Alternative & Punk
COMPOSER : Fun.
track : 1
TRACKTOTAL : 10
TOTALTRACKS : 10
disc : 1
DISCTOTAL : 1
TOTALDISCS : 1
DATE : 2009
ISRC : CAN110900135
encoder : Lavf60.16.100
Stream #0:0: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc60.31.102 aac
[out#0/ipod @ 0x600002294540] video:0kB audio:3969kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.086439%
size= 4012kB time=00:04:09.80 bitrate= 131.6kbits/s speed= 105x
[aac @ 0x15b608080] Qavg: 693.818
(separator) ➜ other audio-separator test.m4a
2024-05-07 15:01:11.821 - INFO - cli - Separator version 0.16.6 beginning with input file: test.m4a
2024-05-07 15:01:11.822 - INFO - separator - Separator version 0.16.6 instantiating with output_dir: None, output_format: FLAC
2024-05-07 15:01:11.823 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000
2024-05-07 15:01:11.862 - INFO - separator - System: Darwin Node: VVN46RXL9Q Release: 23.4.0 Machine: arm64 Proc: arm
2024-05-07 15:01:11.863 - INFO - separator - Python Version: 3.12.3
2024-05-07 15:01:11.863 - INFO - separator - PyTorch Version: 2.3.0
2024-05-07 15:01:11.927 - INFO - separator - FFmpeg installed: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
2024-05-07 15:01:11.930 - INFO - separator - ONNX Runtime CPU package installed with version: 1.17.3
2024-05-07 15:01:11.958 - INFO - separator - Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS
2024-05-07 15:01:11.958 - INFO - separator - ONNXruntime has CoreMLExecutionProvider available, enabling acceleration
2024-05-07 15:01:11.958 - INFO - separator - Loading model UVR-MDX-NET-Inst_HQ_3.onnx...
17.2kiB [00:00, 12.1MiB/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66.8M/66.8M [00:06<00:00, 9.83MiB/s]
4.38kiB [00:00, 3.88MiB/s]
12.0kiB [00:00, 4.04MiB/s]
2024-05-07 15:01:27.408 - INFO - separator - Load model duration: 00:00:15
2024-05-07 15:01:27.408 - INFO - separator - Starting separation process for audio_file_path: test.m4a
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 58/58 [00:57<00:00, 1.01it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44/44 [00:03<00:00, 12.58it/s]
2024-05-07 15:02:29.673 - INFO - mdx_separator - Saving Vocals stem to test_(Vocals)_UVR-MDX-NET-Inst_HQ_3.flac...
2024-05-07 15:02:30.424 - INFO - mdx_separator - Saving Instrumental stem to test_(Instrumental)_UVR-MDX-NET-Inst_HQ_3.flac...
2024-05-07 15:02:31.176 - INFO - common_separator - Clearing input audio file paths, sources and stems...
2024-05-07 15:02:31.179 - INFO - separator - Separation duration: 00:01:03
2024-05-07 15:02:31.179 - INFO - cli - Separation complete! Output file(s): test_(Vocals)_UVR-MDX-NET-Inst_HQ_3.flac test_(Instrumental)_UVR-MDX-NET-Inst_HQ_3.flac
(separator) ➜ other ffprobe test.m4a
ffprobe version 6.1.1 Copyright (c) 2007-2023 the FFmpeg developers
built with clang version 16.0.6
configuration: --prefix=/Users/runner/miniforge3/conda-bld/ffmpeg_1712656609600/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl --cc=arm64-apple-darwin20.0.0-clang --cxx=arm64-apple-darwin20.0.0-clang++ --nm=arm64-apple-darwin20.0.0-nm --ar=arm64-apple-darwin20.0.0-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libharfbuzz --enable-libfontconfig --enable-libopenh264 --enable-libdav1d --enable-cross-compile --arch=arm64 --target-os=darwin --cross-prefix=arm64-apple-darwin20.0.0- --host-cc=/Users/runner/miniforge3/conda-bld/ffmpeg_1712656609600/_build_env/bin/x86_64-apple-darwin13.4.0-clang --enable-neon --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-libass --enable-pthreads --enable-libopenvino --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libopus --pkg-config=/Users/runner/miniforge3/conda-bld/ffmpeg_1712656609600/_build_env/bin/pkg-config
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
libpostproc 57. 3.100 / 57. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.m4a':
Metadata:
major_brand : M4A
minor_version : 512
compatible_brands: M4A isomiso2
title : Be Calm
artist : Fun.
composer : Fun.
album : Aim and Ignite
date : 2009
encoder : Lavf60.16.100
genre : Alternative & Punk
track : 1
disc : 1
Duration: 00:04:09.81, start: 0.000000, bitrate: 131 kb/s
Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 130 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Oh, I realize that it's outputting as M4A that you're trying to do, not processing M4A input files. Apologies, the previous message is probably not relevant.
However, the same request still applies: I'd like to see more detail about how you're running audio-separator (full command, debug log output etc.) and a test input file, as outputting as M4A "works for me":
(separator) ➜ other audio-separator -m UVR-MDX-NET-Inst_HQ_4.onnx --output_format=m4a test30s.flac
2024-05-07 15:55:12.009 - INFO - cli - Separator version 0.16.6 beginning with input file: test30s.flac
2024-05-07 15:55:12.010 - INFO - separator - Separator version 0.16.6 instantiating with output_dir: None, output_format: m4a
2024-05-07 15:55:12.010 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000
2024-05-07 15:55:12.044 - INFO - separator - System: Darwin Node: VVN46RXL9Q Release: 23.4.0 Machine: arm64 Proc: arm
2024-05-07 15:55:12.045 - INFO - separator - Python Version: 3.12.3
2024-05-07 15:55:12.045 - INFO - separator - PyTorch Version: 2.3.0
2024-05-07 15:55:12.104 - INFO - separator - FFmpeg installed: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
2024-05-07 15:55:12.106 - INFO - separator - ONNX Runtime CPU package installed with version: 1.17.3
2024-05-07 15:55:12.120 - INFO - separator - Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS
2024-05-07 15:55:12.120 - INFO - separator - ONNXruntime has CoreMLExecutionProvider available, enabling acceleration
2024-05-07 15:55:12.120 - INFO - separator - Loading model UVR-MDX-NET-Inst_HQ_4.onnx...
2024-05-07 15:55:15.264 - INFO - separator - Load model duration: 00:00:03
2024-05-07 15:55:15.264 - INFO - separator - Starting separation process for audio_file_path: test30s.flac
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:06<00:00, 1.21it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 13.56it/s]
2024-05-07 15:55:23.046 - INFO - mdx_separator - Saving Vocals stem to test30s_(Vocals)_UVR-MDX-NET-Inst_HQ_4.m4a...
2024-05-07 15:55:23.629 - INFO - mdx_separator - Saving Instrumental stem to test30s_(Instrumental)_UVR-MDX-NET-Inst_HQ_4.m4a...
2024-05-07 15:55:24.231 - INFO - common_separator - Clearing input audio file paths, sources and stems...
2024-05-07 15:55:24.232 - INFO - separator - Separation duration: 00:00:08
2024-05-07 15:55:24.232 - INFO - cli - Separation complete! Output file(s): test30s_(Vocals)_UVR-MDX-NET-Inst_HQ_4.m4a test30s_(Instrumental)_UVR-MDX-NET-Inst_HQ_4.m4a
Actually I want both - process an m4a input file and write the two separate tracks as m4a output files.
I'm using it in my Python program, installed via pipenv install.
Here's the relevant code snippet (ext = "m4a" in this case):
from audio_separator.separator import Separator
...
separator = Separator(output_format=ext.upper(), output_dir=ctx.locations.temp_path())
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")
vocals_file, instrumental_file = separator.separate(audio)
I do get tons of output, which ends with
2024-05-07 22:20:09 [DEBUG] on stmt: jump 441
2024-05-07 22:20:09 [DEBUG] ==== SSA block rewrite pass on 441
2024-05-07 22:20:09 [DEBUG] Running <numba.core.ssa._FixSSAVars object at 0x3ab91c750>
2024-05-07 22:20:09 [DEBUG] on stmt: jump 230
2024-05-07 22:20:12 [DEBUG] #28939: Traceback (most recent call last):
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/librosa/core/audio.py", line 175, in load
y, sr_native = __soundfile_load(path, offset, duration, dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/librosa/core/audio.py", line 208, in __soundfile_load
context = sf.SoundFile(path)
^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '/var/folders/x3/syw554216nncfrxrpkcwyfg40000gn/T/tmp4lxy54cc/Édith Piaf - La vie en rose.m4a': Format not recognised.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/markus/Projects/usdb_syncer/src/usdb_syncer/song_loader.py", line 326, in run
self.song = self._run_inner()
^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/src/usdb_syncer/song_loader.py", line 372, in _run_inner
job(ctx)
File "/Users/markus/Projects/usdb_syncer/src/usdb_syncer/song_loader.py", line 633, in _maybe_generate_instrumental
vocals_file, instrumental_file = separator.separate(audio)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audio_separator/separator/separator.py", line 666, in separate
output_files = self.model_instance.separate(audio_file_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audio_separator/separator/architectures/mdx_separator.py", line 144, in separate
mix = self.prepare_mix(self.audio_file_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audio_separator/separator/common_separator.py", line 176, in prepare_mix
mix, sr = librosa.load(mix, mono=False, sr=self.sample_rate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/librosa/core/audio.py", line 183, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/librosa/util/decorators.py", line 59, in __wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/librosa/core/audio.py", line 239, in __audioread_load
reader = audioread.audio_open(path)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audioread/__init__.py", line 127, in audio_open
return BackendClass(path)
^^^^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audioread/macca.py", line 201, in __init__
url = CFURL(filename)
^^^^^^^^^^^^^^^
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audioread/macca.py", line 141, in __init__
filename = filename.encode(sys.getfilesystemencoding())
^^^^^^^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute 'encode'
2024-05-07 22:20:12 [ERROR] #28939: Failed to finish download due to an unexpected error. See debug log for more information.
Exception ignored in: <function CFObject.__del__ at 0x3abbdeac0>
Traceback (most recent call last):
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audioread/macca.py", line 135, in __del__
_corefoundation.CFRelease(self._obj)
^^^^^^^^^
AttributeError: 'CFURL' object has no attribute '_obj'
Exception ignored in: <function ExtAudioFile.__del__ at 0x3abbdf4c0>
Traceback (most recent call last):
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audioread/macca.py", line 336, in __del__
self.close()
File "/Users/markus/Projects/usdb_syncer/.venv/lib/python3.11/site-packages/audioread/macca.py", line 330, in close
if not self.closed:
^^^^^^^^^^^
AttributeError: 'ExtAudioFile' object has no attribute 'closed'
So judging from the "Error opening ... Format not recognised" message, I assume that it fails to open the existing file.
The output of ffprobe for that file is:
ffprobe version 7.0 Copyright (c) 2007-2024 the FFmpeg developers
built with Apple clang version 15.0.0 (clang-1500.3.9.4)
configuration: --prefix=/usr/local/Cellar/ffmpeg/7.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopenvino --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
libavutil 59. 8.100 / 59. 8.100
libavcodec 61. 3.100 / 61. 3.100
libavformat 61. 1.100 / 61. 1.100
libavdevice 61. 1.100 / 61. 1.100
libavfilter 10. 1.100 / 10. 1.100
libswscale 8. 1.100 / 8. 1.100
libswresample 5. 1.100 / 5. 1.100
libpostproc 58. 1.100 / 58. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Édith Piaf - La vie en rose.m4a':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : La vie en rose
artist : Édith Piaf
date : 1946
Duration: 00:03:07.34, start: 0.000000, bitrate: 129 kb/s
Stream #0:0[0x1](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
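One reading of the traceback above: the soundfile failure is only the first attempt, after which librosa falls back to audioread, and the error that actually aborts the run is the AttributeError raised when audioread's macOS backend calls filename.encode(...) on a PosixPath. A minimal sketch of a possible workaround, assuming `audio` in the snippet is a `pathlib.Path`; the helper name `as_plain_path` is hypothetical, and whether this resolves the failure with these library versions is untested:

```python
import os
from pathlib import Path


def as_plain_path(path) -> str:
    """Return `path` as a plain `str`.

    os.fsdecode accepts str, bytes and os.PathLike objects and always returns
    str, so a backend that calls `.encode()` on its argument gets a real string.
    """
    return os.fsdecode(path)


# Intended use with the snippet from this thread (not executed here):
#     vocals_file, instrumental_file = separator.separate(as_plain_path(audio))
```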
m4a audio format seems to be supported as input, but not as output, contrary to what is stated in the readme ("output_format: (Optional) Format to encode output files, any common format (WAV, MP3, FLAC, M4A, etc.). Default: WAV").
When I try to output to m4a, I get
Unknown format: 'M4A'
(which stems from the soundfile module, which in turn relies on the libsndfile library). libsndfile already has an issue for M4A support: https://github.com/libsndfile/libsndfile/issues/389.
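To confirm that the limitation comes from the local libsndfile build, the formats soundfile reports can be checked directly. A small sketch, assuming soundfile is installed; `supports_format` is a hypothetical helper, while `soundfile.available_formats()` is the library call that returns a mapping of format names to descriptions:

```python
def supports_format(fmt: str, available: dict) -> bool:
    """Case-insensitive check for `fmt` among the format names in `available`."""
    return fmt.upper() in {name.upper() for name in available}


if __name__ == "__main__":
    try:
        import soundfile as sf
    except ImportError:
        print("soundfile is not installed")
    else:
        # available_formats() maps names such as 'WAV' or 'FLAC' to descriptions.
        print("M4A via soundfile:", supports_format("m4a", sf.available_formats()))
```

If this prints False, writing M4A has to go through something other than soundfile, for example ffmpeg.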