Closed keunwoochoi closed 3 years ago
Thanks for reporting. I've checked some (002/002624.mp3, 084/084522.mp3, 101/101265.mp3, 140/140449.mp3, 148/148795.mp3) and could open them with librosa (v0.8.0) and ffmpeg (v4.3.1), and listen to them with mpv (v0.32.0), also on Linux. What did you try exactly?
Also check that your local copy isn't corrupted. You can compute the checksum of an audio file with sha1sum 002/002624.mp3, then check that it matches what is recorded in the checksums file. Or check them all at once with sha1sum -c checksums.
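For anyone who prefers to run this check from Python, here is a minimal sketch of the same verification. It assumes the checksums file uses the usual sha1sum layout (hex digest, whitespace, relative path), which matches the grep output quoted in this thread. The function names are made up for illustration and are not part of the FMA code.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksums_path, root="."):
    """Compare files under `root` with a sha1sum-style checksums file.

    Returns (mismatched, missing): two lists of relative paths.
    """
    mismatched, missing = [], []
    for line in Path(checksums_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha1sum marks binary mode with '*'
        target = Path(root) / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif sha1_of_file(target) != expected.lower():
            mismatched.append(rel_path)
    return mismatched, missing
```

Calling verify_checksums("checksums", root=".") from inside the dataset directory should report the same files that sha1sum -c checksums flags as FAILED or missing. A file that passes this check but still does not load was most likely already unreadable in the published archive, so the local copy is not the problem.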
Also, when I created the dataset, I extracted features (with librosa) from all tracks in fma_full.
Hm, this is interesting. I couldn't figure it out and ended up ignoring those files. Anyway:
$ sha1sum 002/002624.mp3
5e421474f0cbcf35648753fe1fd3cc22788d1bbe 002/002624.mp3
fma_large $ grep 002/002624.mp3 checksums
5e421474f0cbcf35648753fe1fd3cc22788d1bbe 002/002624.mp3
So the file is correct.
$ ffmpeg
ffmpeg version 4.1.6-1~deb10u1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --prefix=/usr --extra-version='1~deb10u1' --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy--enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Use -h to get full help or, even better, run 'man ffmpeg'
So ffmpeg is installed.
$ ls -l 002/002624.mp3
-r--r--r-- 1 <REDACTED> 1563 Apr 1 2017 002/002624.mp3
1563 bytes seems very small...
And here is the error in Python.
$ python3
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import librosa
>>> _ = librosa.load('002/002624.mp3')
[REDACTED]/python3.7/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn("PySoundFile failed. Trying audioread instead.")
Traceback (most recent call last):
File "[REDACTED]/python3.7/site-packages/librosa/core/audio.py", line 146, in load
with sf.SoundFile(path) as sf_desc:
File "[REDACTED]/python3.7/site-packages/soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "[REDACTED]/python3.7/site-packages/soundfile.py", line 1184, in _open
"Error opening {0!r}: ".format(self.name))
File "[REDACTED]/python3.7/site-packages/soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '002/002624.mp3': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "[REDACTED]/python3.7/site-packages/librosa/core/audio.py", line 163, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "[REDACTED]/python3.7/site-packages/librosa/core/audio.py", line 187, in __audioread_load
with audioread.audio_open(path) as input_file:
File "[REDACTED]/python3.7/site-packages/audioread/__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError
FYI, I have libsndfile1 installed on the machine.
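In case it helps others build their own list of problematic files, here is a rough sketch of how the two symptoms above (a very small file, and an exception on load) could be collected in one pass. The loader is passed in as a plain callable so that librosa.load or any other decoder can be used. The function name and the min_bytes threshold are hypothetical choices for illustration, not something taken from the dataset's tooling.

```python
import os


def find_unreadable(paths, load, min_bytes=0):
    """Return {path: reason} for files that are too small or fail to load.

    `load` is any callable that raises on failure, for example librosa.load.
    `min_bytes` is an illustrative size threshold; 0 disables the size check.
    """
    problems = {}
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError as exc:
            problems[path] = f"stat failed: {exc}"
            continue
        if size < min_bytes:
            problems[path] = f"only {size} bytes"
            continue
        try:
            load(path)
        except Exception as exc:  # e.g. RuntimeError from soundfile, NoBackendError from audioread
            problems[path] = f"{type(exc).__name__}: {exc}"
    return problems
```

Something like find_unreadable(mp3_paths, librosa.load, min_bytes=2000) would then flag 002/002624.mp3 on size alone, given the 1563 bytes shown by ls above. Note that audioread raises NoBackendError whenever none of its backends can open a file, so it is worth confirming that a known-good track loads first, to rule out a missing decoder on the machine.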
My bad, I was checking fma_full instead of fma_large... I can reproduce. It's a known issue, but we didn't have a list for fma_large yet. I've added yours. Thanks!
However, I think the list is incomplete, as it should be a superset of the fma_small and fma_medium lists. For example, I get the same issue with 099/099134.mp3, which is not in your list. Don't you?
No problem! And you're right, I only included the files that exist only in FMA Large. I assumed all the files in FMA Small and Medium are also in FMA Large, so in my code I ignore all the corrupted files listed for FMA Small, Medium, and Large.
I see, so we now have complete lists for the three subsets. Thanks!
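The approach described above, treating the fma_large skip list as the union of the three per-subset lists, can be sketched as follows. The helper names are hypothetical. The path layout (zero-padded six-digit ID, with the first three digits as the directory) is inferred from paths quoted in this thread, such as 002/002624.mp3.

```python
def tracks_to_skip(*subset_lists):
    """Union of the per-subset lists of unreadable track IDs."""
    return set().union(*subset_lists)


def track_path(track_id):
    """Relative path of a track, e.g. 2624 -> '002/002624.mp3'."""
    name = f"{track_id:06d}"
    return f"{name[:3]}/{name}.mp3"


def usable_tracks(all_ids, skip):
    """Keep the original order, dropping every ID found in `skip`."""
    return [tid for tid in all_ids if tid not in skip]
```

For example, with the large-only list and the small/medium lists loaded as sets of integers, usable_tracks(ids, tracks_to_skip(large_only, medium, small)) gives the IDs that are safe to iterate over when working with fma_large.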
Does the list also contain tracks that are shorter than 30s but load fine? Or don't you ignore those?
It probably doesn't contain those files. The list contains the files that raised an error when I tried to load the audio.
Ok, thanks for confirming.
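Since tracks that load fine but are shorter than 30s are not covered by the error list, they would need a separate pass. Here is a minimal sketch, assuming a get_duration callable that returns a clip's length in seconds. It could wrap librosa.get_duration, whose file-path keyword has changed name between librosa versions, so check the installed version. The function name and default threshold are illustrative.

```python
def short_tracks(paths, get_duration, min_seconds=30.0):
    """Return {path: duration} for clips strictly shorter than `min_seconds`.

    `get_duration` is any callable mapping a path to a length in seconds.
    It is only called on paths that are expected to load without error.
    """
    found = {}
    for path in paths:
        duration = get_duration(path)
        if duration < min_seconds:
            found[path] = duration
    return found
```

In practice a small tolerance below 30s may be needed, because decoded lengths can differ slightly from the nominal clip length.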
I didn't double-check, but I couldn't open the files with these indices on Linux with ffmpeg and librosa. Just wanted to share so that others get some hints.