Possibly corrupted files in fma_small

mdeff / fma

FMA: A Dataset For Music Analysis

https://arxiv.org/abs/1612.01840

MIT License

Possibly corrupted files in fma_small #36

Closed Samaritan1011001 closed 4 years ago

Samaritan1011001 commented 4 years ago

I apologize if I missed a step or did not do something on my part. Thank you for the data and all the examples.

Training a CNN on the pre-processed audio files starts, but as soon as some files are fetched it stops with the error below: Unknown: CalledProcessError: Command '['ffmpeg', '-i', 'path-to-dataset\\fma_small\\099\\099134.mp3', '-f', 's16le', '-acodec', 'pcm_s16le', '-ac', '1', '-']' returned non-zero exit status 1.

Looking at this, I checked file 099134: my default audio player could not play it, and the metadata for that file (in File Explorer) also seems to be missing.
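To see why ffmpeg rejects a file, it can help to run the same command outside the training pipeline and read its stderr. The following is a minimal sketch, assuming `ffmpeg` is on the PATH; the helper names `ffmpeg_decode_cmd` and `run_decode` are hypothetical and not part of the FMA code. The argument list mirrors the command in the error message above.

```python
import subprocess


def ffmpeg_decode_cmd(path, ffmpeg='ffmpeg'):
    """Build the decode command seen in the error message (hypothetical helper)."""
    return [ffmpeg, '-i', path, '-f', 's16le',
            '-acodec', 'pcm_s16le', '-ac', '1', '-']


def run_decode(cmd):
    """Run a command, discard its stdout, and return (exit code, stderr text)."""
    proc = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,    # keep the child from waiting on input
        stdout=subprocess.DEVNULL,   # the decoded PCM data is not needed here
        stderr=subprocess.PIPE,      # diagnostics are written to stderr
    )
    return proc.returncode, proc.stderr.decode('utf-8', errors='replace')
```

Calling `run_decode(ffmpeg_decode_cmd(filename))` on a suspect file should return a non-zero exit code together with ffmpeg's own explanation, which is usually more informative than the wrapped CalledProcessError.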

Samaritan1011001 commented 4 years ago

Librosa throws the error "File contains data in an unknown format" when trying to load that file, but it works fine with other files. Other files, such as 133297, behave the same way.

Here is a gist to reproduce it:

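import os

import IPython.display as ipd
import librosa

import utils  # helper module from the FMA repository

# Root directory of the extracted fma_small archive.
AUDIO_DIR = os.environ.get('AUDIO_DIR')

# Track 99134 is one of the files that fails to decode.
filename = utils.get_audio_path(AUDIO_DIR, 99134)
print('File: {}'.format(filename))

# Loading raises the error for the corrupted file.
x, sr = librosa.load(filename, sr=None, mono=True)
print('Duration: {:.2f}s, {} samples'.format(x.shape[-1] / sr, x.size))

# Play seconds 7 to 17 of the clip (only reached for readable files).
start, end = 7, 17
ipd.Audio(data=x[start*sr:end*sr], rate=sr)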
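The single-file check above could be extended to the whole dataset so that every unreadable track is recorded before training starts. This is only a sketch: `find_unreadable` is a hypothetical helper, not part of the FMA repository, and it takes the loader as an argument so that any decoding function can be plugged in.

```python
import os


def find_unreadable(audio_dir, load, extensions=('.mp3',)):
    """Return (path, error) pairs for every audio file that `load` rejects."""
    failures = []
    for root, dirs, files in os.walk(audio_dir):
        dirs.sort()  # visit sub-folders in a stable order
        for name in sorted(files):
            if not name.lower().endswith(extensions):
                continue  # skip checksums, READMEs, and other non-audio files
            path = os.path.join(root, name)
            try:
                load(path)
            except Exception as exc:
                # Keep the path and the error so the list can be saved or shared.
                failures.append((path, repr(exc)))
    return failures
```

With librosa installed, it might be called as `find_unreadable(AUDIO_DIR, lambda p: librosa.load(p, sr=None, mono=True))`. Decoding every clip is slow, so it makes sense to write the resulting list to a file and reuse it.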

andimarafioti commented 4 years ago

This has already been reported in https://github.com/mdeff/fma/issues/8, so you're not alone.

Samaritan1011001 commented 4 years ago

Oh, yeah, right! Thanks. I forgot to note down the names of the files I found to be corrupted, so this issue doesn't seem to be of much use. I'll close it for now.