resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0

Mp3 file is not accepted in demo02_diarization.py #11

Closed rajacsp closed 5 years ago

rajacsp commented 5 years ago

wav_fpath = Path("audio_data", "X2zqiX6yL3I.mp3")

When I try using mp3 files, it throws an error. However, if I use wav files, it works fine. Please correct me if I am doing anything wrong.

CorentinJ commented 5 years ago

Hmm, this again uh... As far as I know, that's because you don't have ffmpeg installed as a backend.

nmstoker commented 5 years ago

I appear to be hitting the same error and have a solution below. @rajacsp didn't include enough error details to diagnose, so I can't be sure my error is exactly the same as his. I see the same problem in demo02 and demo05 (which both rely on reading mp3 files).

I'm running this on Arch Linux, with a conda environment running Python 3.6 with ffmpeg installed in the environment, plus all the other requirements.

I would get the following:

Preprocessing wavs: 0%| | 0/18 [00:00<?, ? utterances/s]
Traceback (most recent call last):
  File "demo05_fake_speech_detection.py", line 25, in <module>
    tqdm(wav_fpaths, "Preprocessing wavs", len(wav_fpaths), unit=" utterances")]
  File "demo05_fake_speech_detection.py", line 24, in <listcomp>
    wavs = [preprocess_wav(wav_fpath) for wav_fpath in \
  File "/home/neil/Projects/Resemblyzer/resemblyzer/audio.py", line 27, in preprocess_wav
    wav, source_sr = librosa.load(fpath_or_wav, sr=None)
  File "/home/neil/.conda/envs/resemblyzer/lib/python3.6/site-packages/librosa/core/audio.py", line 149, in load
    six.reraise(*sys.exc_info())
  File "/home/neil/.conda/envs/resemblyzer/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/neil/.conda/envs/resemblyzer/lib/python3.6/site-packages/librosa/core/audio.py", line 129, in load
    with sf.SoundFile(path) as sf_desc:
  File "/home/neil/.conda/envs/resemblyzer/lib/python3.6/site-packages/soundfile.py", line 627, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/neil/.conda/envs/resemblyzer/lib/python3.6/site-packages/soundfile.py", line 1182, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/home/neil/.conda/envs/resemblyzer/lib/python3.6/site-packages/soundfile.py", line 1355, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'audio_data/donald_trump/fake/J-SwzTNeN4M.mp3': File contains data in an unknown format.

Unlike in the earlier mp3 issue, my error made no mention of a missing backend (and I confirmed ffmpeg was accessible via Python).
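For anyone wanting to repeat that check, here is a minimal sketch using only the standard library. The helper names `find_ffmpeg` and `ffmpeg_version_line` are made up for illustration and are not part of Resemblyzer or librosa; the sketch only assumes that ffmpeg, if installed, is reachable on PATH from the active environment.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(executable: str) -> str:
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [executable, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; also works on Python 3.6
        check=True,
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH")
    else:
        print(path, "->", ffmpeg_version_line(path))
```

If this prints a path and a version line, a missing ffmpeg backend is probably not the cause, which matches what I saw here.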

From what I can see, the issue is that the code currently passes Path objects to preprocess_wav, and when that calls librosa, librosa's load function only accepts string paths in the cases where it needs to fall back to audioread. See librosa/core/audio.py:

def load(path, sr=22050, mono=True, offset=0.0, duration=None,
         dtype=np.float32, res_type='kaiser_best'):
    """Load an audio file as a floating point time series.

    Audio will be automatically resampled to the given rate
    (default `sr=22050`).

    To preserve the native sampling rate of the file, use `sr=None`.

    Parameters
    ----------
    path : string, int, or file-like object
        path to the input file.

        Any codec supported by `soundfile` or `audioread` will work.

        If the codec is supported by `soundfile`, then `path` can also be
        an open file descriptor (int), or any object implementing Python's
        file interface.

        If the codec is not supported by `soundfile` (e.g., MP3), then only
        string file paths are supported.

The solution is simply to change the calls to preprocess_wav to convert the paths to strings:

wavs = [preprocess_wav(wav_fpath) for wav_fpath in \
         tqdm(wav_fpaths, "Preprocessing wavs", len(wav_fpaths), unit=" utterances")]

becomes:

wavs = [preprocess_wav(str(wav_fpath)) for wav_fpath in \
         tqdm(wav_fpaths, "Preprocessing wavs", len(wav_fpaths), unit=" utterances")]

I had nearly given up and was going to "cheat" by converting the mp3s to wav format manually, as ffmpeg was installed and working fine, but this fix looks easy to apply.
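For completeness, the manual conversion workaround could look roughly like the sketch below. It is not needed once the paths are passed as strings. The function names `build_ffmpeg_cmd` and `mp3_to_wav` are hypothetical, and the sketch assumes an `ffmpeg` executable on PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(src: Path, dst: Path) -> List[str]:
    """Build the ffmpeg argument list; -y overwrites dst if it exists."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def mp3_to_wav(src: Path) -> Path:
    """Write a .wav file next to src and return its path (needs ffmpeg on PATH)."""
    dst = src.with_suffix(".wav")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
    return dst
```

Calling `mp3_to_wav(Path("audio_data", "X2zqiX6yL3I.mp3"))` would then produce a wav file in the same folder that the demos can read directly.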

nmstoker commented 5 years ago

Also, when falling back to audioread, librosa emits annoying user warnings (which mess up the progress bar).

They can be suppressed by including this near the top of the demo scripts (02 and 05):

import warnings
warnings.simplefilter("ignore")
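The two lines above silence every warning for the whole script. A narrower variant, sketched here on the assumption that librosa's fallback message is an ordinary UserWarning, ignores only that category and only around the loading loop. The name `quiet_user_warnings` is made up for this example.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_user_warnings():
    """Temporarily ignore UserWarning only; other categories still surface."""
    with warnings.catch_warnings():
        # The filter is removed again when the with-block exits.
        warnings.simplefilter("ignore", category=UserWarning)
        yield
```

In the demos, the list comprehension that calls preprocess_wav would go inside `with quiet_user_warnings():`, so warnings raised elsewhere in the script remain visible.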

And finally I should've said this earlier: @CorentinJ - a big thank you for this repo - it's really cool! :slightly_smiling_face:

CorentinJ commented 5 years ago

Thank you. What about converting the path to a string inside preprocess_wav instead, e.g.:

wav, source_sr = librosa.load(str(fpath_or_wav), sr=None)

https://github.com/resemble-ai/Resemblyzer/blob/master/resemblyzer/audio.py#L27
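A minimal sketch of that idea as a standalone helper, assuming the function may receive either a file path or an already-loaded waveform. The name `to_load_arg` is hypothetical and this is not the code that was committed to the repo.

```python
from pathlib import Path


def to_load_arg(fpath_or_wav):
    """Convert Path objects to str; return anything else unchanged."""
    if isinstance(fpath_or_wav, Path):
        # Per the librosa docstring quoted earlier in this thread,
        # only string file paths are supported for codecs like MP3.
        return str(fpath_or_wav)
    return fpath_or_wav
```

With this, `librosa.load(to_load_arg(fpath_or_wav), sr=None)` would receive a plain string whether the caller passed a str or a Path.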

nmstoker commented 5 years ago

Yes, that would work and avoids the risk of someone not converting a Path to a string when using MP3.

CorentinJ commented 5 years ago

Done in f60e74c3e29b35c8e1eb56a1ea8221ce90f461c3, reopen if the issue persists.