resemble-ai / Resemblyzer

A Python package to analyze and compare voices with deep learning
Apache License 2.0

Error while trying to plot speaker similarity in Resemblyzer #74

Closed: sankalpbhatia20 closed this issue 1 year ago

sankalpbhatia20 commented 1 year ago

Hey developers! I am new to Resemblyzer and it seems quite interesting.

I am using Resemblyzer to get the similarity chart for the speakers in an MP3 file I have, but I am getting an error. I also tried the exact MP3 file provided in the repository and got the same error.

The code:

```python
from resemblyzer import preprocess_wav, VoiceEncoder
from demo_utils import *
from pathlib import Path

wav_fpath = Path("/Users/sankalpbhatia/Dropbox/Mac/Desktop/voice_similarity/meta_full_qa.mp3")
wav = preprocess_wav(wav_fpath)

# Cut some segments from single speakers as reference audio
segments = [[5, 10], [52, 59], [274, 284]]
speaker_names = ["David (Dave) Wehrner (CFO)", "Sheryl Sandberg (COO)", "Mark Zuckerberg (CEO)"]
speaker_wavs = [wav[int(s[0] * sampling_rate):int(s[1] * sampling_rate)] for s in segments]

encoder = VoiceEncoder("cpu")
print("Running the continuous embedding on cpu, this might take a while...")
_, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)

speaker_embeds = [encoder.embed_utterance(speaker_wav) for speaker_wav in speaker_wavs]
similarity_dict = {name: cont_embeds @ speaker_embed for name, speaker_embed in
                   zip(speaker_names, speaker_embeds)}

interactive_diarization(similarity_dict, wav, wav_splits)
```
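For readers new to the demo, here is a minimal sketch of what the script computes once the audio loads. It uses synthetic embeddings instead of real audio, so it runs without Resemblyzer installed. Each reference segment in seconds becomes a sample slice at 16 kHz (Resemblyzer's `sampling_rate`). Because the encoder's embeddings are L2-normalized, `cont_embeds @ speaker_embed` works out to a cosine similarity for each sliding window. The helper `seconds_to_slice`, the random vectors, the noise level and the `true_labels` array are hypothetical and exist only for illustration.

```python
import numpy as np

# Resemblyzer works on 16 kHz audio (resemblyzer.sampling_rate).
SAMPLING_RATE = 16000


def seconds_to_slice(start_s, end_s, sr=SAMPLING_RATE):
    """Hypothetical helper: convert a [start, end] range in seconds to a sample slice."""
    return slice(int(start_s * sr), int(end_s * sr))


# Same reference segments (in seconds) as the script above.
segments = [[5, 10], [52, 59], [274, 284]]
slices = [seconds_to_slice(s, e) for s, e in segments]


def unit(v):
    """L2-normalize along the last axis, like the encoder's output embeddings."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Synthetic stand-ins (hypothetical data) for the real 256-dim embeddings.
rng = np.random.default_rng(0)
speaker_embeds = unit(rng.normal(size=(3, 256)))

# Each sliding-window ("partial") embedding is a slightly noisy copy of
# one speaker's embedding; true_labels records which speaker.
true_labels = np.array([0, 0, 1, 2, 2, 1])
cont_embeds = unit(speaker_embeds[true_labels] + 0.01 * rng.normal(size=(6, 256)))

# Equivalent of `cont_embeds @ speaker_embed` for all speakers at once:
# row i, column j = cosine similarity of window i to speaker j.
similarity = cont_embeds @ speaker_embeds.T

# Simplest labeling rule: the most similar speaker for each window.
predicted = similarity.argmax(axis=1)
print(predicted)
```

Each value `similarity_dict[name]` in the script corresponds to one column of `similarity` here. Taking the highest-scoring speaker per window, as `predicted` does, is one simple way to turn those scores into labels.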

The error log:

```
  return f(*args, **kwargs)
Traceback (most recent call last):
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/librosa/core/audio.py", line 164, in load
    y, sr_native = __soundfile_load(path, offset, duration, dtype)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/librosa/core/audio.py", line 195, in __soundfile_load
    context = sf.SoundFile(path)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/Users/sankalpbhatia/Dropbox/Mac/Desktop/voice_similarity/meta_full_qa.mp3': File contains data in an unknown format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/sankalpbhatia/Dropbox/Mac/Desktop/voice_similarity/test_two.py", line 14, in <module>
    wav = preprocess_wav(wav_fpath)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/resemblyzer/audio.py", line 27, in preprocess_wav
    wav, source_sr = librosa.load(fpath_or_wav, sr=None)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/librosa/util/decorators.py", line 88, in inner_f
    return f(*args, **kwargs)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/librosa/core/audio.py", line 170, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/librosa/core/audio.py", line 226, in __audioread_load
    reader = audioread.audio_open(path)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/audioread/__init__.py", line 111, in audio_open
    return BackendClass(path)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/audioread/macca.py", line 200, in __init__
    url = CFURL(filename)
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/audioread/macca.py", line 140, in __init__
    filename = filename.encode(sys.getfilesystemencoding())
AttributeError: 'PosixPath' object has no attribute 'encode'
Exception ignored in: <function CFObject.__del__ at 0x7feee048e3a0>
Traceback (most recent call last):
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/audioread/macca.py", line 134, in __del__
    _corefoundation.CFRelease(self._obj)
AttributeError: 'CFURL' object has no attribute '_obj'
Exception ignored in: <function ExtAudioFile.__del__ at 0x7feee048eb80>
Traceback (most recent call last):
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/audioread/macca.py", line 335, in __del__
    self.close()
  File "/Users/sankalpbhatia/opt/anaconda3/lib/python3.9/site-packages/audioread/macca.py", line 329, in close
    if not self.closed:
AttributeError: 'ExtAudioFile' object has no attribute 'closed'
(base) Sankalps-MacBook-Air:voice_similarity sankalpbhatia$
```
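The traceback points to two chained failures. First, soundfile cannot decode the MP3 (`File contains data in an unknown format`), so librosa falls back to audioread. Second, audioread's macOS CoreAudio backend (`macca.py`) calls `filename.encode(...)` on the path it receives. A `pathlib.Path` has no `encode` method, which produces the `AttributeError`. A likely workaround, assuming the audioread version shown in this traceback, is to pass a plain string path instead of a `Path`. The stdlib-only sketch below reproduces the failing call and the conversion. The file path in it is hypothetical.

```python
import os
import sys
from pathlib import Path

# Hypothetical path for illustration; substitute your own audio file.
wav_fpath = Path("voice_similarity/meta_full_qa.mp3")

# This mirrors what audioread's macca.py does with the filename:
# Path objects have no .encode(), so the attribute lookup raises AttributeError.
try:
    wav_fpath.encode(sys.getfilesystemencoding())
    path_has_encode = True
except AttributeError:
    path_has_encode = False

# os.fspath() (same result as str() for a Path) yields a plain str,
# which does have .encode(), so the same call succeeds.
wav_str = os.fspath(wav_fpath)
encoded = wav_str.encode(sys.getfilesystemencoding())

print(path_has_encode)
print(type(wav_str).__name__)
```

Applied to the script, this would mean calling `wav = preprocess_wav(str(wav_fpath))`. Whether the MP3 then decodes still depends on which audio backends are available (CoreAudio on macOS, or ffmpeg). Converting the file to WAV beforehand lets soundfile read it directly and avoids the audioread fallback altogether.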

Any help would be appreciated. Thank you!