pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
6.24k stars 773 forks

AttributeError: 'Annotation' object has no attribute 'for_json' #1668

Closed alvynabranches closed 7 months ago

alvynabranches commented 7 months ago

Tested versions

Library         Version
Python          3.12.2
pyannote.audio  3.1.1
pyannote.core   5.0.0

System information

macOS 14.1 (23B2073) - M3 Max

Issue description

Code

import os

from transformers import pipeline
from pyannote.audio import Pipeline
from speechbox import ASRDiarizationPipeline as ASRDP

# Speaker diarization (requires a Hugging Face access token in the environment)
diarization_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token=os.environ["HUGGINGFACE_TOKEN"])
# Speech recognition
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
# Combined ASR + diarization; calling it raises the AttributeError shown below
pipe = ASRDP(asr_pipeline=asr_pipeline, diarization_pipeline=diarization_pipeline)
output = pipe("audio.mp3")

Error

/opt/homebrew/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/opt/homebrew/lib/python3.12/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 2
      1 with ProgressHook() as hook:
----> 2     output = pipe("audio.mp3", hook=hook)

File /opt/homebrew/lib/python3.12/site-packages/speechbox/diarize.py:90, in ASRDiarizationPipeline.__call__(self, inputs, group_by_speaker, **kwargs)
     83 inputs, diarizer_inputs = self.preprocess(inputs)
     85 diarization = self.diarization_pipeline(
     86     {"waveform": diarizer_inputs, "sample_rate": self.sampling_rate},
     87     **kwargs,
     88 )
---> 90 segments = diarization.for_json()["content"]
     92 # diarizer output may contain consecutive segments from the same speaker (e.g. {(0 -> 1, speaker_1), (1 -> 1.5, speaker_1), ...})
     93 # we combine these segments to give overall timestamps for each speaker's turn (e.g. {(0 -> 1.5, speaker_1), ...})
     94 new_segments = []

AttributeError: 'Annotation' object has no attribute 'for_json'

I tried this in a Jupyter notebook on my local machine as well as on Google Colab. The error is the same in both.

Minimal reproduction example (MRE)

AttributeError: 'Annotation' object has no attribute 'for_json'

hbredin commented 7 months ago

Looks like a problem with speechbox, not `pyannote.audio`.

alvynabranches commented 7 months ago

What is the fix?

hbredin commented 7 months ago

There is indeed no function for_json in pyannote.core. The code calling for_json is in speechbox (line 90 of speechbox/diarize.py), not pyannote. Closing.
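
For anyone landing here with the same traceback: the thread does not give a fix, but one possible workaround is to build the segment list yourself from the Annotation instead of calling for_json. The sketch below is not part of speechbox or pyannote. annotation_to_segments is a hypothetical helper name, and the dictionary layout (a "segment" entry with "start" and "end", plus "track" and "label") is an assumption about what the `["content"]` lookup on line 90 of speechbox/diarize.py expects. It relies only on Annotation.itertracks(yield_label=True), which yields (segment, track, label) triples.

```python
def annotation_to_segments(annotation):
    """Convert a pyannote.core Annotation into a list of plain dicts.

    Hypothetical helper: the output layout is an assumption about what the
    code around diarization.for_json()["content"] consumes, not a documented
    speechbox or pyannote format.
    """
    return [
        {
            # Segment objects expose .start and .end in seconds
            "segment": {"start": segment.start, "end": segment.end},
            "track": track,
            "label": label,
        }
        # itertracks(yield_label=True) yields (segment, track, label) triples
        for segment, track, label in annotation.itertracks(yield_label=True)
    ]
```

With this helper, `segments = annotation_to_segments(diarization)` could stand in for `segments = diarization.for_json()["content"]` in a locally patched copy of speechbox/diarize.py. Whether the rest of that file works unchanged with this output has not been verified here.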