Open shihab-sol opened 8 months ago
I'm having the same issue. I'm in Colab and have tried passing the auth token in multiple ways.
Hmm, it appears the PyPI release is not yet in sync with the GitHub repo. The transformers pipeline renamed the argument to `token`, and this repo has been fixed accordingly: https://github.com/huggingface/speechbox/blob/db362fd99d9528c29725e035c177370476ba55d7/src/speechbox/diarize.py#L37
The reason for the rename is that `use_auth_token` implies a bool but also accepts a string, so `token` is clearer.
Here's my temporary fix (I installed it in editable mode just in case, but feel free to do whatever you want):
pip uninstall speechbox && git clone https://github.com/huggingface/speechbox.git && cd speechbox && pip install -e .
Also, please delete your API token and try not to paste it in a public forum in the future :)
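For code that has to run against both the older release and the updated repo, the rename described above could be absorbed with a small shim. This is only a sketch: `normalize_token_kwargs` is a hypothetical helper, not part of speechbox or transformers, and it assumes the keyword name is the only thing that changed.

```python
from typing import Any, Dict


def normalize_token_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with legacy `use_auth_token` renamed to `token`.

    Hypothetical helper for illustration; not part of any library.
    """
    out = dict(kwargs)  # work on a copy so the caller's dict is untouched
    if "use_auth_token" in out:
        legacy = out.pop("use_auth_token")
        # If the caller also passed `token`, the explicit `token` wins.
        out.setdefault("token", legacy)
    return out
```

The result can then be splatted into a call that expects `token`, for example `some_loader(**normalize_token_kwargs(user_kwargs))`, where `some_loader` and `user_kwargs` stand in for whatever you are calling.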
I got this working with: pip uninstall speechbox && pip install git+https://github.com/huggingface/speechbox.git
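To check which copy ended up installed after reinstalling, one rough option is to look for the old keyword in the module's source file. This is a heuristic sketch, not an official check: `module_mentions` is a hypothetical helper, and it assumes the stale release still contains the string `use_auth_token` in `speechbox/diarize.py`, which I have not verified beyond the traceback in this thread.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_mentions(module_name: str, needle: str) -> Optional[bool]:
    """Report whether the installed module's .py source contains `needle`.

    Returns None when the module (or a plain .py source file for it)
    cannot be located. Hypothetical helper for illustration.
    """
    try:
        # For dotted names this imports the parent package first.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.origin or not spec.origin.endswith(".py"):
        return None
    return needle in Path(spec.origin).read_text(encoding="utf-8")
```

Calling `module_mentions("speechbox.diarize", "use_auth_token")` and getting `True` would suggest the old code path is still the one being imported; `None` means the module could not be found at all.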
Code
pipe = ASRDiarizationPipeline.from_pretrained(asr_model="openai/whisper-large-v3", diarizer_model="pyannote/speaker-diarization-3.1")
Error TypeError: AutomaticSpeechRecognitionPipeline._sanitize_parameters() got an unexpected keyword argument 'use_auth_token'
| Library | Version |
|---|---|
| Python | 3.12.2 |
| pyannote.audio | 3.1.1 |
| pyannote.core | 5.0.0 |
macOS 14.1 (23B2073) - M3 Max
Code
import os  # needed for os.environ below

from transformers import pipeline
from pyannote.audio import Pipeline
from speechbox import ASRDiarizationPipeline as ASRDP
diarization_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token=os.environ["HUGGINGFACE_TOKEN"])
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
pipe = ASRDP(asr_pipeline=asr_pipeline, diarization_pipeline=diarization_pipeline)
output = pipe("audio.mp3")
Error
/opt/homebrew/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/opt/homebrew/lib/python3.12/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[8], line 2
      1 with ProgressHook() as hook:
----> 2     output = pipe("audio.mp3", hook=hook)

File /opt/homebrew/lib/python3.12/site-packages/speechbox/diarize.py:90, in ASRDiarizationPipeline.__call__(self, inputs, group_by_speaker, **kwargs)
     83 inputs, diarizer_inputs = self.preprocess(inputs)
     85 diarization = self.diarization_pipeline(
     86     {"waveform": diarizer_inputs, "sample_rate": self.sampling_rate},
     87     **kwargs,
     88 )
---> 90 segments = diarization.for_json()["content"]
     92 # diarizer output may contain consecutive segments from the same speaker (e.g. {(0 -> 1, speaker_1), (1 -> 1.5, speaker_1), ...})
     93 # we combine these segments to give overall timestamps for each speaker's turn (e.g. {(0 -> 1.5, speaker_1), ...})
     94 new_segments = []
AttributeError: 'Annotation' object has no attribute 'for_json'
I tried this in a Jupyter notebook on my local machine as well as on Google Colab. The error is the same in both.
AttributeError: 'Annotation' object has no attribute 'for_json'
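The traceback above shows `diarize.py` calling `diarization.for_json()["content"]`, which the installed `Annotation` object does not provide. If you need the same kind of list yourself, it could be rebuilt by iterating over the tracks. This is a sketch under assumptions: `annotation_to_segments` is a hypothetical helper, the dict layout is my guess at what the old method returned, and it relies on `itertracks(yield_label=True)` yielding `(segment, track, label)` tuples whose segments expose `.start` and `.end`.

```python
from typing import Any, Dict, List


def annotation_to_segments(annotation: Any) -> List[Dict[str, Any]]:
    """Build a list of segment dicts from an Annotation-like object.

    Assumes `annotation.itertracks(yield_label=True)` yields
    (segment, track, label) tuples and each segment has `.start` and `.end`.
    Hypothetical helper for illustration; the output layout is an assumption.
    """
    return [
        {
            "segment": {"start": segment.start, "end": segment.end},
            "track": track,
            "label": label,
        }
        for segment, track, label in annotation.itertracks(yield_label=True)
    ]
```

With a diarization result in hand, `segments = annotation_to_segments(diarization)` would stand in for the failing line, though patching an installed package this way is only a stopgap.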
TypeError Traceback (most recent call last) in <cell line: 6>()
4
5 device = "cuda:0" if torch.cuda.is_available() else "cpu"
----> 6 pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device,token='')
7
8 # load dataset of concatenated LibriSpeech samples
2 frames
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/automatic_speech_recognition.py in __init__(self, model, feature_extractor, tokenizer, decoder, modelcard, framework, task, args_parser, device, torch_dtype, binary_output, **kwargs)
    286 self.type = "ctc"
    287
--> 288 self._preprocess_params, self._forward_params, self._postprocess_params = self._sanitize_parameters(**kwargs)
    289
    290 mapping = MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES.copy()
TypeError: AutomaticSpeechRecognitionPipeline._sanitize_parameters() got an unexpected keyword argument 'use_auth_token'