got an unexpected keyword argument 'use_auth_token'

shihab-sol commented 8 months ago

TypeError                                 Traceback (most recent call last)
in <cell line: 6>()
      4
      5 device = "cuda:0" if torch.cuda.is_available() else "cpu"
----> 6 pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device,token='')
      7
      8 # load dataset of concatenated LibriSpeech samples

2 frames
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/automatic_speech_recognition.py in __init__(self, model, feature_extractor, tokenizer, decoder, modelcard, framework, task, args_parser, device, torch_dtype, binary_output, **kwargs)
    286 self.type = "ctc"
    287
--> 288 self._preprocess_params, self._forward_params, self._postprocess_params = self._sanitize_parameters(**kwargs)
    289
    290 mapping = MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES.copy()

TypeError: AutomaticSpeechRecognitionPipeline._sanitize_parameters() got an unexpected keyword argument 'use_auth_token'

splevine commented 8 months ago

I'm having the same issue. I'm in Colab, and I've tried passing the auth token multiple ways.

KennethTrinh commented 8 months ago

Hmm, it appears the PyPI release is not yet in sync with the GitHub repo. The transformers pipeline renamed the argument to token, and this repo was fixed accordingly: https://github.com/huggingface/speechbox/blob/db362fd99d9528c29725e035c177370476ba55d7/src/speechbox/diarize.py#L37

The reason is that use_auth_token implies bool but also accepts a string - so token is clearer.
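For user code that has to call loaders from both before and after the rename, one option is to try the newer keyword first and fall back to the old one. This is only a sketch: `call_with_token`, `old_loader` and `new_loader` are hypothetical names, not part of speechbox or transformers. It does not fix the error in this thread, which is raised inside the PyPI build of speechbox, and it may call the loader twice.

```python
def call_with_token(factory, *args, token=None, **kwargs):
    """Call `factory` with `token=`, retrying with `use_auth_token=` if rejected.

    Hypothetical helper for illustration only.
    """
    try:
        return factory(*args, token=token, **kwargs)
    except TypeError as err:
        # Only retry when the failure is about the `token` keyword itself;
        # any other TypeError is re-raised unchanged.
        if "'token'" not in str(err):
            raise
        return factory(*args, use_auth_token=token, **kwargs)


# Stand-in loaders (hypothetical) that mimic the old and new keyword names.
def old_loader(model_id, use_auth_token=None):
    return model_id, use_auth_token


def new_loader(model_id, token=None):
    return model_id, token


if __name__ == "__main__":
    print(call_with_token(old_loader, "some/model", token="hf_xxx"))
    print(call_with_token(new_loader, "some/model", token="hf_xxx"))
```

Note that a loader which swallows unknown keywords via **kwargs and fails later (as the transformers pipeline does in the traceback above) will not raise on `'token'`, so the fallback never triggers for it.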

Here's my temp fix (I installed it in editable mode just in case, but feel free to do whatever you want):

pip uninstall speechbox && git clone https://github.com/huggingface/speechbox.git && cd speechbox && pip install -e .

Also please delete your api token and try not to paste it in a public forum in the future :)
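One way to keep the token out of pasted code is to read it from an environment variable. A minimal sketch, assuming the token is stored under `HUGGINGFACE_TOKEN` (the variable name used later in this thread); `read_hf_token` is a hypothetical helper name:

```python
import os


def read_hf_token(var_name="HUGGINGFACE_TOKEN"):
    """Return the token stored in the given environment variable.

    Hypothetical helper: raises instead of silently passing an empty token.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token
```

The result can then be passed as the token argument, so the notebook cell you share never contains the secret itself.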

jgstew commented 7 months ago

I got this working with: pip uninstall speechbox && pip install git+https://github.com/huggingface/speechbox.git
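To confirm which build is active after reinstalling, the installed version can be read with the standard library. A small sketch; `installed_version` is a hypothetical helper name, and it only reports the version string that pip recorded, not whether that build contains the fix:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("speechbox", "transformers"):
        print(name, installed_version(name))
```

In a notebook, restart the runtime after reinstalling so the new package is the one that gets imported.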

alvynabranches commented 4 months ago

Code

pipe = ASRDiarizationPipeline.from_pretrained(asr_model="openai/whisper-large-v3", diarizer_model="pyannote/speaker-diarization-3.1")

Error

TypeError: AutomaticSpeechRecognitionPipeline._sanitize_parameters() got an unexpected keyword argument 'use_auth_token'

alvynabranches commented 4 months ago

Tested versions

Library	Version
Python	3.12.2
Pyannote.audio	3.1.1
Pyannote.core	5.0.0

System information

macOS 14.1 (23B2073) - M3 Max

Issue description

Code

import os  # needed for os.environ below

from transformers import pipeline
from pyannote.audio import Pipeline
from speechbox import ASRDiarizationPipeline as ASRDP

# Load the diarization model, reading the Hugging Face token from the environment.
diarization_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token=os.environ["HUGGINGFACE_TOKEN"])
# Load the Whisper ASR model and combine both pipelines.
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
pipe = ASRDP(asr_pipeline=asr_pipeline, diarization_pipeline=diarization_pipeline)
output = pipe("audio.mp3")

Error

/opt/homebrew/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/opt/homebrew/lib/python3.12/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 2
      1 with ProgressHook() as hook:
----> 2     output = pipe("audio.mp3", hook=hook)

File /opt/homebrew/lib/python3.12/site-packages/speechbox/diarize.py:90, in ASRDiarizationPipeline.__call__(self, inputs, group_by_speaker, **kwargs)
     83 inputs, diarizer_inputs = self.preprocess(inputs)
     85 diarization = self.diarization_pipeline(
     86     {"waveform": diarizer_inputs, "sample_rate": self.sampling_rate},
     87     **kwargs,
     88 )
---> 90 segments = diarization.for_json()["content"]
     92 # diarizer output may contain consecutive segments from the same speaker (e.g. {(0 -> 1, speaker_1), (1 -> 1.5, speaker_1), ...})
     93 # we combine these segments to give overall timestamps for each speaker's turn (e.g. {(0 -> 1.5, speaker_1), ...})
     94 new_segments = []

AttributeError: 'Annotation' object has no attribute 'for_json'

I tried this in a Jupyter Notebook on a local device as well as on Google Colab. The error is the same in both.
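The traceback suggests the installed pyannote.core Annotation no longer provides for_json(). A possible workaround is to build the segment list from the annotation's tracks instead. The following is a sketch under assumptions: `annotation_to_segments` is a hypothetical helper, it assumes `Annotation.itertracks(yield_label=True)` yields (segment, track, label) with `.start` and `.end` on each segment, and the dictionary layout is a guess at what the speechbox code expects from for_json()["content"], not something confirmed in this thread.

```python
def annotation_to_segments(annotation):
    """Convert a diarization annotation into a list of plain dictionaries.

    Hypothetical helper. Works with any object exposing
    itertracks(yield_label=True) -> (segment, track, label).
    """
    segments = []
    for segment, track, label in annotation.itertracks(yield_label=True):
        segments.append(
            {
                # Assumed layout: start/end times nested under "segment".
                "segment": {"start": segment.start, "end": segment.end},
                "track": track,
                "label": label,
            }
        )
    return segments
```

If the layout matches, the failing line in speechbox/diarize.py could be patched locally to use this helper on the diarization result, but installing speechbox from the GitHub repo as suggested above is the less invasive thing to try first.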

Minimal reproduction example (MRE)

AttributeError: 'Annotation' object has no attribute 'for_json'
