m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
11.33k stars 1.19k forks

Use whisperx and pyannote in Colab without HuggingFace token #841

Open biagioscalingipsy opened 1 month ago

biagioscalingipsy commented 1 month ago

Hello! I would like to use WhisperX and Pyannote together for automatic transcription with diarization. This works on Colab with a HuggingFace (HF) token, but I would like to avoid entering the HF token every time, so I was thinking of downloading the models locally and loading them when needed. I can do this for WhisperX but not for Pyannote: the transcription runs, but I cannot proceed with the diarization step. I followed these tutorials on how to use Pyannote and the offline pipelines and downloaded everything, but it still doesn't work. Can you help me?

I downloaded pytorch_model.bin and configuration.yaml for voice_activity_detection, then the YAML files for segmentation and speaker_diarization, and put them in the working directory. Then I used this code:

!pip install whisperx
import whisperx
import gc

device = "cuda"
batch_size = 16 # reduce if low on GPU mem
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)
model_name = "large-v2"

audio_file = "audio.wav"
model_dir = "/content/drive/MyDrive/whisperx_models/large-v2"
model = whisperx.load_model(model_name, device, compute_type=compute_type, download_root=model_dir)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment

model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)

print(result["segments"]) # after alignment

!pip install pyannote.audio
from pyannote.audio import Pipeline

# Load the diarization pipeline from a local config
# (the YAML must point to locally downloaded model files)
pipeline = Pipeline.from_pretrained("/content/drive/MyDrive/speaker_diarization.yaml")

# Apply diarization
diarization_result = pipeline(audio_file)
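For context, once diarization output is available, whisperx ships an assign_word_speakers helper to merge the speaker turns back into the aligned transcript. The core idea, giving each transcript segment the speaker whose turn overlaps it most, can be sketched in plain Python (hypothetical helper, not the library code):

```python
def assign_speakers(turns, segments):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    turns:    list of (start, end, speaker) diarization turns, in seconds
    segments: list of dicts with "start" and "end" keys (aligned transcript)
    A sketch of the overlap logic only; whisperx.assign_word_speakers also
    handles word-level timestamps and pandas DataFrames.
    """
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for start, end, speaker in turns:
            # Duration of the intersection between the turn and the segment
            overlap = min(end, seg["end"]) - max(start, seg["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        seg["speaker"] = best_speaker
    return segments

turns = [(0.0, 5.0, "SPEAKER_00"), (5.0, 9.0, "SPEAKER_01")]
segments = [{"start": 0.5, "end": 4.0}, {"start": 5.2, "end": 8.0}]
print(assign_speakers(turns, segments))
```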
Manamama commented 1 month ago

The models seem to be downloaded once and then stay put, so they work offline afterwards. Download them once via a simple whisperx --diarize ... run, as per the instructions here, then check where they are (on Linux: find ~/.cache -type f -size +1M -mmin -60); they usually land here:

ls ~/.cache/torch/pyannote/
models--pyannote--segmentation-3.0  models--pyannote--speaker-diarization-3.1  models--pyannote--wespeaker-voxceleb-resnet34-LM

And that is it: the HuggingFace (HF) token is no longer needed, since everything runs offline from then on.
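That cache check can also be scripted before going fully offline. A minimal sketch, assuming the directory names from the listing above (the exact cache layout may vary by whisperx/pyannote version):

```python
import os

# Snapshot directories the one-time `whisperx --diarize` run is expected to
# create, per the listing above (assumed names; may differ between versions)
EXPECTED = [
    "models--pyannote--segmentation-3.0",
    "models--pyannote--speaker-diarization-3.1",
    "models--pyannote--wespeaker-voxceleb-resnet34-LM",
]

def missing_pyannote_models(cache_root):
    """Return the expected model directories not yet present under cache_root."""
    return [name for name in EXPECTED
            if not os.path.isdir(os.path.join(cache_root, name))]

cache = os.path.expanduser("~/.cache/torch/pyannote")
print(missing_pyannote_models(cache))  # prints whichever snapshots are absent
```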