huggingface / speechbox

Apache License 2.0

Add ASR + SD pipeline #9

Closed by sanchit-gandhi 1 year ago

sanchit-gandhi commented 1 year ago

Adds a pipeline for automatic speech recognition (ASR) + speaker diarization (SD).

Example:

import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# load dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get first sample
sample = next(iter(concatenated_librispeech))

out = pipeline(sample["audio"])
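# format the transcriptions nicely for printout:
# one "<speaker> (<start>, <end>)<text>" entry per chunk, separated by blank lines
lines = []
for chunk in out:
    start, end = chunk["timestamp"]
    lines.append(f'{chunk["speaker"]} ({round(start, 1)}, {round(end, 1)}){chunk["text"]}')
print("\n\n".join(lines))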

Print Output:

SPEAKER_01 (0.0, 15.0) Chapter 16 I might have told you of the beginning of this liaison in a few lines, but I wanted you to see every step by which we came. I to agree to whatever Mark Reid wished.

SPEAKER_00 (15.0, 22.0) He was in a fevered state of mind, owing to the blight his wife's action threatened to cast upon his entire future.
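
As a possible follow-up, here is a minimal sketch of post-processing the pipeline output. It assumes only what the example above already relies on: that `out` is a list of dicts with `"speaker"`, `"timestamp"` (a `(start, end)` pair in seconds) and `"text"` keys. The helper name `speaking_time_per_speaker` and the `example_out` list are hypothetical and not part of speechbox; `example_out` is a hand-written stand-in shaped like the printed output, with the text elided.

```python
from collections import defaultdict


def speaking_time_per_speaker(chunks):
    """Sum the duration (in seconds) of all segments for each speaker label."""
    totals = defaultdict(float)
    for chunk in chunks:
        start, end = chunk["timestamp"]
        totals[chunk["speaker"]] += end - start
    return dict(totals)


# Hypothetical stand-in for `out`, mirroring the speakers and timestamps printed above
example_out = [
    {"speaker": "SPEAKER_01", "timestamp": (0.0, 15.0), "text": " ..."},
    {"speaker": "SPEAKER_00", "timestamp": (15.0, 22.0), "text": " ..."},
]

print(speaking_time_per_speaker(example_out))
# → {'SPEAKER_01': 15.0, 'SPEAKER_00': 7.0}
```

With real pipeline output, the same call would be `speaking_time_per_speaker(out)`.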