huggingface / speechbox

Apache License 2.0

Loading a custom audio sample into the diarization pipeline #18

Closed: gabrilator closed this issue 1 year ago

gabrilator commented 1 year ago

Hey! First of all, thanks for all the amazing work.

I am trying to get the diarization to work with custom audio samples (e.g. audio.mp3 or audio.wav files), and I would like to know how to load them before calling the pipeline.

In particular, I'd like to substitute this sample with my own files:

concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
sample = next(iter(concatenated_librispeech))

Sorry about my ignorance, I'm much more used to Node.js and am finding it challenging to follow everything!

sanchit-gandhi commented 1 year ago

Hey @gabrilator! Thanks for opening this issue, awesome to have you here!

That's a great question! Since we use the Hugging Face ASR pipeline as our backend, we can simply pass the path to our audio file as the audio input:

import torch
from speechbox import ASRDiarizationPipeline

# Run on the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# Replace with the path to your own audio file
path_to_audio = "path/to/audio/file"

# The pipeline accepts the file path directly as its audio input
out = pipeline(path_to_audio)
print(out)

See the ASR pipeline docs for more details 🤗
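If you would rather keep the same shape as the dataset sample from the question, a rough sketch is to build that dict yourself from a local file. This assumes the pipeline accepts a dict with an `array` and a `sampling_rate`, as the transformers ASR pipeline does for `datasets` audio samples. The helper name `load_wav_as_sample` is hypothetical, and the sketch only handles 16-bit PCM WAV files read with the standard library plus NumPy:

```python
import wave

import numpy as np


def load_wav_as_sample(path):
    """Read a 16-bit PCM WAV file into a dict shaped like a `datasets` audio sample.

    Hypothetical helper for illustration; not part of speechbox or transformers.
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        sampling_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        frames = wav_file.readframes(wav_file.getnframes())

    # Interpret the raw bytes as little-endian int16 and scale to float32 in [-1.0, 1.0)
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0

    # Down-mix interleaved multi-channel audio to mono by averaging the channels
    if num_channels > 1:
        samples = samples.reshape(-1, num_channels).mean(axis=1)

    return {"array": samples, "sampling_rate": sampling_rate}


# Assumed usage, with `pipeline` created as in the snippet above:
# sample = load_wav_as_sample("audio.wav")  # hypothetical local file
# out = pipeline(sample)
```

The standard library `wave` module does not read MP3 files, so for those, passing the file path directly is likely the simpler route.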

gabrilator commented 1 year ago

Thanks Sanchit! Keep up the amazing work, I'm mind-blown by you guys!!