Open dan-homebrew opened 1 month ago
@nguyenhoangthuan99 you can pick this up if you like — just extract the embedding from the encoder and forward it to Whisper transcription.
Hi, how can I do the transcription on a Colab notebook? It seems like whenever I give it audio containing a question, it only generates an answer to the question instead of a transcript.
I've prepared a colab demo with transcription example here: https://colab.research.google.com/drive/1req3ByqKS1vVPF_iGD1sNE2DzvMo7Jd0?usp=sharing
The relevant function for transcription is this:
```python
import torch
import torchaudio

def audio_to_text(audio_path, target_bandwidth=1.5, device=device):
    # vq_model and device are defined earlier in the notebook
    vq_model.ensure_whisper(device)
    # Load the audio and resample to 16 kHz if needed
    wav, sr = torchaudio.load(audio_path)
    if sr != 16000:
        wav = torchaudio.functional.resample(wav, sr, 16000)
    with torch.no_grad():
        # Encode the waveform to discrete codes, then decode them to text
        codes = vq_model.encode_audio(wav.to(device))
        transcript = vq_model.decode_text(codes[0])
    return f'{transcript[0].text}'
```
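As a side note, the function above only resamples when the input isn't already at 16 kHz. If you want to check a file's sample rate before calling it, a minimal standard-library sketch (assuming a PCM WAV file; the helper name `needs_resample` is just for illustration):

```python
import math
import struct
import wave

def needs_resample(path, target_sr=16000):
    """Return (sample_rate, resample_needed) for a PCM WAV file."""
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
    return sr, sr != target_sr

# Write a 0.1 s, 440 Hz mono tone at 44.1 kHz to demonstrate.
sr = 44100
frames = b"".join(
    struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / sr)))
    for n in range(int(0.1 * sr))
)
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)   # mono
    f.setsampwidth(2)   # 16-bit PCM
    f.setframerate(sr)
    f.writeframes(frames)

print(needs_resample("tone.wav"))  # → (44100, True)
```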
Goal