janhq / ichigo

Local realtime voice AI
Apache License 2.0

planning: Ichigo Transcription #90

Open dan-homebrew opened 1 month ago

dan-homebrew commented 1 month ago

Goal

[Image attached]

tikikun commented 3 weeks ago

@nguyenhoangthuan99 you can pick this up if you like: just extract the embedding from the encoder and forward it to Whisper for transcription.
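For reference, a rough sketch of that idea using the plain openai-whisper package (not Ichigo's quantized encoder): compute the log-mel spectrogram, pull the embedding out of the encoder with embed_audio, and hand those features straight to whisper.decode, which accepts precomputed encoder features. The audio path and model size below are placeholders.

# Sketch with the plain openai-whisper package; "question.wav" and "base" are placeholders.
import torch
import whisper

model = whisper.load_model("base")

# Load the audio, pad/trim it to 30 s, and compute the log-mel spectrogram.
audio = whisper.load_audio("question.wav")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

with torch.no_grad():
    # Step 1: extract the embedding from the Whisper encoder.
    audio_features = model.embed_audio(mel.unsqueeze(0))

# Step 2: forward the encoder embedding to the decoder; whisper.decode accepts
# precomputed encoder features in place of the mel spectrogram.
result = whisper.decode(model, audio_features, whisper.DecodingOptions(fp16=False))
print(result[0].text)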

jrohsc commented 1 week ago

Hi, how can I do transcription in a Colab notebook? It seems like whenever I give it a question as audio, it only generates the answer to the question.

PodsAreAllYouNeed commented 1 day ago

> Hi, how can I do transcription in a Colab notebook? It seems like whenever I give it a question as audio, it only generates the answer to the question.

I've prepared a Colab demo with a transcription example here: https://colab.research.google.com/drive/1req3ByqKS1vVPF_iGD1sNE2DzvMo7Jd0?usp=sharing

The relevant function for transcription is this:

import torch
import torchaudio

# Assumes `vq_model` and `device` are defined earlier in the notebook (see the Colab above).
def audio_to_text(audio_path, target_bandwidth=1.5, device=device):
    vq_model.ensure_whisper(device)
    # Load the audio and resample to the expected 16 kHz if needed.
    wav, sr = torchaudio.load(audio_path)
    if sr != 16000:
        wav = torchaudio.functional.resample(wav, sr, 16000)
    with torch.no_grad():
        # Encode the waveform into tokens, then decode them back to text.
        codes = vq_model.encode_audio(wav.to(device))
        transcript = vq_model.decode_text(codes[0])
    return transcript[0].text
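
Calling it on a recorded question then returns the transcript instead of an answer; the path below is just a placeholder.

# Placeholder path; prints the transcript of the question rather than an answer to it.
print(audio_to_text("question.wav"))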