janhq / ichigo

Local realtime voice AI
Apache License 2.0
2.02k stars, 101 forks

planning: Ichigo Transcription #90

Open dan-homebrew opened 1 month ago

dan-homebrew commented 1 month ago



tikikun commented 3 weeks ago

@nguyenhoangthuan99 you can pick this up if you like. Just extract the embeddings from the encoder and forward them to Whisper for transcription.

jrohsc commented 1 week ago

Hi, how can I do transcription in a Colab notebook? It seems that whenever I give it an audio question, it only generates the answer to the question.

PodsAreAllYouNeed commented 1 day ago

> Hi, how can I do transcription in a Colab notebook? It seems that whenever I give it an audio question, it only generates the answer to the question.

I've prepared a Colab demo with a transcription example here: https://colab.research.google.com/drive/1req3ByqKS1vVPF_iGD1sNE2DzvMo7Jd0?usp=sharing

The relevant function for transcription is this:

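import torch
import torchaudio

# `vq_model` and `device` are assumed to be defined earlier in the notebook.
def audio_to_text(audio_path, target_bandwidth=1.5, device=device):
    # target_bandwidth is not used in this function; kept so existing calls still work.
    wav, sr = torchaudio.load(audio_path)
    # Resample to 16 kHz when the file uses a different sample rate.
    if sr != 16000:
        wav = torchaudio.functional.resample(wav, sr, 16000)
    with torch.no_grad():
        codes = vq_model.encode_audio(wav.to(device))  # audio -> codes
        transcript = vq_model.decode_text(codes[0])  # codes -> text
    return transcript[0].text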
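
To run the function above over several files, a small wrapper can collect the transcripts and note which files failed. This is only a minimal sketch: `transcribe_many` and `fake_audio_to_text` are hypothetical names made up for illustration and are not part of Ichigo or the Colab demo. The stand-in function is there so the snippet runs without the model loaded.

```python
from typing import Callable, Dict, Iterable, Tuple


def transcribe_many(
    paths: Iterable[str],
    transcribe: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `transcribe` on each path and return (transcripts, errors)."""
    transcripts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            transcripts[path] = transcribe(path).strip()
        except Exception as exc:  # keep going if one file fails
            errors[path] = f"{type(exc).__name__}: {exc}"
    return transcripts, errors


# Hypothetical stand-in for audio_to_text so this sketch runs on its own.
def fake_audio_to_text(audio_path: str) -> str:
    if not audio_path.endswith(".wav"):
        raise ValueError("unsupported file type")
    return f" transcript of {audio_path} "


transcripts, errors = transcribe_many(["q1.wav", "notes.txt"], fake_audio_to_text)
print(transcripts)
print(errors)
```

In the notebook, the call would presumably be `transcribe_many(paths, audio_to_text)`, assuming `vq_model` and `device` are already set up as in the demo.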