iamhitarth opened this issue 2 weeks ago
Hi there! Indeed, that demo only considers the latest 30 seconds of audio, and was more to showcase the ability of the model to run in real-time with WebGPU. The rest of the pipeline should be implemented by the user, since this is out-of-scope for the transformers.js library (at least for now). I suggest you take a look at this paper, which details a nice way of doing this.
Hope that helps!
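To make the "implemented by the user" part concrete, here is a minimal sketch of one way to stitch consecutive window transcripts together: keep a running transcript and, for each new chunk, drop the longest prefix of the chunk that the running transcript already ends with. The function name and structure are illustrative, not part of transformers.js.

```javascript
// Hypothetical helper: merge a new window's transcript into the running
// transcript by finding the longest overlap between the end of `full`
// and the start of `chunk`, then appending only the non-overlapping tail.
function mergeTranscripts(full, chunk) {
  const maxOverlap = Math.min(full.length, chunk.length);
  // Try the longest possible overlap first, then shrink.
  for (let len = maxOverlap; len > 0; len--) {
    if (full.endsWith(chunk.slice(0, len))) {
      return full + chunk.slice(len);
    }
  }
  // No overlap found: append with a separating space.
  return full + (full && chunk ? " " : "") + chunk;
}

// Example: consecutive windows share overlapping words.
let transcript = "";
transcript = mergeTranscripts(transcript, "Cool, let's test this demo");
transcript = mergeTranscripts(transcript, "this demo and see how it works");
// transcript === "Cool, let's test this demo and see how it works"
```

Note that exact string matching is the optimistic case: the model will often transcribe the overlapping audio slightly differently in each window, so in practice you'd want a fuzzy or word-level alignment (which is what the linked paper addresses) rather than this exact-match version.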
Question
Noob question: the webgpu-whisper demo does real-time transcription, but it doesn't build out a full transcript from the start. That is, two minutes into transcription, the first few transcribed lines disappear.
[Screenshot: transcript at time x]
[Screenshot: transcript at time x+1]
Note how the "Cool, let's test" is missing from the start of the second transcript.
I'm wondering what it would take to keep building the transcript for a long-running meeting without losing any of the previously transcribed text.
I tried a naive appending approach, and that just results in a transcript full of repetition.
So I'm very curious what it would take to build streaming transcription similar to what something like Deepgram offers. Would that require a change to the pipeline? Are there models that can take an appended transcript with lots of repetition and trim it down to a clean one?
Please let me know if my questions are unclear. Just looking for some direction so that I can potentially put up a PR for this (if needed).