sanchit-gandhi / whisper-jax

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Apache License 2.0
4.4k stars 374 forks

Words timestamps [HELP] #37

Open RaulKite opened 1 year ago

RaulKite commented 1 year ago

I'm not able to get a transcription with word-level timestamps, only sentence-level timestamps.

Is this possible with whisper-jax?

Thanks

sanchit-gandhi commented 1 year ago

Hey @RaulKite! Not yet - since this is a fairly new Whisper feature, we first need to add it to Hugging Face Transformers, and then propagate the changes on to Whisper JAX. Hoping to have this ready as soon as possible.

anindyagupta commented 1 year ago

First off, kudos on this achievement. We use both Google Speech and AWS, but this is just stellar performance!

  1. A quick question on the above: is there a timeline for word timestamps? Word timestamps are critical for playing video alongside transcripts, and all ASR systems provide them. We are considering using this in production to replace AWS, but without word timestamps that might not be possible. If you can provide a timeline, we would appreciate it a lot.
  2. Is there a way to detect filler words such as "uh" and "um" in the speech? That would be highly beneficial for educational purposes.
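
To illustrate the two use cases above, here is a minimal sketch in Python. It assumes word-level output shaped as a list of entries with a `text` field and a `(start, end)` timestamp in seconds; that shape, the sample values, and the helper names `word_at` and `filler_spans` are all hypothetical and not part of Whisper JAX. It also only finds filler words that actually appear in the transcript, which a model may omit.

```python
from bisect import bisect_right

# Hypothetical word-level output: one entry per word, (start, end) in seconds.
# These values are made up for illustration, not produced by any model.
words = [
    {"text": "So", "timestamp": (0.00, 0.20)},
    {"text": "um,", "timestamp": (0.35, 0.60)},
    {"text": "welcome", "timestamp": (0.80, 1.30)},
    {"text": "back", "timestamp": (1.30, 1.60)},
]

FILLERS = {"uh", "um"}


def word_at(words, t):
    """Return the index of the word spoken at time t, or None during silence."""
    starts = [w["timestamp"][0] for w in words]
    # Last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["timestamp"][1]:
        return i
    return None


def filler_spans(words):
    """Return the (start, end) spans of words that are known fillers."""
    return [
        w["timestamp"]
        for w in words
        if w["text"].strip(" ,.?!").lower() in FILLERS
    ]
```

A video player could call `word_at(words, current_time)` on each tick to highlight the current word, and `filler_spans(words)` gives the time ranges to flag for review.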

sanchit-gandhi commented 1 year ago

Hey @anindyagupta - if anyone in the community would like to take a stab at adding word-level timestamps to 🤗 Transformers I'd be happy to guide the integration process and review PRs! Otherwise, I'm hoping to see to it by maybe next week. The full integration might take ~1.5-2 weeks?

Detecting "uh" and "um" would be possible through prompting (an ongoing PR in 🤗 Transformers: https://github.com/huggingface/transformers/pull/22496). Again, once this is merged I'll propagate it on to Whisper JAX ASAP!

rairavi commented 1 year ago

Yes, I am interested. In the meantime, this works well: https://github.com/linto-ai/whisper-timestamped

ferdavid1 commented 1 year ago

I currently use ^ whisper-timestamped and am looking to migrate to Whisper-JAX because of its fantastic speed. Really appreciate that you've got this on the docket for this repo so quickly. Looking forward to seeing this get integrated.

vvvm23 commented 1 year ago

Hi, has there been any update on this? It appears the above PR in 🤗 Transformers has been merged. Sub-second timestamps would be really useful in my own application :)

AvivSham commented 1 year ago

@sanchit-gandhi is there any estimate for when both initial prompts and word timestamps will be integrated?

gkarmas commented 10 months ago

Amazing project with super fast transcription, but it is still missing this very important feature: word-by-word timestamps.

crummenauerca commented 4 months ago

Is there any news about word-level timestamps?

iampickle commented 3 months ago

Yes, I would also love it if this were integrated :)