linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0
1.85k stars 149 forks

Support for Batch of Audio Files #200

Open shahrukhx01 opened 1 month ago

shahrukhx01 commented 1 month ago

Thanks for this great project! I went over the codebase and found it very insightful. Do you plan to add batch support anytime soon?

Jeronymous commented 1 month ago

This would indeed be a super useful feature, but we are not planning to implement it here soon.

shahrukhx01 commented 1 month ago

Hi @Jeronymous, would you be open to a contribution? I think adding batching to the naive method would be quite straightforward, and I'd be happy to contribute there :)

Jeronymous commented 1 month ago

If you see a straightforward way to address batching, of course feel free to try it (implement it in a fork and open a Pull Request). Contributions are always welcome 😄

MSLDCherryPick commented 1 month ago

@shahrukhx01 Great idea! Cannot wait for the batch inference version! Is there a plan to release the code?

shahrukhx01 commented 1 month ago

Hi @MSLDCherryPick, we are currently working on a batched version of whisper-timestamped at my workplace, and the progress so far has been promising. We plan to contribute it back to whisper-timestamped sometime in the second half of September, once we have a stable version of batching working for us. Stay tuned!