jumon / whisper-punctuator

Zero-shot multimodal punctuation insertion and truecasing using Whisper
MIT License

Question on the usage #2

Closed: zhhuang93 closed this issue 1 year ago

zhhuang93 commented 1 year ago

Hi, does this code first transcribe the given audio and then add punctuation to the transcription, or does it only add punctuation to the given text? I noticed that the example passes both the audio file and its transcript.

jumon commented 1 year ago

Thanks for checking out my code! This is a multimodal punctuation insertion system: given unpunctuated text and the corresponding audio, it inserts punctuation marks into that text. One primary use case is adding punctuation marks to the transcriptions of audio corpora such as LibriSpeech, which do not include punctuation.
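To make the idea concrete, here is a minimal, hypothetical sketch of punctuation insertion as constrained decoding: the given words stay fixed, and for each word a scorer only chooses its capitalization and an optional trailing mark. The names `punctuate_greedy` and `make_oracle_scorer`, the mark set, and the greedy search are illustrative assumptions, not whisper-punctuator's actual API or algorithm. In the real system the score would come from Whisper conditioned on the audio; here a toy oracle built from a reference string stands in for it so the example runs on its own.

```python
from typing import Callable, List, Sequence

# Hypothetical set of marks that may follow a word ("" means no mark).
PUNCTUATION = ["", ",", ".", "?"]

# score(prefix, candidate) -> higher is better. In a real system this would be
# a model's log-probability conditioned on the audio; here it is an assumption.
Scorer = Callable[[str, str], float]


def punctuate_greedy(words: Sequence[str], score: Scorer) -> str:
    """Choose casing and an optional trailing mark for each given word.

    The candidate set only ever contains the given word, so the words
    themselves are never changed, added, or dropped.
    """
    output: List[str] = []
    for word in words:
        candidates = [
            form + mark
            for form in (word, word.capitalize())
            for mark in PUNCTUATION
        ]
        prefix = " ".join(output)
        # max() keeps the first candidate on ties, so the word as given,
        # with no mark, wins when nothing scores higher.
        best = max(candidates, key=lambda c: score(prefix, c))
        output.append(best)
    return " ".join(output)


def make_oracle_scorer(reference: str) -> Scorer:
    """Toy stand-in for the model: rewards continuations that match `reference`."""

    def score(prefix: str, candidate: str) -> float:
        text = f"{prefix} {candidate}" if prefix else candidate
        # The candidate must end exactly at a word boundary of the reference.
        if reference == text or reference.startswith(text + " "):
            return 1.0
        return 0.0

    return score


words = "and do you know what the answer to this question now is the answer is no".split()
reference = "And do you know what the answer to this question now is? The answer is no."
result = punctuate_greedy(words, make_oracle_scorer(reference))
print(result)  # And do you know what the answer to this question now is? The answer is no.
```

Because the candidates are built only from the given word, the output can differ from the input in punctuation and casing but never in its words. A real implementation might use beam search over the whole sentence rather than this greedy, word-by-word choice.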

zhhuang93 commented 1 year ago

Thanks for your reply! Take the example punctuated_text = punctuator.punctuate("tests/test.wav", "and do you know what the answer to this question now is the answer is no it is not possible to buy a cell phone that doesn't do too much so"). Does it first transcribe the audio "tests/test.wav" into text with a Whisper model and then insert punctuation marks, or does it insert punctuation marks into the given text string itself? I am a little confused about which one is correct.

jumon commented 1 year ago

The latter is correct: punctuation marks are inserted into the text you pass in. This code uses Whisper as a punctuation insertion model, not as a speech recognition model.
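One way to see the difference in practice is a hypothetical sanity check (not part of whisper-punctuator): if the tool only inserts punctuation and truecasing, then lowercasing its output and removing punctuation should give back exactly the words you passed in, whereas re-transcribing the audio with a speech recognition model could change them (for example, "know" misheard as "no"). The helper names and the example output strings below are illustrative assumptions, not real output from the tool.

```python
import string
from typing import List

# Translation table that deletes ASCII punctuation (including apostrophes).
_STRIP = str.maketrans("", "", string.punctuation)


def words_only(text: str) -> List[str]:
    """Lowercase, delete ASCII punctuation, and split on whitespace."""
    return text.lower().translate(_STRIP).split()


def only_punctuation_and_case_changed(original: str, punctuated: str) -> bool:
    """True if `punctuated` has exactly the same words as `original`."""
    return words_only(original) == words_only(punctuated)


original = "and do you know what the answer to this question now is the answer is no"
# Hypothetical outputs for illustration, not produced by whisper-punctuator.
punctuated = "And do you know what the answer to this question now is? The answer is no."
retranscribed = "And do you no what the answer to this question now is? The answer is no."

print(only_punctuation_and_case_changed(original, punctuated))     # True
print(only_punctuation_and_case_changed(original, retranscribed))  # False
```

Apostrophes are deleted on both sides, so words such as "doesn't" still compare equal.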