jumon / whisper-punctuator

Zero-shot multimodal punctuation insertion and truecasing using Whisper
MIT License

Only seems to be working with model="small" #6

Closed by aeciorc 1 year ago

aeciorc commented 1 year ago

Hello, thanks a lot for your work here.

I tested with a Spanish utterance, and when using any model besides "small", all I get in the output is the text input itself.

E.g.:

from whisper_punctuator import Punctuator

punctuator = Punctuator(language="es", punctuations=",.?!\"", model_name="medium")

punctuated_text = punctuator.punctuate(
    "audio2.mp3",
    "entonces te delelitarás en jehová y yo te haré subir sobre las alturas de la tierra y te daré á comer la heredad de jacob tu padre porque la boca de jehová lo ha hablado"
)
print(punctuated_text)

returns: entonces te delelitarás en jehová y yo te haré subir sobre las alturas de la tierra y te daré á comer la heredad de jacob tu padre porque la boca de jehová lo ha hablado

The same happens with the large model. Meanwhile, with the small model, I get: Entonces te delelitarás en jehová, y yo te haré subir sobre las alturas de la tierra, y te daré, á comer la heredad de jacob tu padre, porque la boca de jehová lo ha hablado.

Any idea why?

jumon commented 1 year ago

Thank you for trying it out!

If the Whisper model assigns a higher probability to unpunctuated text than punctuated text, it outputs unpunctuated text. The outputs depend on the size of the model, and sometimes the small model generates better results than the large model, as in your example.
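To make the selection rule concrete, here is a minimal sketch in plain Python. It is not the library's code: the function name `pick_output` and all log-probability values are made up for illustration, and only show how the same two candidates can rank differently under two models.

```python
def pick_output(candidates):
    """Return the candidate text with the highest log-probability."""
    return max(candidates, key=candidates.get)


# Hypothetical log-probabilities, invented for illustration only.
# They are NOT measured from any Whisper model.
small_scores = {
    "la boca de jehová lo ha hablado": -12.4,
    "La boca de jehová lo ha hablado.": -11.9,
}
medium_scores = {
    "la boca de jehová lo ha hablado": -10.1,
    "La boca de jehová lo ha hablado.": -10.8,
}

# The punctuated candidate wins under the first set of scores,
# the unpunctuated one under the second.
print(pick_output(small_scores))
print(pick_output(medium_scores))
```

With scores like these, the second model returns the input unchanged even though nothing is broken: the unpunctuated candidate simply scored higher.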

One way to circumvent this behavior is to use a prompt, like the example below. The prompt "Hello, everyone." is fed to Whisper, which makes it more likely to insert punctuation marks in the style of the prompt. You can use any prompt you like, but the results will differ depending on the prompt used.

from whisper_punctuator import Punctuator

punctuator = Punctuator(language="en", punctuations=",.?", initial_prompt="Hello, everyone.")
punctuated_text = punctuator.punctuate(
    "tests/test.wav",
    "and do you know what the answer to this question now is the answer is no it is not possible to buy a cell phone that doesn't do too much so"
)
print(punctuated_text) # -> "And do you know what the answer to this question now is? The answer is no. It is not possible to buy a cell phone that doesn't do too much. So"

aeciorc commented 1 year ago

Thanks for the quick reply. I played with the prompt, but unfortunately had no success. At least with Spanish, only the small model worked.

That being said, the small model with beam_size=5 worked so well that I don't think I even need the other models. If I have some time I'll dig into this, but either way, thanks again for your work. It's saving me a ton of time.
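For anyone wondering why the beam size can matter, here is a toy sketch of beam search over punctuation choices. It is only an illustration under my own assumptions, not the library's decoding code: `beam_search` and `toy_score` are hypothetical names, and the scores are invented so that a greedy search (beam size 1) and a wider beam disagree.

```python
def beam_search(words, marks, step_score, beam_size):
    """Choose one mark (or none) after each word, keeping the best
    `beam_size` partial hypotheses at every step."""
    beams = [((), 0.0)]  # (marks chosen so far, cumulative score)
    options = ("",) + tuple(marks)
    for i in range(len(words)):
        expanded = [
            (chosen + (m,), score + step_score(i, chosen, m))
            for chosen, score in beams
            for m in options
        ]
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_size]
    best_marks, _ = beams[0]
    return " ".join(w + m for w, m in zip(words, best_marks))


def toy_score(i, chosen, mark):
    """Invented scorer: a final period is cheap only if an earlier
    period was already chosen; otherwise punctuation is penalized."""
    if i == 0:
        return {"": -1.0, ".": -1.2, ",": -3.0}[mark]
    if i == 2 and mark == "." and "." in chosen:
        return -0.2
    return -0.5 if mark == "" else -3.0


words = ["no", "it", "works"]
greedy = beam_search(words, ",.", toy_score, beam_size=1)
wider = beam_search(words, ",.", toy_score, beam_size=2)
print(greedy)  # no it works
print(wider)   # no. it works.
```

With beam size 1 the slightly cheaper "no mark" choice at the first word is locked in, so the text comes back unchanged. A wider beam keeps the alternative alive long enough for its better total score to win.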

jumon commented 1 year ago

Thank you! Prompting can often be difficult to tune and it may not have worked for your data. If you require further assistance, please feel free to reopen this issue and ask me anything!