Thank you for trying it out!
If the Whisper model assigns a higher probability to unpunctuated text than punctuated text, it outputs unpunctuated text. The outputs depend on the size of the model, and sometimes the small model generates better results than the large model, as in your example.
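As a rough illustration of the behavior described above (a toy sketch, not the library's actual implementation), the decision can be thought of as keeping whichever candidate text scores the higher log-probability. The helper name `pick_candidate` and the scores are hypothetical, made up for this example.

```python
import math
from typing import Dict


def pick_candidate(log_probs: Dict[str, float]) -> str:
    """Return the candidate text with the highest log-probability."""
    return max(log_probs, key=log_probs.get)


# Hypothetical scores (not real model output): the unpunctuated variant
# scores higher here, so no punctuation would be inserted.
scores = {
    "the answer is no it is": math.log(0.6),
    "the answer is no. It is": math.log(0.4),
}

chosen = pick_candidate(scores)
print(chosen)
```

With scores like these the unpunctuated text wins, which matches the symptom of getting the input back unchanged.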
One way to circumvent this behavior is to use a prompt, like the example below. The prompt "Hello, everyone." is fed to Whisper, which makes it more likely to insert punctuation marks in the style of the prompt. You can use any prompt you like, but the results will differ depending on the prompt used.
from whisper_punctuator import Punctuator

punctuator = Punctuator(language="en", punctuations=",.?", initial_prompt="Hello, everyone.")
punctuated_text = punctuator.punctuate(
    "tests/test.wav",
    "and do you know what the answer to this question now is the answer is no it is not possible to buy a cell phone that doesn't do too much so"
)
print(punctuated_text) # -> "And do you know what the answer to this question now is? The answer is no. It is not possible to buy a cell phone that doesn't do too much. So"
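Since the results differ from prompt to prompt, it can help to try several prompts side by side. The following is a minimal sketch under stated assumptions: `compare_prompts` and `fake_punctuate` are hypothetical names, and `fake_punctuate` is only a stand-in for a real call that would build a `Punctuator` with the given `initial_prompt` and run `punctuate` on your audio and text.

```python
from typing import Callable, Dict, Iterable


def compare_prompts(
    punctuate_with_prompt: Callable[[str], str],
    prompts: Iterable[str],
) -> Dict[str, str]:
    """Run the punctuation step once per prompt and collect the outputs."""
    return {prompt: punctuate_with_prompt(prompt) for prompt in prompts}


def fake_punctuate(prompt: str) -> str:
    """Stub for illustration only: punctuates when the prompt itself is punctuated."""
    text = "the answer is no"
    if any(ch in prompt for ch in ",.?"):
        return text.capitalize() + "."
    return text


results = compare_prompts(fake_punctuate, ["Hello, everyone.", "hello everyone"])
for prompt, output in results.items():
    print(f"{prompt!r} -> {output!r}")
```

In real use you would replace `fake_punctuate` with a function that wraps the library call, then pick the prompt whose output looks best for your data.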
Thanks for the quick reply. I played with the prompt, but unfortunately had no success. At least with Spanish, only the small model worked.
That being said, the small model with beam_size=5 worked so well that I don't think I even need the other models. If I have some time I'll dig into this, but either way, thanks again for your work. It's saving me a ton of time.
Thank you! Prompting can often be difficult to tune and it may not have worked for your data. If you require further assistance, please feel free to reopen this issue and ask me anything!
Hello, thanks a lot for your work here.
I tested with a Spanish utterance, and when using any model besides "small", all I get in the output is the text input itself.
E.g.:
returns: entonces te delelitarás en jehová y yo te haré subir sobre las alturas de la tierra y te daré á comer la heredad de jacob tu padre porque la boca de jehová lo ha hablado
The same happens with the large model. Meanwhile, with the small model, I get: Entonces te delelitarás en jehová, y yo te haré subir sobre las alturas de la tierra, y te daré, á comer la heredad de jacob tu padre, porque la boca de jehová lo ha hablado.
Any idea why?