huggingface / speechbox


Text-only approach to punctuation restoration Pro/Con #2

Open saattrupdan opened 1 year ago

saattrupdan commented 1 year ago

I know this project is at an early stage, but I just want to flag an alternative approach to punctuation restoration. It's a package called punctfix, and can be found here (I'm not a contributor to that package). Rather than using Whisper models, it takes a NER approach, which works really well and is very fast.
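To make the NER-style idea concrete, here is a minimal sketch of how per-word punctuation labels can be turned back into punctuated text. The label names, the `apply_labels` helper, and the example labels are all hypothetical and chosen for illustration; this is not punctfix's actual label set or code, and the classifier that would predict the labels is left out.

```python
# Hypothetical label set: each word is tagged with the punctuation mark
# that should follow it ("O" means no punctuation). Not punctfix's real labels.
LABEL_TO_MARK = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def apply_labels(words, labels):
    """Rebuild punctuated text from one punctuation label per word."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    out = []
    capitalize = True  # capitalize the first word and any word after "." or "?"
    for word, label in zip(words, labels):
        mark = LABEL_TO_MARK[label]
        if capitalize:
            word = word[:1].upper() + word[1:]
        out.append(word + mark)
        capitalize = mark in {".", "?"}
    return " ".join(out)


words = "we are leaving in 5 minutes no".split()
# Labels a token classifier might predict (made up for illustration).
labels = ["O", "O", "O", "O", "O", "COMMA", "QUESTION"]
print(apply_labels(words, labels))
```

Because the model only has to emit one label per input word, inference is a single forward pass of a token classifier, which is presumably where the speed comes from.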

patrickvonplaten commented 1 year ago

Thanks for the pointer @saattrupdan!

I've checked out punctfix a bit and it indeed works very well! However, it has some shortcomings that I hope we can address by also taking the audio into account.

For phrases whose wording does not clearly mark them as a question, such as "we are leaving in 5 minutes no", punctfix cannot tell whether a question is intended, simply because one needs to hear the audio to decide. All of the following are valid solutions:

- "We are leaving in 5 minutes! No!"
- "We are leaving in 5 minutes. No."
- "We are leaving in 5 minutes, no?"

For this example punctfix gives:

>>> from punctfix import PunctFixer
>>> model = PunctFixer(language="en")

>>> example_text = "we are leaving in 5 minutes no"
>>> print(model.punctuate(example_text))
We are leaving in 5 minutes No!

which cannot always be correct, since the right punctuation depends on the audio.
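The ambiguity can be checked with a small sketch: stripping punctuation and casing from the three valid readings listed above yields the same string, so any model that only sees the text receives identical input for all three and can return at most one of them. The `strip_punctuation` helper is a hypothetical stand-in for "what a text-only model sees", not part of punctfix or speechbox.

```python
import string


def strip_punctuation(text):
    """Reduce text to lowercase words without punctuation (hypothetical helper)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.translate(table).lower().split())


candidates = [
    "We are leaving in 5 minutes! No!",
    "We are leaving in 5 minutes. No.",
    "We are leaving in 5 minutes, no?",
]

# All three readings collapse to one and the same text-only input.
inputs = {strip_punctuation(c) for c in candidates}
print(inputs)
```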

Also, I noticed some problems with apostrophes: https://github.com/danspeech/punctfix/issues/13