ser / wyoming-whisper-api-client

Wyoming protocol server for the Whisper API speech to text system
MIT License

Remove non-speech descriptions from output #3

Open tannisroot opened 3 months ago

tannisroot commented 3 months ago

wyoming-whisper-cpp does something similar, but I've encountered far more non-speech tokens than just [BLANK AUDIO], so this change removes square and round brackets together with their contents. https://github.com/rhasspy/wyoming-whisper-cpp/blob/476b0e631392034a94196eb578b3d0a60164af53/wyoming_whisper_cpp/handler.py#L92

StrandmonYellow commented 1 month ago

Is this already merged?

tannisroot commented 1 month ago

> Is this already merged?

The PR's status is still open, so no.

ser commented 1 month ago

Hello, would you please do it in a single regexp and add comments in the code explaining what it does? It may also be worth making this an option.

text = re.sub(r'\[.*?\]|\(.*?\)', '', text).strip()
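
For reference, here is a minimal sketch of how the single-regexp approach could be wrapped with comments and an on/off switch. The helper name `strip_non_speech` and the `enabled` parameter are hypothetical and not part of this repository. The whitespace collapsing is an extra step beyond the one-liner above, added because removing a span in the middle of a sentence otherwise leaves a double space.

```python
import re

# Matches "[...]" or "(...)". The non-greedy ".*?" makes each bracketed span
# match on its own, instead of swallowing everything between the first
# opening bracket and the last closing one.
_NON_SPEECH = re.compile(r"\[.*?\]|\(.*?\)")


def strip_non_speech(text: str, enabled: bool = True) -> str:
    """Hypothetical helper: drop annotations such as [BLANK AUDIO] or (music)."""
    if not enabled:
        # Option switched off: return the transcript untouched.
        return text
    cleaned = _NON_SPEECH.sub("", text)
    # Assumption: collapse the whitespace runs left behind by removed spans.
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_non_speech("[BLANK AUDIO] Turn on (music) the lights"))
```

The `enabled` flag could be driven by a command-line option or a config value, whichever fits the project; that wiring is left out here.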