Open tannisroot opened 3 months ago
Is this already merged?
The status of the PR is still open, so no.
Hello, would you please do it in a single regexp and add comments to the code explaining what it does? It might also be worth making this an option.
text = re.sub(r'\[.*?\]|\(.*?\)', '', text).strip()
wyoming-whisper-cpp does something similar, but I've encountered far more non-speech tokens than just [BLANK AUDIO], so this change instead removes square and round brackets and their contents altogether. https://github.com/rhasspy/wyoming-whisper-cpp/blob/476b0e631392034a94196eb578b3d0a60164af53/wyoming_whisper_cpp/handler.py#L92
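For reference, here is a minimal sketch of how the single-regexp version could look with comments and an on/off switch, as requested in the review. The names `NON_SPEECH_PATTERN`, `strip_non_speech`, and the `enabled` parameter are hypothetical and not part of this PR or of wyoming-whisper-cpp; only the regular expression itself is taken from the change.

```python
import re

# One pattern with two alternatives, both non-greedy so each match stops at
# the first closing bracket:
#   \[.*?\]  -> square-bracketed tokens such as "[BLANK AUDIO]"
#   \(.*?\)  -> round-bracketed tokens such as "(music)"
# Hypothetical name, chosen for illustration only.
NON_SPEECH_PATTERN = re.compile(r"\[.*?\]|\(.*?\)")


def strip_non_speech(text: str, enabled: bool = True) -> str:
    """Remove bracketed non-speech tokens from a transcript.

    `enabled` is an assumed option: when False, the text is returned
    unchanged so the filtering can be switched off.
    """
    if not enabled:
        return text
    # Drop every bracketed span, then trim leading/trailing whitespace.
    return NON_SPEECH_PATTERN.sub("", text).strip()


if __name__ == "__main__":
    print(repr(strip_non_speech("[BLANK AUDIO]")))
    print(repr(strip_non_speech("Turn on the lights (music)")))
```

Note that this only trims whitespace at the ends of the string; a token removed from the middle of a sentence leaves the spaces on both sides of it in place. It also removes any bracketed text, so genuine speech transcribed inside parentheses would be dropped as well, which is one reason to keep the behavior optional.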