Go's `regexp` package documents `\b` as matching only at an ASCII word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other, where `\w` is `[0-9A-Za-z_]`). This affects how our whole-word filters match non-ASCII text.

UTR #18 (Unicode Regular Expressions) has some guidance on this. We might be able to achieve what it calls "Level 1" or "Level 2" word boundary support by comprehensively replacing `\b` with constructs built from the Unicode features Go can match on. "Level 3" might be too much work:

> Semantic analysis may be required for correct word-break in languages that don't require spaces, such as Thai, Japanese, Chinese or Korean. This can require fairly sophisticated support if Level 3 word boundary detection is required, and usually requires drawing on platform OS services.

Discovered while investigating #3128.