Some languages (like French) use accented characters in words. For example, assécher means to dry up (skin, hair etc.).
The native JS RegExp \b only treats ASCII letters, digits and the underscore as word characters, so é is interpreted as a word separator. The word is therefore split into ass, é and cher, and ass gets censored.
We end up with ***écher, which is nonsense in French.
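The problem can be reproduced with a few lines of plain JavaScript. This is only a minimal sketch: `naiveCensor` is a hypothetical helper written for illustration, not the library's actual implementation, and it assumes the censored word is a plain alphabetic string with no regex metacharacters.

```javascript
// Hypothetical helper (not the library's code): censors a word using \b boundaries.
function naiveCensor(text, badWord) {
  // \b sits between a \w character ([A-Za-z0-9_]) and a non-\w character.
  // "é" is not in \w, so a boundary is found between "ass" and "écher".
  const re = new RegExp(`\\b${badWord}\\b`, "gi");
  return text.replace(re, (match) => "*".repeat(match.length));
}

console.log(naiveCensor("assécher", "ass")); // prints "***écher"
```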
This PR improves on the previous, incomplete attempt to support French via a user-provided word separator.
The new option enhancedWordSep, defaulting to false, uses a separation regexp that works for languages with accented characters.
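One way such an accent-aware separation could be built is with Unicode property escapes and lookarounds instead of \b. The sketch below is an assumption about the general technique, not the regexp this PR actually ships: `unicodeAwareCensor` and the `wordChar` class are hypothetical names, and the censored word is again assumed to contain no regex metacharacters.

```javascript
// Hypothetical sketch: word boundaries that treat accented letters as word characters.
function unicodeAwareCensor(text, badWord) {
  // Any Unicode letter, combining mark, digit or underscore counts as part of a word.
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  // Match badWord only when it is not preceded or followed by a word character.
  // The "u" flag is required for \p{...} property escapes.
  const re = new RegExp(`(?<!${wordChar})${badWord}(?!${wordChar})`, "giu");
  return text.replace(re, (match) => "*".repeat(match.length));
}

console.log(unicodeAwareCensor("assécher", "ass"));     // prints "assécher"
console.log(unicodeAwareCensor("what an ass!", "ass")); // prints "what an ***!"
```

Including \p{M} keeps the behaviour the same when é is written in decomposed form (e followed by a combining acute accent), since the combining mark is then also treated as part of the word.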