Currently text_normalize("Hello\nWorld") yields HelloWorld.
Line feeds (LF) and carriage returns (CR) are filtered out because they are Unicode characters in the "Other, Control" (Cc) category. Text normalization should preserve word boundaries by replacing such characters with spaces instead of dropping them.
See also: http://www.unicode.org/reports/tr29/tr29-29.html#Word_Boundaries 🙈
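
To illustrate the behavior described above, here is a minimal sketch, not the actual text_normalize implementation. Both function names are hypothetical, and the assumption is that the current filter simply drops every character in category Cc. The second function shows one possible fix: map whitespace characters to a space before filtering, then collapse runs of spaces.

```python
import unicodedata


def strip_controls_naive(text):
    """Hypothetical stand-in for the current behavior: drop all Cc characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def strip_controls_keep_boundaries(text):
    """Hypothetical fix: turn whitespace (including LF/CR) into spaces first."""
    # str.isspace() is True for "\n", "\r" and "\t", which are all category Cc.
    spaced = "".join(" " if ch.isspace() else ch for ch in text)
    # Remaining control characters (e.g. NUL) are still removed.
    cleaned = "".join(ch for ch in spaced if unicodedata.category(ch) != "Cc")
    # Collapse runs of spaces and trim the ends.
    return " ".join(cleaned.split())


print(strip_controls_naive("Hello\nWorld"))            # HelloWorld
print(strip_controls_keep_boundaries("Hello\nWorld"))  # Hello World
```

With this approach a CR/LF pair also collapses to a single space, so "Hello\r\nWorld" becomes "Hello World".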