This is needed to get #473 to work reasonably well, especially when one user-perceived character spans many Unicode codepoints. The word splitting is also much better than splitting by codepoint category as the existing unicode61 tokenizer does (e.g. that mishandles "don't"), and sentence splitting enables a better snippet function.
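
To make the motivation concrete, here is a minimal sketch using only the Python standard library. `split_by_category` is a hypothetical helper that only approximates category-based splitting; it is not the unicode61 implementation and not part of this work.

```python
import unicodedata

# One user-perceived character built from several codepoints:
# MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
codepoint_count = len(family)  # len() counts codepoints, not grapheme clusters


def split_by_category(text):
    """Naive splitter (assumption: letters and numbers form words,
    every other category is treated as a separator)."""
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in "LN":
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# The apostrophe is category Po (punctuation), so "don't" is cut in two.
naive_words = split_by_category("don't stop")
```

With this approach the single visible emoji is reported as five codepoints, and "don't" comes back as the two fragments "don" and "t", which is the behaviour that rule-based word and grapheme cluster breaking is meant to avoid.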
[x] Generate break tables
[x] Test code from tables
[x] Grapheme cluster
[x] Word
[x] Sentence
[x] Unicode categories
[x] Equivalent_Unified_Ideograph
[x] Emoji
[x] Regional Indicator
[x] Case folding
[x] Investigate tr14 line breaking for implementing textwrap
[x] textwrap
[x] Wide codepoints (east asian width == F or W)
[x] Grapheme cluster count
[x] Grapheme cluster range substring
[x] Grapheme cluster width
[x] Startswith / endswith
[x] find/index
[x] Set doc order to bysource and rearrange functions into logical doc order
[x] Grapheme cluster base char (remove diacritics equivalent)
[x] Compatibility codepoints (eg roman numeral ⅲ becomes latin iii)
[x] Update apsw.ext.format_query_table to use textwrap
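
For orientation, a few of the checklist items above can be roughly approximated per codepoint with the standard library `unicodedata` module. This is only a sketch under that assumption: the function names are hypothetical, and they work on codepoints rather than grapheme clusters, which is exactly the gap the items above address.

```python
import unicodedata


def is_wide(ch):
    """True if East Asian Width is F (fullwidth) or W (wide)."""
    return unicodedata.east_asian_width(ch) in ("F", "W")


def compatibility(text):
    """Replace compatibility codepoints using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


def strip_diacritics(text):
    """Decompose with NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Case folding is available directly on str.
folded = "Straße".casefold()
```

For example, `is_wide("中")` is true while `is_wide("a")` is false, `compatibility("ⅲ")` gives `"iii"`, and `strip_diacritics("café")` gives `"cafe"`.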