rogerbinns / apsw

Another Python SQLite wrapper
https://rogerbinns.github.io/apsw/

Implement Unicode TR-29 and TR-14 #509

Closed rogerbinns closed 2 months ago

rogerbinns commented 4 months ago

This is needed to get #473 working reasonably well, especially when one user-perceived character (a grapheme cluster) is made up of many Unicode codepoints. TR-29 word splitting is also a lot better than splitting by codepoint category, as the existing unicode61 tokenizer does (e.g. that messes up "don't"), and sentence splitting would allow a better snippet function.
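To illustrate the two problems described above, here is a minimal sketch using only the Python standard library. The `split_by_category` helper is hypothetical and written for this example; it is a naive stand-in for category-based tokenizing, not unicode61's actual implementation and not part of APSW.

```python
import unicodedata


def split_by_category(text):
    """Naive tokenizer (illustrative only): runs of letters (L*) and
    numbers (N*) are words, every other codepoint is a separator."""
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# One user-perceived character built from several codepoints:
# MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family))  # 5 codepoints, although it displays as a single emoji

# The apostrophe has category Po, so it is treated as a separator
print(split_by_category("don't stop"))  # ['don', 't', 'stop']
```

Anything that counts, slices, or highlights by codepoint can cut the family emoji in the middle, and a search for "don't" ends up matching the fragments "don" and "t". Those are the cases that TR-29 grapheme cluster and word boundary rules are meant to handle.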

rogerbinns commented 2 months ago

The Equivalent_Unified_Ideograph property should be investigated. It looks useful for the stripped function, as a way of getting the compatibility codepoint.