umputun / tg-spam

Anti-Spam bot for Telegram
https://tg-spam.umputun.dev
MIT License

universal detection of scripts for multi-lingual check #106

Closed umputun closed 1 month ago

umputun commented 1 month ago

This PR makes isMultiLang check all Unicode scripts instead of a fixed list of the usual suspects. The latest example reported by a user mixed Gothic, Cyrillic, and Greek scripts, and the previous version didn't recognize the Gothic part, so the word was not detected as multilingual. A new unit test case for that example is added as well.
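
To illustrate the idea, here is a minimal sketch of a check that iterates over every script table in Go's standard `unicode` package rather than a hand-picked list. It is not the code from this PR: the names `scriptOf` and `isMultiScript` are hypothetical, and the real isMultiLang may apply different rules (for example, thresholds or special handling of particular characters).

```go
package main

import "unicode"

// scriptOf returns the name of the Unicode script that r belongs to,
// or "" if r is only in the Common or Inherited pseudo-scripts.
// Hypothetical helper, not part of tg-spam.
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		// Common and Inherited cover digits, punctuation, and combining
		// marks shared by many scripts, so they say nothing about mixing.
		if name == "Common" || name == "Inherited" {
			continue
		}
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

// isMultiScript reports whether the letters of word come from more than
// one Unicode script. Hypothetical stand-in for the idea behind isMultiLang.
func isMultiScript(word string) bool {
	first := ""
	for _, r := range word {
		if !unicode.IsLetter(r) {
			continue // ignore digits, punctuation, emoji
		}
		s := scriptOf(r)
		if s == "" {
			continue
		}
		if first == "" {
			first = s
			continue
		}
		if s != first {
			return true
		}
	}
	return false
}
```

Because `unicode.Scripts` includes Gothic along with every other script Go knows about, a word such as `"пр\U00010330вет"` (Cyrillic letters with a Gothic letter in the middle) is reported as mixed, while a word written in a single script is not. Each code point has exactly one script, so the random iteration order of the map does not affect the result.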