mesolitica / malaya

Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/
MIT License
468 stars 127 forks

Normalizer error due to invalid number characters #150

Open Kensvin28 opened 1 year ago

Kensvin28 commented 1 year ago

file:///C:/Users/PAVILION/AppData/Local/Programs/Python/Python310/lib/site-packages/malaya/normalizer/rules.py:165, in check_repeat(word)
    162         return word, 1
    164     if word[-1].isdigit() and not word[-2].isdigit():
--> 165         repeat = int(word[-1])
    166         word = word[:-1]
    167     else:

ValueError: invalid literal for int() with base 10: '²'
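A minimal standalone reproduction of the failing branch follows. Only lines 164 to 167 of `check_repeat` are visible in the traceback, so `check_repeat_excerpt` is a hypothetical excerpt, and the body of the `else` branch (defaulting to a repeat count of 1) is an assumption.

```python
def check_repeat_excerpt(word):
    # Hypothetical excerpt mirroring lines 164-167 of malaya's check_repeat.
    if word[-1].isdigit() and not word[-2].isdigit():
        repeat = int(word[-1])  # fails when isdigit() is True but int() can't parse it
        word = word[:-1]
    else:
        repeat = 1  # assumption: the real else branch is not shown in the traceback
    return word, repeat


print(check_repeat_excerpt("kata2"))  # ('kata', 2)

try:
    check_repeat_excerpt("kata²")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '²'
```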

², ³, and some other Unicode characters, such as U+2776 (❶) to U+2792 (➒), return True for isdigit() but cannot be converted with int(), so int() raises a ValueError.
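The sketch below illustrates the mismatch and one possible guard. `str.isdigit()` is True for characters whose Unicode numeric type is Digit or Decimal, while `int()` only parses Decimal characters, which is exactly what `str.isdecimal()` checks. The helper names `describe` and `split_repeat_suffix` are hypothetical and not part of malaya; the `len(word) >= 2` guard is also an assumption, since the start of `check_repeat` is not visible.

```python
import unicodedata


def describe(ch: str) -> str:
    """Show why isdigit() is not a safe guard before int()."""
    return (
        f"U+{ord(ch):04X} {ch}: isdigit={ch.isdigit()}, "
        f"isdecimal={ch.isdecimal()}, digit={unicodedata.digit(ch, None)}"
    )


def split_repeat_suffix(word: str) -> tuple[str, int]:
    """Hypothetical patched digit-suffix check (not malaya's actual code).

    isdecimal() is True only for characters int() can parse, so '²', '³'
    and the dingbat digits fall through to the default repeat count of 1.
    """
    if len(word) >= 2 and word[-1].isdecimal() and not word[-2].isdecimal():
        return word[:-1], int(word[-1])
    return word, 1


for ch in ["2", "²", "³", "❶", "➒"]:
    print(describe(ch))

print(split_repeat_suffix("kata2"))  # ('kata', 2)
print(split_repeat_suffix("kata²"))  # ('kata²', 1)
```

If a trailing ² should instead count as a repetition, `unicodedata.digit(word[-1])` returns 2 for '²' and could replace the `int()` call, at the cost of also accepting characters such as ❶.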

huseinzol05 commented 1 year ago

haha, nice one.