mesolitica / malaya

Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/
MIT License
468 stars, 127 forks

Fuzziness is not defined #129

Closed: GeneralIris closed this issue 2 years ago

GeneralIris commented 2 years ago

This is my first time adding an issue here.

Specs used: PyCharm 2021.3.3, Python 3.9, symspellpy (current version, installed in the same env as malaya)

GPU: RTX 3080 10G, CPU: AMD Ryzen 5600X

I'm testing malaya.spell.symspell() by supplying comments I retrieved from Shopee.

I followed the instructions, which required me to install symspellpy first, and tested .correct and .correct_text using predefined variables. Both work fine.

But when I supply the comments from a .csv file, the error occurs.

Here's the error output: ErrorOutput.txt

And the sample code:

import malaya

def deepSpellerCorrector(comment, symspell_corrector):
    return symspell_corrector.correct_text(comment)

symspell_corrector = malaya.spell.symspell()
# df is the DataFrame loaded from the .csv of comments
df['CommentClean3'] = df['CommentClean2'].apply(lambda x: deepSpellerCorrector(x, symspell_corrector))

An Example of CommentClean2 : terbaik barang dah dapat dan dah di pasang amat memuaskan pos pun laju sampai harga pun berbaloi thanks
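When an error only appears for some rows, it can help to run the corrector over the comments one at a time and record which ones raise. The following is a minimal sketch of that idea, not part of malaya: find_failing_comments and fake_correct_text are hypothetical names, and fake_correct_text is a stand-in that fails on an arbitrary trigger word so the example runs without malaya installed. In practice one would pass symspell_corrector.correct_text instead.

```python
from typing import Callable, List, Tuple


def find_failing_comments(
    comments: List[str],
    correct_text: Callable[[str], str],
) -> List[Tuple[int, str, str]]:
    """Return (index, comment, error) for every comment that raises."""
    failures = []
    for i, comment in enumerate(comments):
        try:
            correct_text(comment)
        except Exception as exc:  # broad on purpose: we only want to locate the row
            failures.append((i, comment, repr(exc)))
    return failures


def fake_correct_text(text: str) -> str:
    """Hypothetical stand-in for symspell_corrector.correct_text.

    It raises on an arbitrary trigger word purely for demonstration;
    this is NOT how the real corrector decides to fail.
    """
    if "bad" in text.split():
        raise NameError("name 'fuzziness' is not defined")
    return text


comments = ["ok one", "bad two", "ok three"]
failures = find_failing_comments(comments, fake_correct_text)
for index, comment, error in failures:
    print(index, comment, error)
```

With a DataFrame, the same function could be called with df['CommentClean2'].tolist() as the first argument.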

GeneralIris commented 2 years ago

Update: I checked the actual comment that causes the error:

seller kata product new tapi hardisk nmpak kotor sikit n ada kemek2 manja storage pula patut 1tera tapi bila dh install kat pc ada 931gb je hmm pelik

Passing it directly to correct_text causes the error too:

symspell_corrector = malaya.spell.symspell()
symspell_corrector.correct_text('seller kata product new tapi hardisk nmpak kotor sikit n ada kemek2 manja storage pula patut 1tera tapi bila dh install kat pc ada 931gb je hmm pelik')

GeneralIris commented 2 years ago

Okay, I tried tweaking spell.py by adding fuzziness = [] at line 552, reloaded PyCharm, and it works.

But I'm not sure whether that is actually the right way to solve it, because fuzziness = [] also appears at line 240.

Here's a screenshot of the fuzziness = [] at line 552: Test
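The symptom described above (most inputs work, a few raise "not defined") is typical of a name that is only referenced on a rarely taken code path and never assigned in that function. The sketch below illustrates that general class of bug and the "initialize it first" fix. It is a hypothetical reconstruction, not the actual code in malaya's spell.py: expand_buggy, expand_fixed, and the length-based branch condition are all invented for illustration.

```python
def expand_buggy(word):
    """Works for most words, but fails on the rarely taken branch."""
    candidates = [word]
    if len(word) > 8:  # arbitrary condition standing in for a rare code path
        # 'fuzziness' is never assigned in this function or at module level,
        # so reaching this line raises NameError.
        candidates.extend(fuzziness)
    return candidates


def expand_fixed(word):
    """Same logic, with the name initialized before any use."""
    fuzziness = []  # defined up front, so every path can read it
    candidates = [word]
    if len(word) > 8:
        candidates.extend(fuzziness)
    return candidates


print(expand_buggy("ok"))  # short word: the faulty branch is never reached

try:
    expand_buggy("verylongword")
except NameError as exc:
    print("buggy version failed:", exc)

print(expand_fixed("verylongword"))
```

Because Python resolves names at run time, the buggy function imports and runs fine until an input reaches the faulty branch, which would explain why tests with a few predefined strings pass while real-world comments fail.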

huseinzol05 commented 2 years ago

My bad, indeed it is a bug. I will bump asap, thanks!

huseinzol05 commented 2 years ago

Fixed. You can install from the latest master branch. Warning: I did a massive revamp of the spelling correction interface, see https://malaya.readthedocs.io/en/latest/load-spelling-correction-jamspell.html. I will release it soon.