rash150996 / Hindi_spell_check

Spell checker and recommendation for the Hindi language, based on Python
The Unlicense

How was wordlist.bin file trained? #1

Open sumyatthitsarr opened 4 years ago

sumyatthitsarr commented 4 years ago

Can I use a PyTorch lm.pt instead of wordlist.bin?

rash150996 commented 4 years ago

@Wickky A flair POS model was trained on ~7 GB of in-house data. The words were then extracted based on their tags and visualized with scattertext, and the chunk of words that fell into the infrequent list was added alongside the frequent list. I update the list almost every day to keep the recommendations and spell checking effective.
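
The extraction step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's code: the tagged tokens are a hand-written stand-in for tagger output (romanized placeholders rather than real Hindi data), and the function name `build_wordlists`, the tag set, and the `min_count` threshold are all hypothetical. The scattertext visualization step and the actual format of wordlist.bin are not covered.

```python
from collections import Counter

# Hypothetical (word, POS tag) pairs, standing in for the output of a tagger
# run over a corpus. Romanized placeholders, not real training data.
tagged_tokens = [
    ("ghar", "NOUN"), ("ghar", "NOUN"), ("ghar", "NOUN"),
    ("jana", "VERB"), ("jana", "VERB"),
    ("pustakalay", "NOUN"),
    ("aur", "CCONJ"), ("aur", "CCONJ"),
]

def build_wordlists(tagged, keep_tags, min_count):
    """Count words whose tag is in keep_tags, then split them by frequency."""
    counts = Counter(word for word, tag in tagged if tag in keep_tags)
    frequent = sorted(w for w, c in counts.items() if c >= min_count)
    infrequent = sorted(w for w, c in counts.items() if c < min_count)
    return frequent, infrequent

# The tag set and threshold are assumptions chosen for this toy example.
frequent, infrequent = build_wordlists(
    tagged_tokens, keep_tags={"NOUN", "VERB"}, min_count=2
)

# Both lists go into the final vocabulary, as described in the comment above.
wordlist = frequent + infrequent
```

Keeping the infrequent words in the final list means that rare but valid words are not flagged as misspellings.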

sumyatthitsarr commented 4 years ago

Could you please give detailed steps and the implementation of this spell checker? Is it rule-based? Could you guide me if I want to build one for my own language?

rash150996 commented 4 years ago

@Wickky This is not rule-based, because Hindi is far too complicated to handle with rules alone. I had to stick with a data-driven approach. Which language are you trying to build one for?

sumyatthitsarr commented 4 years ago

I would like to build one for the Burmese language. It is a morphologically rich language. I have tried an encoder-decoder with an attention mechanism, as used in machine translation tasks, but I didn't get the results I expected. Do you have any ideas on how to build one?