ykdojo / editdojo

(I'm no longer working on this; I'm currently working on https://github.com/ykdojo/defaang)
https://www.csdojo.io/edit
MIT License

Added function for detecting message lang #29

Open: emills11 opened this pull request 5 years ago

emills11 commented 5 years ago

I went ahead and wrote a basic function that detects the language of a message, so that each message can be identified as written either in the user's target language (so other users can see it) or in their native language (so it can be ignored).
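The PR's actual code isn't reproduced here, so the following is only a minimal sketch of what such a function might look like. It assumes the `langdetect` package and ISO 639-1 codes (e.g. `"en"`, `"ja"`) for the user's target and native languages; the names `label_for` and `classify_message` are hypothetical. The mapping logic is kept separate from the detection call so it can be used without `langdetect` installed.

```python
from typing import Optional


def label_for(lang: str, target_lang: str, native_lang: str) -> Optional[str]:
    """Map a detected ISO 639-1 code to "target", "native", or None (neither)."""
    if lang == target_lang:
        return "target"  # written in the language the user is learning: show to others
    if lang == native_lang:
        return "native"  # written in the user's own language: ignore
    return None


def classify_message(text: str, target_lang: str, native_lang: str) -> Optional[str]:
    """Hypothetical helper: classify a message using langdetect's single best guess."""
    # Imported inside the function so label_for() works even without langdetect installed.
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    try:
        return label_for(detect(text), target_lang, native_lang)
    except LangDetectException:
        # detect() raises this when the text has no usable features, e.g. an empty string.
        return None
```

Because this version trusts only the single top guess from `detect()`, it is exactly the kind of check that the spelling-error problem affects.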

I did run into an issue with the langdetect library: because its algorithm is probability-based, it occasionally misidentifies a message's language when the message contains spelling errors. For example, "Hello World!" is detected as English, while "Helo Woorld!" is detected as Dutch. I could use some help coming up with a solution for this problem.

emills11 commented 5 years ago

I may have found a solution to the problem above: iterate through the Language objects returned by detect_langs() and check whether any of the probable languages matches either the user's target language or their native language, rather than relying only on the single most likely result. Will push a second commit when I get home.
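A rough sketch of that idea, not the commit that was eventually pushed. It assumes `langdetect`'s `detect_langs()`, which returns `Language` objects carrying `.lang` and `.prob` attributes. The names `match_candidates` and `classify_with_candidates`, and the `min_prob` cut-off, are hypothetical additions for illustration. The matching step works on plain `(code, probability)` pairs so it can be exercised without the library.

```python
from typing import Iterable, Optional, Tuple

Candidate = Tuple[str, float]  # (ISO 639-1 code, probability)


def match_candidates(
    candidates: Iterable[Candidate],
    target_lang: str,
    native_lang: str,
    min_prob: float = 0.0,
) -> Optional[str]:
    """Return "target" or "native" for the most probable candidate matching either
    of the user's languages, or None if no sufficiently probable candidate matches."""
    # Walk from most to least probable so a strong match wins over a weak one.
    for lang, prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        if prob < min_prob:
            break  # every remaining candidate is even less likely
        if lang == target_lang:
            return "target"
        if lang == native_lang:
            return "native"
    return None


def classify_with_candidates(
    text: str, target_lang: str, native_lang: str, min_prob: float = 0.0
) -> Optional[str]:
    """Hypothetical wrapper feeding detect_langs() results into match_candidates()."""
    # Imported here so match_candidates() works even without langdetect installed.
    from langdetect import DetectorFactory, detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless a seed is set
    try:
        langs = detect_langs(text)
    except LangDetectException:
        return None  # e.g. empty text with no detectable features
    pairs = [(item.lang, item.prob) for item in langs]
    return match_candidates(pairs, target_lang, native_lang, min_prob)
```

Walking the candidates in probability order means that if both of the user's languages show up, the more likely one decides the label; a nonzero `min_prob` would stop a barely-probable match from counting. Fixing `DetectorFactory.seed` also makes misspelled examples like "Helo Woorld!" reproducible while testing.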

ykdojo commented 5 years ago

Thank you. I'll take a look at this after I publish my next video about #22.