moos / wordpos

Part-of-speech utilities for node.js based on the WordNet database.

Suggestions for supporting other languages? #34

Open · niftylettuce opened this issue 4 years ago

niftylettuce commented 4 years ago

Not sure if you've had to work on this, @moos, but I'm curious whether you or others have figured out support for multiple locales/languages.

moos commented 4 years ago

"WordNet® is a large lexical database of English." wordpos is just a JavaScript front end to the WordNet database.

Just as I was about to hit send on the above, I reconsidered your question, and a quick Google search later I came across the Global WordNet Association; apparently there are "WordNets" in other languages.

I looked at a few (German, French, Japanese). They seem to store their data in various XML formats, so they can't simply be dropped into wordpos, which is based on a specific (optimized) WordNet format.
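To make the "not a drop-in" point concrete, here is a minimal sketch of the kind of conversion step that would be needed: it pulls lemmas, parts of speech and synset ids out of an XML lexicon and emits sorted, one-line-per-lemma index entries. Everything here is an assumption. The input is a simplified shape loosely modeled on the Global WordNet Association's LMF-style XML (individual wordnets may differ), the function names attr and lmfToIndex are hypothetical, and the output line "lemma pos synset_count synset_id..." is a simplified stand-in for WordNet's real index files (which also carry pointer counts and byte offsets into data files). It is not how wordpos builds or reads its files.

```javascript
// Hypothetical converter: simplified LMF-style XML -> sorted index lines.
// Not wordpos code. Output format "lemma pos synset_count synset_id..." is a
// simplified stand-in for WordNet's real index files.
// Regex parsing only copes with this toy shape; real data needs an XML parser.

// Read attribute `name` from a single tag string, or null if it is absent.
function attr(tag, name) {
  const m = tag.match(new RegExp(`\\b${name}="([^"]*)"`));
  return m ? m[1] : null;
}

function lmfToIndex(xml) {
  const byPos = {}; // pos -> Map(lemma -> [synset ids])
  for (const entry of xml.match(/<LexicalEntry\b[\s\S]*?<\/LexicalEntry>/g) || []) {
    const lemmaTag = (entry.match(/<Lemma\b[^>]*>/) || [])[0];
    if (!lemmaTag) continue;
    const form = attr(lemmaTag, 'writtenForm');
    const pos = attr(lemmaTag, 'partOfSpeech');
    if (!form || !pos) continue;
    // WordNet index lemmas are lowercase, with underscores joining collocations.
    const lemma = form.toLowerCase().replace(/ /g, '_');
    const synsets = (entry.match(/<Sense\b[^>]*>/g) || [])
      .map((t) => attr(t, 'synset'))
      .filter(Boolean);
    if (!byPos[pos]) byPos[pos] = new Map();
    const map = byPos[pos];
    map.set(lemma, (map.get(lemma) || []).concat(synsets));
  }
  // Sort each part of speech by lemma so lookups could binary-search the lines.
  const index = {};
  for (const [pos, map] of Object.entries(byPos)) {
    index[pos] = [...map.keys()].sort().map((lemma) => {
      const ids = map.get(lemma);
      return [lemma, pos, ids.length, ...ids].join(' ');
    });
  }
  return index;
}

// Toy input; entries deliberately out of order to show the sorting.
const sample = `
<Lexicon id="ex" language="de">
  <LexicalEntry id="e1"><Lemma writtenForm="Katze" partOfSpeech="n"/><Sense id="e1-1" synset="ex-02"/></LexicalEntry>
  <LexicalEntry id="e2"><Lemma writtenForm="Hund" partOfSpeech="n"/><Sense id="e2-1" synset="ex-01"/></LexicalEntry>
  <LexicalEntry id="e3"><Lemma writtenForm="laufen" partOfSpeech="v"/><Sense id="e3-1" synset="ex-03"/><Sense id="e3-2" synset="ex-04"/></LexicalEntry>
</Lexicon>`;

const index = lmfToIndex(sample);
console.log(Object.values(index).flat().join('\n'));
```

For the sample this prints "hund n 1 ex-01", "katze n 1 ex-02" and "laufen v 2 ex-03 ex-04" on separate lines. Sorting matters because WordNet-style index lookups typically rely on the lines being in order; a real port would also have to reproduce the data files the index points into.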

Still, it's an intriguing possibility; whether it happens depends mainly on the level of interest and contributions. Do you have a specific use case in mind?

niftylettuce commented 4 years ago

Yes, it's for my work on https://github.com/spamscanner/spamscanner. I am building a filter to detect gibberish and the language of the message, without relying on Content-Language headers or <meta> or <html lang> tags.
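One way multilingual wordnets could feed into that use case is a lexicon-coverage check: tokenize the message, measure what fraction of tokens appear in each language's word list, pick the best-covered language, and flag the message as likely gibberish when no language reaches a threshold. This is a minimal sketch under stated assumptions, not a wordpos API and not how spamscanner works: tokenize, guessLanguage, the toy word lists and the 0.5 threshold are all hypothetical. In practice the word lists would first have to be extracted from each language's wordnet, and because WordNet-style lexicons list base forms of nouns, verbs, adjectives and adverbs, inflected forms and function words (articles, conjunctions) would miss unless a lemmatizer and a stop-word list were added.

```javascript
// Hypothetical lexicon-coverage heuristic; not part of wordpos or spamscanner.

// Split text into lowercase letter runs (Unicode-aware, so "Größe" stays one token).
function tokenize(text) {
  return text.toLowerCase().match(/\p{L}+/gu) || [];
}

// lexicons: { languageCode: Set of lowercase words }.
// minCoverage: assumed threshold below which the text is treated as gibberish.
function guessLanguage(text, lexicons, minCoverage = 0.5) {
  const tokens = tokenize(text);
  if (tokens.length === 0) return { lang: null, coverage: 0, gibberish: true };

  let best = { lang: null, coverage: 0 };
  for (const [lang, words] of Object.entries(lexicons)) {
    const hits = tokens.filter((t) => words.has(t)).length;
    const coverage = hits / tokens.length;
    // Strictly greater: on a tie, the language listed first wins.
    if (coverage > best.coverage) best = { lang, coverage };
  }
  return { ...best, gibberish: best.coverage < minCoverage };
}

// Toy word lists standing in for lemmas extracted from real wordnets.
const lexicons = {
  en: new Set(['dog', 'cat', 'run', 'house']),
  de: new Set(['hund', 'katze', 'laufen', 'haus']),
};

console.log(guessLanguage('Hund Katze Haus', lexicons)); // → { lang: 'de', coverage: 1, gibberish: false }
console.log(guessLanguage('xqzv plorb', lexicons)); // → { lang: null, coverage: 0, gibberish: true }
// Only "dog" matches: "the" and the inflected "runs" are missing from the toy list,
// so this English sentence is flagged (coverage 1/3), illustrating the lemma caveat.
console.log(guessLanguage('The dog runs', lexicons));
```

The last call shows why raw lemma lists alone are probably not enough: short or inflection-heavy messages score low even in a known language, so a real filter would likely combine this signal with others.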