greyblake / whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/
MIT License
966 stars, 108 forks

Failure to recognize Afrikaans #35

Closed cloudcalvin closed 5 years ago

cloudcalvin commented 5 years ago

Hello,

Good work!

I suspect the algorithms need a bit of tweaking. I typed in an ordinary sentence in my home language, Afrikaans, but it was incorrectly identified as Dutch. In fact, all of the text I typed in came up as either English, Spanish or Dutch, but not once did it show Afrikaans.

Here is a sample that I used: Hierdie is Afrikaans, maar tog dink die program dit is Nederlands. Het jy vir my 'n antwoord hierop?

Greetings, Calvin

greyblake commented 5 years ago

Hey, thanks for the report!

Here is the list of supported languages: https://github.com/greyblake/whatlang-rs/blob/master/SUPPORTED_LANGUAGES.md

So, it looks like Afrikaans is not supported at the moment. However, according to Wikipedia, there are more than 7M native speakers, so I think it makes total sense to add it.

Can you please provide some links where I can get more text samples for this language? (news websites, books, etc.)

Thanks
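The misclassification reported above is what one would expect from any closed-set detector: it can only answer with a language it has a profile for, so text in an unsupported language falls to the nearest supported one. The sketch below illustrates that effect with a deliberately simple function-word overlap score. It is not whatlang's algorithm, and the `Profile` type, the `score` and `detect` helpers, and the tiny word lists are all hypothetical, chosen only for illustration.

```rust
/// A hypothetical language profile: a name plus a few common function words.
/// These word lists are illustrative only and are not taken from whatlang.
struct Profile {
    name: &'static str,
    words: &'static [&'static str],
}

const ENGLISH: Profile = Profile {
    name: "English",
    words: &["the", "is", "a", "and", "of", "my", "for", "this", "not", "you"],
};

const SPANISH: Profile = Profile {
    name: "Spanish",
    words: &["el", "la", "de", "que", "y", "es", "un", "no", "por", "mi"],
};

const DUTCH: Profile = Profile {
    name: "Dutch",
    words: &["de", "het", "een", "is", "maar", "dit", "en", "van", "niet", "ik"],
};

const AFRIKAANS: Profile = Profile {
    name: "Afrikaans",
    words: &["die", "hierdie", "is", "maar", "dit", "het", "jy", "vir", "my", "nie"],
};

/// Count how many tokens of `text` appear in the profile's word list.
fn score(text: &str, profile: &Profile) -> usize {
    text.to_lowercase()
        .split(|c: char| !c.is_alphabetic())
        .filter(|token| !token.is_empty() && profile.words.contains(token))
        .count()
}

/// Return the name of the best-scoring profile, or `None` if nothing matches.
/// The answer is always drawn from `profiles`: an unsupported language can
/// never be reported, only its closest supported neighbour.
fn detect(text: &str, profiles: &[Profile]) -> Option<&'static str> {
    profiles
        .iter()
        .map(|p| (score(text, p), p.name))
        .filter(|(s, _)| *s > 0)
        .max_by_key(|(s, _)| *s)
        .map(|(_, name)| name)
}

fn main() {
    let sample = "Hierdie is Afrikaans, maar tog dink die program dit is Nederlands. \
                  Het jy vir my 'n antwoord hierop?";

    // Without an Afrikaans profile, Dutch shares the most function words.
    println!("{:?}", detect(sample, &[ENGLISH, SPANISH, DUTCH])); // prints Some("Dutch")

    // Once an Afrikaans profile exists, it wins clearly.
    println!("{:?}", detect(sample, &[ENGLISH, SPANISH, DUTCH, AFRIKAANS])); // prints Some("Afrikaans")
}
```

With these toy lists the sample scores 5 for Dutch, 3 for English and 0 for Spanish, so Dutch is the best available answer until the Afrikaans profile (score 10) is added.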

cloudcalvin commented 5 years ago

Apologies, I didn't notice THAT document!

I was looking at the code, where I noticed the "afr" entry in misc/data.json and assumed that it was supported!

Yes, here in South Africa there are still a lot of Afrikaans speakers, but lately you find many Afrikaans people in Canada, the UK, Australia and New Zealand. Not sure if the language is spreading or dying!

Anyway, here are some online newspapers: https://www.newspaperindex.com/af/koerante/Suid-Afrika/

Here are some books (I can't find a good source of free Afrikaans books, sorry):
https://b-ok.cc/book/2541124/d03670
https://b-ok.cc/book/2541050/fcc037
https://b-ok.cc/book/2540905/cd79cb
https://b-ok.cc/book/2539847/d84bd8
https://b-ok.cc/book/2539570/b56032
https://b-ok.cc/book/2539324/f3848e
https://b-ok.cc/book/2538894/2f50d7
https://b-ok.cc/book/2517246/022f12
https://b-ok.cc/book/2392727/209353
https://b-ok.cc/book/2381297/8fe6fc

And this link seems to have poetry, essays and book reviews: http://www.oulitnet.co.za/vryepoort/default.asp

Kind regards, Calvin

cloudcalvin commented 5 years ago

Oops, those newspapers are all English except for two!

Here are the Afrikaans newspapers I could find:
https://www.pressreader.com/south-africa/die-burger/textview
http://www.dievryburger.co.za/
https://www.netwerk24.com/
https://maroelamedia.co.za/

Also, Project Gutenberg seems to have many free Afrikaans books: https://www.gutenberg.org/browse/languages/af

greyblake commented 5 years ago

Thank you :+1: I've added support for Afrikaans in this PR: https://github.com/greyblake/whatlang-rs/pull/36

I will release a new version soon. Please reopen the issue if this does not work as you expect. :)