Closed: valeriansaliou closed this issue 4 years ago
Note that I'd be happy to open a PR for this myself, if there is no blocking reason why Slovak has not been implemented so far.
If I remember correctly, I just tried to implement the most popular languages by number of native speakers, using a list on Wikipedia. Slovak was probably not on that list.
There are a few reasons why I decided not to add every possible language:
In short, the set of languages implemented in whatlang is a reasonable, pragmatic trade-off. In most cases I would be OK with adding a new language on demand if someone has a real need and requests it.
Btw, I just added Slovak in whatlang 0.9.0.
Thank you for using whatlang. I implemented the library just for fun, but your Sonic search engine puts it to real practical use :)
Hello @greyblake
Thank you so much for the quick answer and release, really appreciated!
Slovak support has just been added to Sonic: https://github.com/valeriansaliou/sonic/commit/19412ce05a802ef1e6054b751faaef50cab5d36b
On the reasons as to why not all languages are available, I completely understand.
The main problem is that so many different European languages share the same Latin script. There would probably be an optimization path where you'd add a pre-detection pass after Latin is detected as the script, restricting the language list even further by accented characters. E.g. "ē" appears in Latvian (and possibly other Baltic languages), but definitely does not occur in French. It's not that straightforward, though: a Latvian sentence may not contain any accented character that characterizes a Baltic language, so there needs to be a fallback to avoid such false negatives.
Cyrillic, Arabic, Mandarin, Kanji, etc. scripts, on the other hand, do not have this performance hit.
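To make the idea concrete, here is a rough sketch of such a pre-detection pass. Everything in it is an assumption for illustration: the `Lang` enum, the `hint_chars` table and the `narrow_candidates` function are hypothetical names, not part of whatlang, and the character lists are deliberately small. It keeps only the languages whose alphabet covers every accented letter found in the text, and falls back to the full list when nothing matches.

```rust
// Hypothetical sketch of an accent-based pre-filter; not whatlang's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    French,
    German,
    Latvian,
    Slovak,
    Slovene,
}

// Illustrative candidate list for text already detected as Latin script.
pub const LATIN_LANGS: [Lang; 5] = [
    Lang::French,
    Lang::German,
    Lang::Latvian,
    Lang::Slovak,
    Lang::Slovene,
];

// Lowercase accented letters assumed to belong to each language.
// The lists are simplified and assume precomposed (NFC) characters.
fn hint_chars(lang: Lang) -> &'static str {
    match lang {
        Lang::French => "àâçéèêëîïôùûüÿœ",
        Lang::German => "äöüß",
        Lang::Latvian => "āčēģīķļņšūž",
        Lang::Slovak => "áäčďéíĺľňóôŕšťúýž",
        Lang::Slovene => "čšž",
    }
}

// Returns the candidates whose hint set covers every non-ASCII letter in
// `text`. Text without accented letters keeps the full list, and so does
// text whose accents match no candidate (fallback against false negatives).
pub fn narrow_candidates(text: &str) -> Vec<Lang> {
    let lowered = text.to_lowercase();
    let accented: Vec<char> = lowered
        .chars()
        .filter(|c| c.is_alphabetic() && !c.is_ascii())
        .collect();

    let narrowed: Vec<Lang> = LATIN_LANGS
        .iter()
        .copied()
        .filter(|&lang| accented.iter().all(|&c| hint_chars(lang).contains(c)))
        .collect();

    if narrowed.is_empty() {
        LATIN_LANGS.to_vec()
    } else {
        narrowed
    }
}
```

With this table, `narrow_candidates("ēka")` leaves only Latvian, while plain ASCII input such as `"hello"` keeps all five candidates, so the regular detection would still run on the full list. A strict filter like this one drops a language as soon as a single foreign letter shows up.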
Thanks again! Valerian.
@valeriansaliou Thanks for the suggestion. I also had a similar idea in mind, and even implemented something similar years ago in the Smartdict project.
However, this approach becomes trickier considering that text in one language may include words from another language.
E.g. the German alphabet does not have é, but the French word Exposé is widely used in modern German.
Ah, snap yes, indeed. I understand then, trickier than it seems w/ modern language usage.
Hello there!
I'm using whatlang as part of the sonic language detection system. It works great overall, thanks a lot for your work, and for adding Latin recently, which has been implemented in sonic.
I've got a user on my end requesting Slovak to be added to sonic. Do you think this is something possible from whatlang? Is there any reason it's not there (I see that Slovene is supported, while Slovak is not)?
Ref: https://github.com/valeriansaliou/sonic/issues/178