Open DonaldTsang opened 4 years ago
Where is the source text dataset for the n-grams of those 100 languages? I would like to see whether it differs from the UDHR corpus used in wooorm/franc#78, and whether it is more accurate.
@DonaldTsang I really don’t know, because I’m not the dev, but isn’t it in _languageData.js?
@DonaldTsang (inside the lib folder)
@DonaldTsang But it’s weird, because not every language is in there, and the entries that are there are not written in the actual language (for example, the “fr” entry isn’t written in French, and I don’t understand what it contains).
@DonaldTsang The dev primarily used Unicode checking to determine the language, though.
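For illustration, a Unicode-only check could look something like the Python sketch below. This is an assumption about what "Unicode checking" means in general, not the library's actual code; `dominant_script` is a hypothetical helper that reads the script from the first word of each character's Unicode name.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters in text, or None.

    Hypothetical helper: the script is taken from the first word of the
    Unicode character name, e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")  # "" if the character has no name
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else None
```

A check like this can separate scripts (Cyrillic from Latin, say), but it cannot tell apart languages that share a script, which is where n-gram data would come in.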
@Animenosekai If it only uses Unicode checking, that would actually be really sweet, as that is very useful for my goal of making language checking easier (which I hope to reimplement in Python).
The _languageData.js file seems to contain n-gram data.
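For anyone reimplementing this in Python, here is a minimal sketch of how ranked trigram profiles are commonly used for language guessing (the rank-order "out-of-place" comparison). Whether _languageData.js is used exactly this way is an assumption; the function names and the two tiny sample profiles are made up for illustration and are far too small for real use.

```python
from collections import Counter


def trigram_ranks(text, top=300):
    """Map each of the most frequent trigrams in text to its rank (0 = most frequent)."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(sample, profile, max_penalty=300):
    """Sum of rank differences; trigrams missing from the profile get max_penalty."""
    return sum(
        abs(rank - profile[gram]) if gram in profile else max_penalty
        for gram, rank in sample.items()
    )


def guess(text, profiles):
    """Return the language whose profile is closest to the text's trigram ranks."""
    sample = trigram_ranks(text)
    return min(profiles, key=lambda lang: out_of_place(sample, profiles[lang]))


# Toy profiles built from one made-up sentence each (illustration only).
PROFILES = {
    "en": trigram_ranks("the quick brown fox jumps over the lazy dog and then the cat sat on the mat"),
    "fr": trigram_ranks("le renard brun saute par dessus le chien paresseux et le chat est sur le tapis"),
}
```

With profiles like these, `guess("the dog and the cat", PROFILES)` picks the profile with the smallest distance. The accuracy of such a scheme depends heavily on the corpus the profiles were built from, which is why the source dataset matters.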
I don't think it uses only Unicode checking, but why don't you open guessLanguage.js? It should contain everything you want to know.