Open rashmiranjanrrs opened 4 years ago
Hey,
How do I add a new subset?
Add a subset file in the src/LanguageDetector/subsets/ folder.
Update the tests/LanguageDetector/LanguageDetectionTest.php file to validate your subset.
Add new language {the new language}
Subset structure
A subset file is a JSON encoded file with the following structure:
{
"freq":{"D":662077, [...], "tha":240340},
"n_words":[260942223,308553243,224934017],
"name":"en"
}
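As a rough illustration of that structure, here is a minimal Python sketch that checks whether a JSON string has the three keys shown above. It is not part of the library: `validate_subset` is a hypothetical helper, and the rule that `n_words` holds exactly three integers is an assumption drawn only from the example.

```python
import json

def validate_subset(text):
    """Return a list of problems found in a subset JSON string (empty if none)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    # "freq" maps n-gram strings to occurrence counts.
    freq = data.get("freq")
    if not isinstance(freq, dict) or not freq:
        problems.append("freq must be a non-empty object")
    elif not all(isinstance(v, int) and v > 0 for v in freq.values()):
        problems.append("freq values must be positive integers")
    # Assumption: "n_words" always has three integer entries, as in the example.
    n_words = data.get("n_words")
    if not (isinstance(n_words, list) and len(n_words) == 3
            and all(isinstance(v, int) for v in n_words)):
        problems.append("n_words must be a list of three integers")
    # "name" is the language code, e.g. "en".
    name = data.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    return problems

sample = ('{"freq": {"D": 662077, "tha": 240340}, '
          '"n_words": [260942223, 308553243, 224934017], "name": "en"}')
print(validate_subset(sample))
```

A check like this only catches structural mistakes; it says nothing about whether the frequencies are good enough for reliable detection.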
More
As you may guess, a "learning" tool has to be written to generate a subset. It's not yet packaged with the library but might be in the future. A piece of advice: to generate a reliable subset file, you have to collect a large number of files in the desired language and, if possible, from several variants of that language.
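Until such a tool exists, the idea can be sketched in Python. This is not the library's learning tool, only a guess at what one might look like. It assumes that `freq` counts character n-grams of length 1 to 3 and that `n_words` holds the total number of n-grams seen for each length, which the example keys ("D", "tha") and the three-element array suggest but the thread does not confirm. `build_subset` is a hypothetical name, and the sketch does no normalization or filtering of the text.

```python
import json
from collections import Counter

def build_subset(texts, name, max_n=3):
    """Count character n-grams (lengths 1..max_n) inside whitespace-separated words."""
    freq = Counter()
    n_words = [0] * max_n  # assumed: total n-grams seen per length
    for text in texts:
        for word in text.split():
            for n in range(1, max_n + 1):
                # Slide a window of size n over the word.
                for i in range(len(word) - n + 1):
                    freq[word[i:i + n]] += 1
                    n_words[n - 1] += 1
    return {"freq": dict(freq), "n_words": n_words, "name": name}

# Tiny corpus for illustration; a real subset needs a large, varied corpus.
subset = build_subset(["the the"], "xx")
print(json.dumps(subset, ensure_ascii=False))
```

A real subset would be built from far more text than this, and the result should still be checked against the library's test file before relying on it.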
Hope this helps
@landrok hey! Can you give any advice on how to extend your library to support the Georgian (ka) language (https://en.wikipedia.org/wiki/Georgian_language)?
Hey, can you add more Indic languages, or can you share the pattern or structure of a subset so that I can add new languages as I need them? How do I add a new subset?