ghost closed this 3 years ago
Hi @4086606,
It will be quite challenging to support all 500+ languages, because Guesslang's machine learning model needs a lot of sample files for training.
In fact, to reach 70% to 80% correct language predictions, you'll have to train the model with around 1k sample files per language. To reach 90% to 95% prediction accuracy, you'll need up to 25k samples per language.
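To put those per-language figures in perspective, here is a quick back-of-the-envelope calculation. It simply multiplies the sample counts quoted above by a language count; `NUM_LANGUAGES = 500` is an assumption standing in for "500+", and 25k is treated as the upper bound mentioned above, so the numbers are rough estimates only.

```python
# Rough estimate of total training files, using the per-language figures above.
# NUM_LANGUAGES = 500 is an assumed stand-in for "500+ languages".
NUM_LANGUAGES = 500

# Approximate sample files needed per language for each accuracy band.
SAMPLES_PER_LANGUAGE = {
    "70-80% accuracy": 1_000,
    "90-95% accuracy": 25_000,  # "up to" 25k, so this is an upper bound
}


def total_samples(num_languages: int, per_language: int) -> int:
    """Total number of sample files needed across all languages."""
    return num_languages * per_language


for target, per_language in SAMPLES_PER_LANGUAGE.items():
    print(f"{target}: {total_samples(NUM_LANGUAGES, per_language):,} files")
```

That works out to roughly half a million files for the lower accuracy band and around 12.5 million for the higher one, which is why covering every language is hard.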
I'm working on supporting 14 new languages (https://github.com/yoeo/guesslang/issues/29#issuecomment-863867962), and I'll definitely check Linguist to see how they managed to handle so many languages.
Thank you.
1 THOUSAND!? I greatly underestimated how much training data the model needs; my lack of familiarity with ML brought this on, sorry.
Thanks for your hard work 👍
No problem, and thanks for the support!!!
There's a wealth of code samples and extension mappings over at github/linguist that could be used in this repository.
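As a rough illustration of how extension mappings could be reused, the sketch below inverts a language-to-extensions mapping into an extension-to-candidate-languages index. `LINGUIST_EXCERPT` is a small hypothetical excerpt shaped like the entries in Linguist's `languages.yml` (language name mapped to attributes, including an `extensions` list), not the real file, and `extension_index` is a hypothetical helper, not part of Guesslang or Linguist.

```python
from collections import defaultdict

# Hypothetical, trimmed excerpt shaped like Linguist's languages.yml entries:
# language name -> attributes, including a list of file extensions.
LINGUIST_EXCERPT = {
    "C": {"extensions": [".c", ".h"]},
    "C++": {"extensions": [".cpp", ".hpp", ".h"]},
    "Python": {"extensions": [".py", ".pyw"]},
}


def extension_index(languages: dict) -> dict[str, list[str]]:
    """Invert a language -> extensions mapping into extension -> candidate languages."""
    index = defaultdict(list)
    for name, attrs in languages.items():
        for ext in attrs.get("extensions", []):
            index[ext].append(name)
    return dict(index)


index = extension_index(LINGUIST_EXCERPT)
print(index[".h"])  # ['C', 'C++']
```

Note that one extension can map to several languages (`.h` above), so such a mapping could help label or collect sample files, but it would not replace content-based prediction on its own.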