Open GoogleCodeExporter opened 8 years ago
This is Khmer language, isn't it?
The 11-18-2010 version bundled an experimental Khmer profile by mistake,
so it could detect the text.
The current version doesn't bundle it (because there is no test data), and
langdetect can't extract features that are not contained in any loaded
profile. So the exception is raised.
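To illustrate why the exception occurs, here is a minimal Python sketch. It is a toy model, not langdetect's actual code: the profiles, the `detect` function, and the simple match counting are all assumptions made for illustration. The point is only that text whose n-grams appear in no loaded profile leaves the detector with nothing to score.

```python
from collections import Counter

# Toy profiles (hypothetical data): language code -> set of known n-grams.
TOY_PROFILES = {
    "en": {"t", "h", "e", "th", "he", "the"},
    "fr": {"l", "e", "s", "le", "es", "les"},
}


def char_ngrams(text, n_max=3):
    """Return all character n-grams of length 1..n_max found in text."""
    return [text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


def detect(text, profiles):
    """Pick the language whose profile matches the most n-grams of text.

    Raises ValueError when no n-gram of the text occurs in any profile,
    which mirrors the situation described above for Khmer text.
    """
    known = set().union(*profiles.values())
    features = [g for g in char_ngrams(text) if g in known]
    if not features:
        raise ValueError("no features in text")
    scores = Counter()
    for gram in features:
        for lang, grams in profiles.items():
            if gram in grams:
                scores[lang] += 1
    return scores.most_common(1)[0][0]
```

With these toy profiles, `detect("the", TOY_PROFILES)` picks `"en"`, while Khmer text such as `"ខ្មែរ"` raises the error because none of its n-grams are in any profile.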
If you want to detect Khmer, copy the profiles/km file from the 11-18-2010
version's profiles into a new profile directory and load that directory.
The Khmer language profile is not tested, but I expect it to work well because
the Khmer alphabet is unique to the language! :D
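The copy step could be scripted roughly as follows. This is only a sketch: the function name `build_profile_dir` and the directory paths in the usage note are hypothetical, and it assumes each profile is a single file named after its language code (as with `profiles/km`).

```python
import shutil
from pathlib import Path


def build_profile_dir(current_profiles, old_profiles, dest, extra=("km",)):
    """Copy the current profiles into dest, then add selected old profiles.

    Returns the sorted list of profile file names now present in dest.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Start from the profiles shipped with the current version.
    for profile in Path(current_profiles).iterdir():
        if profile.is_file():
            shutil.copy2(profile, dest / profile.name)
    # Add the extra profiles (e.g. km) taken from the older release.
    for lang in extra:
        source = Path(old_profiles) / lang
        if not source.is_file():
            raise FileNotFoundError(f"profile not found: {source}")
        shutil.copy2(source, dest / lang)
    return sorted(p.name for p in dest.iterdir())
```

For example, `build_profile_dir("langdetect-current/profiles", "langdetect-11-18-2010/profiles", "profiles-with-km")` (placeholder paths) would produce a directory that can then be given to langdetect as its profile directory.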
Original comment by nakatani.shuyo
on 6 Dec 2011 at 10:31
Aha, I see. I have no specific interest in the Khmer language, but I am
carrying out a large-scale comparison of off-the-shelf language identification
systems.
So far, I have been comparing:
1) langid.py (my system!)
http://www.csse.unimelb.edu.au/research/lt/resources/langid/
2) your system
3) TextCat (http://www.let.rug.nl/vannoord/TextCat/)
4) Chromium CLD (http://code.google.com/p/chromium-compact-language-detector/)
5) Google's langid API
6) Microsoft's langid API
I noticed that I was using a year-old version of your system, so I upgraded to
the latest version and was surprised to find that performance dropped on many
datasets. If you removed some languages from consideration, this would explain
the drop in performance. For your reference, on the datasets I have tested that
include KM, your system attained 97-99% accuracy for it.
Original comment by saf...@gmail.com
on 6 Dec 2011 at 10:56
So what is the accuracy of each of the libraries as of now?
Original comment by dennis97...@gmail.com
on 11 Aug 2014 at 4:50
That really depends on the target data, but here[1] is my most recent paper
comparing the accuracy of 8 off-the-shelf systems on Twitter
messages, including my own langid.py and Shuyo's language-detection (the
repository where this message is posted).
[1] http://aclweb.org/anthology/W/W14/W14-1303.pdf
Original comment by saf...@gmail.com
on 11 Aug 2014 at 11:23
Thank you very much! Will read it when I'm free.
Original comment by dennis97...@gmail.com
on 15 Aug 2014 at 10:18
Original issue reported on code.google.com by
saf...@gmail.com
on 6 Dec 2011 at 2:33