malcolmgreaves / language-detection

Automatically exported from code.google.com/p/language-detection. Some after-the-fact modifications to get this working within sbt.
Apache License 2.0

Faster language detector version #2

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
First of all, congratulations on your library; it performs very well 
even in very hard settings. 

I have implemented a faster version of the library based on your algorithm, 
using arrays instead of HashMaps. It runs approx. 5 to 8 times faster.
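
The attached sources are not reproduced in this thread. As a rough illustration of the general idea only, the sketch below compares a HashMap keyed by n-gram strings with a dense array indexed by precomputed integer ids. The class name, the two-language setup, the `-1` marker for unknown n-grams, and the log-sum scoring are all assumptions made for the example, not the library's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class ArrayLookupSketch {
    static final int LANGS = 2; // hypothetical number of languages

    // Baseline: every n-gram lookup hashes a String key.
    static double[] scoreWithMap(Map<String, double[]> probs, String[] ngrams) {
        double[] score = new double[LANGS];
        for (String g : ngrams) {
            double[] p = probs.get(g);
            if (p == null) continue; // unknown n-gram: skip
            for (int i = 0; i < LANGS; i++) score[i] += Math.log(p[i]);
        }
        return score;
    }

    // Array variant: n-grams were mapped to dense int ids ahead of time
    // (-1 = unknown), so the hot loop is plain array indexing.
    static double[] scoreWithArray(double[][] probs, int[] ngramIds) {
        double[] score = new double[LANGS];
        for (int id : ngramIds) {
            if (id < 0) continue; // unknown n-gram: skip
            double[] p = probs[id];
            for (int i = 0; i < LANGS; i++) score[i] += Math.log(p[i]);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, double[]> map = new HashMap<>();
        map.put("th", new double[] {0.8, 0.2});
        map.put("ch", new double[] {0.3, 0.7});
        double[][] table = { {0.8, 0.2}, {0.3, 0.7} }; // id 0 = "th", id 1 = "ch"

        double[] a = scoreWithMap(map, new String[] {"th", "ch", "zz"});
        double[] b = scoreWithArray(table, new int[] {0, 1, -1});
        System.out.println("same scores: " + (a[0] == b[0] && a[1] == b[1]));
    }
}
```

Both methods do the same arithmetic in the same order, so they should agree exactly; the array variant simply moves the string-to-id mapping out of the inner loop.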

You can find the sources attached, feel free to add them to the library if you 
find them useful.

Original issue reported on code.google.com by elmer.garduno@gmail.com on 18 Jan 2011 at 5:07

GoogleCodeExporter commented 9 years ago
Thanks!!!
I'll check your code. Give me a little time.

Original comment by nakatani.shuyo on 18 Jan 2011 at 9:07

GoogleCodeExporter commented 9 years ago
I confirmed that your code is 4x faster on average for text in various 
languages.
So I will merge it soon, with the necessary changes (float may overflow, 
conflicts with the test code, etc.).
Thank you very much.

Below: time (ms) for 100 detections per sample. Left: original; right: your alteration.
----
ar: 2296 1375
ar: 262 82
en: 671 99
fa: 203 39
fa: 292 64
fa: 536 198
fa: 560 206
fa: 528 203
fa: 409 125
fa: 599 222
fa: 561 190
fa: 708 286
fa: 1000 447
fa: 632 258
fa: 602 224
fa: 243 63
gu: 115 41
gu: 65 14
it: 678 125
it: 678 125
it: 660 120
it: 720 125
ja: 138 31
ja: 134 26
ja: 150 37
zh-cn: 89 12
hi: 448 64
mr: 767 112
ne: 676 98
hi: 1079 166
mr: 331 43
ne: 981 147
mr: 228 31
mr: 227 40
mr: 330 72
mr: 184 35
tl: 1640 213
ru: 518 110
tl: 569 173
tr: 1058 144
zh-tw: 234 46
ur: 396 156
ur: 384 147
ur: 466 187

Total time (ms): original 24045, alteration 6721
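
For anyone wanting to repeat a measurement like this, here is a minimal sketch of a timing loop, assuming a hypothetical `detect` function that stands in for the library call (the harness actually used for the numbers above is not shown in this thread). It also computes the overall ratio from the two totals.

```java
import java.util.function.Function;

public class DetectTimingSketch {
    // Runs `detect` on `text` `runs` times and returns elapsed milliseconds.
    static long timeDetections(Function<String, String> detect, String text, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            detect.apply(text);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Ratio of original time to alternative time (higher means faster alternative).
    static double speedup(long originalMs, long alternativeMs) {
        return (double) originalMs / alternativeMs;
    }

    public static void main(String[] args) {
        // Stand-in detector, purely illustrative; replace with a real call.
        Function<String, String> dummy = s -> s.isEmpty() ? "unknown" : "en";
        long ms = timeDetections(dummy, "hello world", 100);
        System.out.println("100 detections: " + ms + " ms");
        System.out.println("ratio of totals: " + speedup(24045, 6721));
    }
}
```

Note that the ratio of the totals is dominated by the slowest samples, so it can differ from an average of the per-sample ratios.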

Original comment by nakatani.shuyo on 20 Jan 2011 at 8:49

GoogleCodeExporter commented 9 years ago
Great news!

Thank you.

Original comment by elmer.garduno@gmail.com on 21 Jan 2011 at 5:22

GoogleCodeExporter commented 9 years ago
I've updated langdetect based on your code.
Thank you very much!

Original comment by nakatani.shuyo on 24 Jan 2011 at 11:05

GoogleCodeExporter commented 9 years ago
With the changes you made for the release, it runs even faster. Test it if 
you have some time.

Original comment by elmer.garduno@gmail.com on 24 Jan 2011 at 8:29