xzycn / language-detection

Automatically exported from code.google.com/p/language-detection

performance improvements #52

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Here are a few ways to improve performance:
* make Vietnamese normalization optional
* make "strip URLs" and "strip email" optional: some (most?) "real" 
applications already have some kind of text filtering; this library is only 
intended for language detection, and markup removal is a separate topic.
* use StringBuilder instead of StringBuffer for local variables, as 
synchronization is not needed
* keep a static cache for normalization and uppercase: this will require more 
memory but increase performance.
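
To illustrate, here is a minimal sketch of how three of these suggestions might fit together. It is not the code from the changeset. The class name `TextCleanerSketch`, the `clean` method, its two boolean flags, and the simplified URL/email patterns are all hypothetical, and `Character.toLowerCase` is only a stand-in for whatever per-character normalization the library applies.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the language-detection API.
final class TextCleanerSketch {
    // Simplified patterns, assumed for illustration only.
    private static final Pattern URL =
            Pattern.compile("https?://[-_.?&~;+=/#0-9A-Za-z]+");
    private static final Pattern MAIL =
            Pattern.compile("[-_.0-9A-Za-z]+@[-_0-9A-Za-z]+[-_.0-9A-Za-z]+");

    // Static cache: the per-character mapping is computed once for every
    // UTF-16 code unit. This costs 128 KB of memory but turns each later
    // lookup into a single array read.
    private static final char[] MAPPED = new char[Character.MAX_VALUE + 1];
    static {
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            // Stand-in mapping; a real implementation would call its own
            // normalization routine here.
            MAPPED[c] = Character.toLowerCase((char) c);
        }
    }

    private TextCleanerSketch() { }

    // Stripping is opt-in, so callers that already filter their input
    // do not pay for the regular expressions.
    static String clean(String text, boolean stripUrls, boolean stripEmails) {
        if (stripUrls) {
            text = URL.matcher(text).replaceAll(" ");
        }
        if (stripEmails) {
            text = MAIL.matcher(text).replaceAll(" ");
        }
        // StringBuilder rather than StringBuffer: the buffer never leaves
        // this method, so synchronization would be wasted work.
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            sb.append(MAPPED[text.charAt(i)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("Visit http://example.com NOW", true, false));
        System.out.println(clean("Visit http://example.com NOW", false, false));
    }
}
```

Whether the cache pays off depends on how expensive the real per-character normalization is, so it is worth measuring before adopting it.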

I have created a clone of the project and pushed the changes there (under the 
"optimizations" branch): 
https://code.google.com/r/ionutcpaduraru-language-detection/

Here is the changeset:
https://code.google.com/r/ionutcpaduraru-language-detection/source/detail?r=13248df53f642409c7b0ab31ddc030b91c9afadb&name=optimizations

Feel free to use (or not to use) any of those changes.

Original issue reported on code.google.com by ionut.c....@gmail.com on 16 Feb 2013 at 11:23