How to improve detection rate?

I have a document in multiple languages (4 or 5), organized more or less 
like this:
* 10 pages in language A,
* 10 pages in language B,
* 10 pages in language C
and so on.

The total text length is 475 216 characters, so the text is quite long. I've set 
maxTextLength to 10 million characters, so the whole text should be analyzed.

Unfortunately, in 9 out of 10 runs (OK, maybe 4 out of 5 ;) I get only one 
language detected, with a probability of 0.9999999 or similar. I'm aware that 
multi-language documents are not supported, but I would still consider this a 
bug, since 80% of the text in the document is NOT in the detected language. Even 
when two languages are reported as probable, their distribution is skewed (0.85 
to 0.14).

I suspect that the text is sampled somehow and that the sampling is skewed 
towards the beginning of the text. How can this be changed to improve detection? 
For my use case it would be enough if the program output were similar to:
en: 0.13
de: 0.14
fr: 0.15
it: 0.13
etc.
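
To make the desired output concrete, here is a minimal sketch of one possible workaround: split the text into fixed-size chunks, detect each chunk on its own, and report each language's share of the characters. This is an assumption about how such output could be produced, not something the library does. The class `ChunkedLanguageShare`, the method `shares`, and the `detectOne` parameter are hypothetical names; `detectOne` stands in for a per-chunk detection call (with this library, presumably a fresh detector from `DetectorFactory.create()`, then `append(chunk)` and `detect()`, with `LangDetectException` handled), which is not wired in here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of the language-detection library.
class ChunkedLanguageShare {

    /**
     * Splits text into chunks of at most chunkSize characters, asks detectOne
     * for the language code of each chunk, and returns each language's share
     * of the total character count (the shares sum to 1 for non-empty text).
     */
    static Map<String, Double> shares(String text, int chunkSize,
                                      Function<String, String> detectOne) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Characters attributed to each detected language, in first-seen order.
        Map<String, Integer> chars = new LinkedHashMap<>();
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            String chunk = text.substring(start, end);
            chars.merge(detectOne.apply(chunk), chunk.length(), Integer::sum);
        }
        // Convert character counts to fractions of the whole text.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : chars.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) text.length());
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy stand-in detector, for demonstration only: a real run would
        // call the actual language detector for each chunk instead.
        Function<String, String> toy = c -> c.contains("the") ? "en" : "de";

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) sb.append("the cat and the dog ");
        for (int i = 0; i < 50; i++) sb.append("der Hund und die Katze ");

        System.out.println(shares(sb.toString(), 200, toy));
    }
}
```

The chunk size is a trade-off: chunks that straddle a language boundary are counted entirely for whichever language wins, so smaller chunks give finer shares, while very short chunks are likely to be detected less reliably.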

Would changing Detector.ITERATION_LIMIT to a larger value help?

Original issue reported on code.google.com by MKlepacz...@gmail.com on 30 Apr 2013 at 11:47

weiqk / language-detection

How to improve detection rate? #55