Detection of long texts is not running parallelized

pemistahl / lingua

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Apache License 2.0


When detecting long texts (or when withLowAccuracyMode() is used), only a single worker thread is used for language detection.

The reason is that one work task is submitted per ngram length. However, for long texts and when using withLowAccuracyMode(), only ngram length 3 is checked, so only a single work task is submitted. One solution might be to run the per-language computation in computeLanguageProbabilities as a separate work task for each language; however, that approach will probably only be worth it if the input is long enough (I have not verified this).
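
To illustrate the idea, here is a rough, self-contained sketch of submitting one task per language instead of one task per ngram length. It is not lingua's actual code: the class PerLanguageScoring, the methods trigrams, scoreLanguage and scoreAll, and the toy Map-based models are all hypothetical stand-ins that I made up for this example. Only ExecutorService, Callable and Future are real JDK APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not lingua's implementation.
public class PerLanguageScoring {

    // Splits the text into overlapping character trigrams.
    static List<String> trigrams(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            result.add(text.substring(i, i + 3));
        }
        return result;
    }

    // Toy per-language score: sums the model values of all known trigrams.
    // Unknown trigrams are skipped (an assumption made for this sketch).
    static double scoreLanguage(Map<String, Double> model, List<String> grams) {
        double sum = 0.0;
        for (String gram : grams) {
            Double value = model.get(gram);
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }

    // Submits one task per language, so several worker threads can be busy
    // even though only a single ngram length is evaluated.
    static Map<String, Double> scoreAll(Map<String, Map<String, Double>> models,
                                        String text,
                                        ExecutorService executor) {
        List<String> grams = trigrams(text);
        Map<String, Future<Double>> futures = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Double>> entry : models.entrySet()) {
            Callable<Double> task = () -> scoreLanguage(entry.getValue(), grams);
            futures.put(entry.getKey(), executor.submit(task));
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Future<Double>> entry : futures.entrySet()) {
            try {
                scores.put(entry.getKey(), entry.getValue().get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        Map<String, Map<String, Double>> models = new LinkedHashMap<>();
        models.put("en", Map.of("the", -1.0, "he ", -2.0));
        models.put("de", Map.of("der", -1.5));
        System.out.println(scoreAll(models, "the ", executor));
        executor.shutdown();
    }
}
```

Each task here does very little work, so for short inputs the scheduling overhead would likely outweigh any gain. That matches the caveat above that this is probably only worthwhile for sufficiently long input, which would need to be measured.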

Detection of long texts is not running parallelized #146