Closed Marcono1234 closed 2 years ago
that approach will probably only be worth it if the input is long enough
This is true, actually. I did some tests in the past and chose the current single worker thread solution because the parallelized version was not faster in low accuracy mode. It just produced more overhead. I will most probably leave it this way. That's why I am closing this issue for now.
Detection of long texts (or usage of `withLowAccuracyMode()`) only uses a single worker thread for language detection. The reason for this is that one work task is submitted per ngram length. However, for long texts and when using `withLowAccuracyMode()`, only ngram length 3 is checked. Therefore only a single work task is submitted. One solution might be to run the per-language computation in `computeLanguageProbabilities` as a separate work task for each language; however, that approach will probably only be worth it if the input is long enough (I have not verified this).
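
To make the proposal concrete, here is a minimal sketch of the "one work task per language" idea. It does not use Lingua's internals: the class `PerLanguageTasks`, the helper `trigrams`, and the scoring function `scoreLanguage` are all hypothetical stand-ins, and the scoring itself is a dummy computation. It only illustrates how the per-language work could be fanned out over an executor when there is just one ngram length to process.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class PerLanguageTasks {

    // Splits the text into overlapping character trigrams (ngram length 3).
    static List<String> trigrams(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            result.add(text.substring(i, i + 3));
        }
        return result;
    }

    // Hypothetical per-language scoring step. This is a dummy computation,
    // NOT Lingua's real language model lookup.
    static double scoreLanguage(String language, List<String> trigrams) {
        double sum = 0.0;
        for (String trigram : trigrams) {
            sum += Math.log(1.0 + Math.floorMod((language + trigram).hashCode(), 1000));
        }
        return sum;
    }

    // Submits one task per language instead of one task per ngram length,
    // so several worker threads stay busy even when only trigrams are checked.
    static Map<String, Double> scoreAll(List<String> languages, String text,
                                        ExecutorService executor) {
        List<String> grams = trigrams(text);
        List<Callable<Double>> tasks = new ArrayList<>();
        for (String language : languages) {
            tasks.add(() -> scoreLanguage(language, grams));
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        try {
            // invokeAll returns the futures in the same order as the tasks.
            List<Future<Double>> futures = executor.invokeAll(tasks);
            for (int i = 0; i < futures.size(); i++) {
                scores.put(languages.get(i), futures.get(i).get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scoring task failed", e);
        }
        return scores;
    }
}
```

Whether this pays off depends on how expensive each per-language task is compared to the cost of submitting and joining it. For short inputs the scheduling overhead may well dominate, which matches the caveat above.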