Closed jordimas closed 1 year ago
File used as input on the first example: text-catalan.txt
As stated in the documentation, the detection of multiple languages is experimental, so please do not expect reliable results usable in production. The current algorithm tries to identify a language for each single word and then merge contiguous words having the same language, basically. I will certainly try to improve the algorithm in later releases.
Using version 1.3.1
Using a text that is in Catalan language only, that does not contain any fragments from other languages, and that it's very standard kind of text, _detect_multiple_languagesof method detects: CATALAN, SOMALI, LATIN, FRENCH, SPANISH and PORTUGUESE. The expectation is that should report that the full text is CATALAN.
Code to reproduce the problem:
Related to this problem also is that _detect_languageof and _detect_multiple_languagesof predict different languages over the same text. Below an example on the same input _detect_languageof predicts Catalan and _detect_multiple_languagesof predicts Tsonga.
My expectation is that both methods will predict the same given the same input.
Code sample: