landrok / language-detector

A fast and reliable PHP library for detecting languages
MIT License
122 stars, 19 forks

wrong language detection #5

Open FLasH3r opened 3 years ago

FLasH3r commented 3 years ago

I ran the following texts through this package and noted the language it detected for each. All of the texts are English; only the results in bold are correct.

Besides running composer install ... I haven't done anything.

The text here is just an example: the titles of the last 10 posts on the GitHub blog.

If I do new \LanguageDetector\LanguageDetector(null, ['en']); it works, but that is not the goal.

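The code looks like this:

$languageDetector = new \LanguageDetector\LanguageDetector();

// $titles holds the blog post titles to check
foreach ($titles as $title) {
    // evaluate() analyses the text; getLanguage() returns the detected language
    $language = $languageDetector->evaluate($title)->getLanguage();

    echo $title . ' - ' . (string) $language . PHP_EOL;
}
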
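
The restricted constructor form mentioned above does not have to be limited to ['en']. As a rough sketch only, it could be given a small set of candidate languages instead. The helper names buildDetector and detectTitles are hypothetical, the candidate list is an assumption, and the code assumes the package was installed with Composer and vendor/autoload.php has been required before buildDetector() is called. It only uses the calls already shown in this report.

```php
<?php
/**
 * Hypothetical helper: returns a callable that detects the language of a
 * string while considering only the given candidate language codes.
 */
function buildDetector(array $candidates): callable
{
    // Same constructor form as in the report: (null, [language codes]).
    $detector = new \LanguageDetector\LanguageDetector(null, $candidates);

    return function (string $text) use ($detector): string {
        // Cast the detected language to its string code, as in the loop above.
        return (string) $detector->evaluate($text)->getLanguage();
    };
}

/**
 * Hypothetical helper: maps each title to the language code returned by
 * the supplied detection callable.
 */
function detectTitles(array $titles, callable $detect): array
{
    $result = [];
    foreach ($titles as $title) {
        $result[$title] = $detect($title);
    }
    return $result;
}
```

Usage would be along the lines of detectTitles($titles, buildDetector(['en', 'fr', 'de', 'es'])). Whether a narrower candidate set actually improves results on short titles is untested here.
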
vesper8 commented 3 years ago

Looks like this suffers from the same problem as the more popular https://github.com/patrickschur/language-detection

It does a good job with long texts but is borderline useless for short sentences, getting them wrong at an alarmingly high rate.

Still looking for a reliable language detector that works well with short sentences. If anyone finds one, please share.

FabianoLothor commented 3 years ago

ward

dmaicher commented 3 years ago

Still looking for a reliable language detector that works well with short sentences. If anyone finds one, please share.

@vesper8 https://github.com/fntlnz/cld2-php-ext works well for my use cases, even with rather short texts. It detects all of the above cases as English.