Open jaumeortola opened 4 years ago
These pairs have been disabled (see https://github.com/languagetool-org/languagetool/commit/1be44fceb78a86a7976792a38ef844e860654f68 ). Can they be re-enabled? I don't know whether the change in the tokenizer has had any effect on the CONFUSION_RULE.
These pairs don't seem to contain special characters, so I don't think the tokenizer change will affect them. In other words, if they caused many false alarms before, I don't think that will be fixed.
"il" and "ils" (the most frequent false alarms) appeared in "qu'il" and "qu'ils", which have been affected by the change in the tokenizer. But not the other words. If CONFUSION_RULE uses the tokenizer (which I guess it does, at least for the analyzed word), then "il/ils" can be reevaluated. But perhaps we can deal with "il/ils" just with agreement rules.
I just ran the re-evaluation, but there's no change, probably because we use a tokenization here that matches that of the ngram data from Google Books.
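To make the tokenization point concrete, here is a minimal sketch of why "il" inside "qu'il" may or may not be seen as a standalone token. This is not LanguageTool's tokenizer or the CONFUSION_RULE code: both tokenizers, `candidate_tokens`, and the `CONFUSION_PAIRS` table are hypothetical helpers written only for illustration, and the thread does not say which of the two behaviours the ngram data uses.

```python
import re

# Hypothetical lookup table holding only the il/ils pair from this thread.
CONFUSION_PAIRS = {"il": "ils", "ils": "il"}

def tokenize_whole(text):
    # Toy tokenizer (assumption): keeps elided forms such as "qu'il" as one token.
    return re.findall(r"[\w']+", text.lower())

def tokenize_split_elision(text):
    # Toy tokenizer (assumption): splits after the apostrophe, "qu'il" -> "qu'", "il".
    return re.findall(r"\w+'|\w+", text.lower())

def candidate_tokens(tokens):
    # Tokens that a confusion-pair check would even consider.
    return [t for t in tokens if t in CONFUSION_PAIRS]

sentence = "Je pense qu'il viendra."
whole = candidate_tokens(tokenize_whole(sentence))
split = candidate_tokens(tokenize_split_elision(sentence))
print(whole, split)
```

With the first tokenizer the pair is never looked up for this sentence; with the second, "il" becomes a candidate. If the rule's own tokenization is fixed to match the ngram data, changing the general tokenizer would not alter which candidates it sees.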
There are many false alarms after the tokenizer update. See: https://internal1.languagetool.org/regression-tests//20200610/result_fr_20200610_table.html and https://internal1.languagetool.org/regression-tests/via-http/2020-06-11/fr/result_java_CONFUSION_RULE.html The most prominent pair is il/ils; others include sait/sais, vert/vers, and mai/mais.