languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] False warnings in rulegroups IT_IS and IT_IS_2 #3467

Open MikeUnwalla opened 4 years ago

MikeUnwalla commented 4 years ago

After I removed duplicate warnings, 49 warnings remained in ITS_JJ_NNSNN. Of those, 35 are false warnings:

TRUE ERRORS

udomai commented 4 years ago

Oh, wow! I think this needs to be fixed; it does not shed a good light on LT if we don't get that right. No need to panic though: its deactivation rate has been 2 % (apply rate 69 %, 468 times opened over the last 7 days), and Matomo shows that out of 6 deactivations this week, half were actually correct warnings. @MikeUnwalla, how could we go about this? It seems hard to condense those FP into only a few antipatterns... Can the rule maybe be more specific (at the price of risking more FN)? Like focusing on the pattern of ITS_JJ_NNSNN immediately followed by a VBZ?
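To put the numbers quoted so far side by side, here is a minimal sketch of the arithmetic. It uses only the counts given in this thread (49 warnings, 35 of them false, 6 deactivations of which half hit correct warnings); the variable names are made up for illustration.

```python
# Counts taken from the comments above; nothing here is measured independently.
total_warnings = 49          # warnings left after removing duplicates
false_warnings = 35          # warnings judged to be false
true_errors = total_warnings - false_warnings

false_share = false_warnings / total_warnings   # share of false warnings
precision = true_errors / total_warnings        # share of correct warnings

deactivations = 6                                # deactivations this week
on_correct_warnings = deactivations // 2         # "half were actually correct warnings"
on_false_warnings = deactivations - on_correct_warnings

print(f"true errors: {true_errors}")
print(f"false-warning share: {false_share:.1%}")
print(f"precision in this sample: {precision:.1%}")
print(f"deactivations on correct / false warnings: {on_correct_warnings} / {on_false_warnings}")
```

Note that the two sets of figures come from different sources (a hand-checked sample versus user statistics), so they are not directly comparable.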

MikeUnwalla commented 4 years ago

Rulegroup IT_IS has more than 50 rules. When I prevent ITS_JJ_NNSNN from finding any match, some of those rules find the true errors.

As you write, if we make the rule more restrictive by expanding the pattern, then LT will not find some errors. But, eventually, we will make more rules (when someone complains about a false negative).

You are right that we can't solve the problem with only a few AP. But, I think the best way to solve the problem is to add AP (even if there are many of them). Our choice is one of these:

udomai commented 4 years ago

I would go for the APs then. It's the more hands-on approach and I guess it's less prone to cause priority issues. When I look at the performance (application vs. deactivation) of the rules concerning its vs. it's to see which rules should be fixed first, I can collect an overview of deactivation contexts for writing APs.
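The trade-off discussed above (a narrower pattern versus antipatterns) can be illustrated with a toy matcher. This is only a sketch, not LanguageTool code: it assumes, purely for illustration, that the rule flags `it's` + adjective + noun, and the sentences, hand-assigned POS tags, and antipattern pairs are all hypothetical.

```python
# Toy illustration only. Sentences and POS tags are hand-written assumptions,
# and the "rule" is a guess at the general shape, not the real ITS_JJ_NNSNN.
SENTENCES = {
    "subject_error": [("it's", "X"), ("old", "JJ"), ("engine", "NN"),
                      ("needs", "VBZ"), ("oil", "NN")],      # should be "its"
    "object_error": [("I", "PRP"), ("like", "VBP"), ("it's", "X"),
                     ("new", "JJ"), ("design", "NN")],       # should be "its"
    "good_news": [("it's", "X"), ("good", "JJ"), ("news", "NN")],        # correct
    "common_sense": [("it's", "X"), ("common", "JJ"), ("sense", "NN")],  # correct
}

# Hypothetical antipatterns: (adjective, noun) pairs that must not be flagged.
ANTIPATTERNS = frozenset({("good", "news"), ("common", "sense")})


def is_flagged(tokens, require_vbz=False, antipatterns=frozenset()):
    """Return True if "it's" + JJ + NN/NNS occurs and is not filtered out."""
    for i in range(len(tokens) - 2):
        (w0, _), (w1, t1), (w2, t2) = tokens[i:i + 3]
        if w0.lower() != "it's" or t1 != "JJ" or t2 not in ("NN", "NNS"):
            continue
        if (w1.lower(), w2.lower()) in antipatterns:
            continue  # known false-positive context
        if require_vbz and (i + 3 >= len(tokens) or tokens[i + 3][1] != "VBZ"):
            continue  # narrower pattern: a VBZ must follow immediately
        return True
    return False


def flagged(**options):
    """Names of the toy sentences that a given rule variant flags."""
    return [name for name, toks in SENTENCES.items() if is_flagged(toks, **options)]


broad = flagged()                            # flags everything, incl. both FP
narrow = flagged(require_vbz=True)           # no FP, but misses "object_error"
with_ap = flagged(antipatterns=ANTIPATTERNS) # both true errors, no FP

print(broad, narrow, with_ap, sep="\n")
```

In this toy data the narrower pattern removes the false positives but also loses one true error (a false negative), while the antipattern variant keeps both true errors at the cost of listing each false-positive context explicitly.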

MikeUnwalla commented 4 years ago

FP in other rules in rulegroup IT_IS. I ignored FP in texts where the grammar was incorrect.

IT_IS[10]

IT_IS[18]

IT_IS[22]

IT_IS[25]

IT_IS[27]

IT_IS[28]

IT_IS[30]

IT_IS[37]

IT_IS[38]

IT_IS[42]

IT_IS[43]

IT_IS[44]

IT_IS[46]

IT_IS[7]

IT_IS[9]

TO_VB_ITS_NN[50]

udomai commented 4 years ago

Thank you, Mike! That is super helpful. I will give you feedback from the user statistics standpoint when you are back from your holidays :)

MikeUnwalla commented 4 years ago

False warnings in rulegroup IT_IS_2.

(Aside: why are there 2 rulegroups? Should we have only one?)

IT_IS_2[12] The warnings for 'its' are correct. Many of the warnings for 'it' are not correct, because the text uses the subjunctive (www.lexico.com/grammar/when-to-use-the-subjunctive). Split the rule, and change the message for 'it'.

IT_IS_2[15]

IT_IS_2[16]

IT_IS_2[17]

IT_IS_2[26]

IT_IS_2[6]

IT_IS_2[8]