languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] EN_REDUNDANCY_REPLACE false positives and comments #1997

Open MikeUnwalla opened 5 years ago

MikeUnwalla commented 5 years ago

Here are some comments about some of the results in https://internal1.languagetool.org/regression-tests/20191001/result_en_20191001.html.

False positives:

Comments:

I know that you have put much effort into the EN_REDUNDANCY_REPLACE rules. But one of my tasks as maintainer is to make sure that there are not too many false positives. Could you please take some time to improve the rules?

TiagoSantos81 commented 5 years ago

I know that you have put much effort into the EN_REDUNDANCY_REPLACE rules. But one of my tasks as maintainer is to make sure that there are not too many false positives. Could you please take some time to improve the rules?

Creating the rules that are a conversion from AtD takes about an hour. That is not really much, and anyone who works on grammar checkers knows it. The extra effort is the time it takes to appease requests for further improvement (while several other rules are not held to such strict criteria), knowing that these recommendations do not follow the usability principle. Tools like Ludwig.guru exist (and yes, redundancy is used for emphasis) because they allow writers to find out the common way to say something.

Grammar checkers are used to find out what the prescriptive current of a language advises. In extremis, without the prescriptive part, one is well served with pidgin. This is so self-evident that anyone not following it is certainly not doing what is intended to be done here, and certainly not what the users think they are doing.

Finally, I pose a question that is easily exemplified: if a grammar checker should be as careful as has been shown in these last few rules, why are these types of rules being further developed in LT-premium? An example is https://github.com/languagetool-org/languagetool/commit/718acbba73b030fbee0f9d1731e56f109f471c27#r35187895

tiff 10 days ago • Member

Here's a rule that Mike wrote for LanguageToolPlus:


        <rule id="CIRCLE_AROUND" name="circle around (circle)" default="off">
            <antipattern>
                <token chunk="E-NP-singular">circle</token>
            </antipattern>
            <antipattern>
                <token chunk="E-NP-plural">circles</token>
            </antipattern>
            <pattern>
                <token postag_regexp="yes" postag="VB.*" inflected="yes">circle</token>
                <token>around</token>
            </pattern>
            <message>This phrase is redundant. Consider writing <suggestion>\1</suggestion>.</message>
            <short>Redundant phrase</short>
            <example correction="circled">The aircraft <marker>circled around</marker> the airport.</example>
            <example>The aircraft <marker>circled</marker> the airport.</example>
            <example>Draw a circle around the dot.</example>
            <example>Draw circles around the dot.</example>
        </rule>

Summarizing: if something indeed needs some extra patterns, put them in a new rule, not in the old and well-proven ones.

PS - If I found it a good use of my time to find exceptions to every single rule in any entry of these lists, I would. It is not hard, and any imaginative user can do it too. But I guess most people try to spend their time on more appealing gimmicks.