languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

<marker><token> in antipatterns #4246

Open MikeUnwalla opened 3 years ago

MikeUnwalla commented 3 years ago

Refer to https://github.com/languagetooler-gmbh/languagetool-premium/commit/d9ccf6e2ef06cffd55d2208bb63a3cf60a06a83b

In rule CONFUSION_OF_THEY_NOT_DONT, antipattern 1 contained: <marker><token>not</token></marker>

As a consequence, antipattern 3 did not give the expected results.

Could we please add a check to testrules that finds a <marker> in an antipattern?

MikeUnwalla commented 3 years ago

I should have labelled this issue as a bug when I opened it. Sorry.

The rule below reproduces the problem. If you remove the marker from the token in the first AP, the second AP works correctly.

<rule id="MARKER_IN_AP" name="Marker in AP Test">
    <antipattern><!-- Marker in this AP affects a different AP -->
        <token>blah</token>
        <token>and</token>
        <marker><token>myself</token></marker>
    </antipattern>
    <antipattern><!-- This AP does not work because of the marker in the previous AP -->
        <token>and</token>
        <token>myself</token>
        <token>included</token>
        <token>really</token>
    </antipattern>
    <pattern>
        <token>and</token>
        <marker>
            <token>myself</token>
        </marker>
    </pattern>
    <message>Marker in AP test.</message>
    <example correction="">The teacher asked Ben and <marker>myself</marker>.</example>
    <example>The teacher asked Ben and <marker>me</marker>.</example>
    <example type="triggers_error">Our service management folk and myself included really dislike these messages.</example>
</rule>

udomai commented 3 years ago

Let's try this ourselves:

Line 238 in src/main/resources/org/languagetool/rules/rules.xsd:

Remove the <marker> element from the AP definition there and see how many errors this causes. Before that, find out how many APs actually have markers in them.

udomai commented 3 years ago

The main reason this isn't resolved yet is time. @MikeUnwalla, I was planning to write a simple regexp search for <marker> inside <antipattern> to see how often it occurs in Open Source and Premium. If you can spare a "minute" for that, you could speed up the whole process substantially. Otherwise, I will do it as soon as I have the time.
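
For anyone who wants to try the search described above, here is a minimal sketch in Python. It is not part of LanguageTool or testrules. The function names `find_marked_antipatterns` and `scan_tree` are made up for illustration, and the sketch assumes the rule files are UTF-8 XML files that can be searched as plain text. It reports each antipattern that contains a `<marker>` tag, ignoring mentions of `<marker>` inside XML comments within the antipattern.

```python
import re
from pathlib import Path

# One <antipattern ...>...</antipattern> block; non-greedy so that each
# match ends at the first closing tag (antipatterns do not nest).
ANTIPATTERN_RE = re.compile(r"<antipattern\b[^>]*>.*?</antipattern>", re.DOTALL)
# XML comments, removed before looking for markers so that a comment
# that merely mentions <marker> is not counted.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# An opening <marker> tag.
MARKER_RE = re.compile(r"<marker\b")


def find_marked_antipatterns(xml_text):
    """Return a list of (line_number, block) for antipatterns containing <marker>."""
    hits = []
    for match in ANTIPATTERN_RE.finditer(xml_text):
        block = match.group(0)
        if MARKER_RE.search(COMMENT_RE.sub("", block)):
            # 1-based line number of the opening <antipattern> tag.
            line = xml_text.count("\n", 0, match.start()) + 1
            hits.append((line, block))
    return hits


def scan_tree(root):
    """Print file:line for every hit in *.xml files under root; return the count."""
    total = 0
    for path in sorted(Path(root).rglob("*.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line, _block in find_marked_antipatterns(text):
            print(f"{path}:{line}")
            total += 1
    return total
```

Calling `scan_tree(".")` from the root of a checkout would list every hit and return the total. A regexp search like this is only an approximation: an antipattern that has been commented out as a whole is still matched, so the numbers are a rough estimate, not a validation result.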