languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] Disambiguation Error: UNKNOWN_PCT[1]: ,[,/,*,O] -> ,[,/,*,,/PCT*,O] #2474

Open ArronWilliams opened 4 years ago

ArronWilliams commented 4 years ago

The issue is broader than this one message: it has to do either with the PCT tag or with the paragraph end.

In this rule:

<rule id="CONFUSION_OF_BEEN_TO_ITALY" name="She has been (to) Italy.">
    <pattern>
        <token inflected="yes">has</token>
        <token regexp="yes">been|gone</token>
        <marker><token postag="NNP"></token></marker>
        <token chunk="E-NP-singular"></token>
    </pattern>
    <message>When you refer to a place with 'been' or 'gone' the word 'to' is required before the noun. Such as: <suggestion>to <match no="3" /></suggestion>?</message>
    <example correction="to Italy">Despite all the allegations, She has been <marker>Italy</marker></example>
</rule>

It seems like it should work, but it gives an error with these messages:

PDT_PDT[1]: all[all/PRP,all/DT,all/JJ,all/NN:U,all/PDT,B-NP-plural] -> all[all/PDT,B-NP-plural]
UNKNOWN_PCT[1]: ,[,/,*,O] -> ,[,/,*,,/PCT*,O]
add_paragaph_end: Italy[Italy/NNP,Italy/SENT_END,B-NP-singular|E-NP-singular] -> Italy[Italy/NNP,Italy/SENT_END,Italy/PARA_END,B-NP-singular|E-NP-singular]

MikeUnwalla commented 4 years ago

@ArronWilliams,

In the Tagger Result, I see these POS tags for 'Italy': Italy[Italy/NNP,</S>,B-NP-singular|E-NP-singular]

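Your rule has a token for 'Italy' and a separate token with E-NP-singular for the next word. Because 'Italy' is the last word of the sentence, there is no next token for that second token to match. Try putting the NNP POS tag and the chunk in the same token.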
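A minimal sketch of that change, showing only the marker part of the pattern (the preceding tokens are left out here):

```xml
<!-- one token carries both the NNP POS tag and the E-NP-singular chunk,
     so no extra token is needed after 'Italy' -->
<marker>
    <token postag="NNP" chunk="E-NP-singular"/>
</marker>
```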

ArronWilliams commented 4 years ago

@MikeUnwalla I have tried it and it still gives the same error and messages as above.

MikeUnwalla commented 4 years ago

For an inflected token, you must use the base form of the verb: <token inflected="yes">have</token>

Also, there is a bug that causes the suggestion 'to' to be shown as 'To'. In the rule below, I added a work-around, as documented at http://wiki.languagetool.org/tips-and-tricks#toc15


<rule id="CONFUSION_OF_BEEN_TO_ITALY" name="She has been (to) Italy.">
    <pattern>
        <token inflected="yes">have</token>
        <token regexp="yes">been|gone</token>
        <marker>
            <token chunk="E-NP-singular" postag="NNP"/>
        </marker>
    </pattern>
    <message>When you refer to a place with 'been' or 'gone' the word 'to' is required before the noun. Such as: <suggestion><match no="1" regexp_match="h.*" regexp_replace="to" case_conversion="alllower"/> <match no="3" /></suggestion>.</message>
    <example correction="to Italy">She has been <marker>Italy</marker>.</example>
</rule>

@danielnaber, the documentation for the work-around is not fully correct. With this code: <match no="2" regexp_match=".*" regexp_replace="to" case_conversion="alllower"/>

testrules fails with this error: Exception in thread "main" java.lang.AssertionError: English: Incorrect suggestions: [to Italy] != [toto Italy] for rule CONFUSION_OF_BEEN_TO_ITALY[1] on input: She has been Italy. expected:<[to Italy]> but was:<[toto Italy]>
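The doubled 'to' is consistent with the replacement being applied to every match of `.*`: after matching 'been', `.*` also matches the empty string at the end of the word, which is how Java's `String.replaceAll` behaves. Assuming LanguageTool applies `regexp_replace` that way, a pattern that cannot match an empty string, such as `.+`, should produce a single 'to'. This is a sketch that has not been checked with testrules:

```xml
<!-- assumption: ".+" never matches the empty string, so "been" is replaced exactly once -->
<suggestion><match no="2" regexp_match=".+" regexp_replace="to" case_conversion="alllower"/> <match no="3"/></suggestion>
```

The `h.*` pattern in the rule above avoids the same problem, because it must match at least the letter 'h'.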