languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

return the source token matches in the suggestion for bitext rules #221

Open JoergBeisiegel opened 9 years ago

JoergBeisiegel commented 9 years ago

Hello Marcin, is it possible for bitext rules to return the source token matches in the suggestion? I am checking translations that contain UI translations followed by the original GUI string in parentheses. Rules would be easier to write and more powerful if the matched source tokens could be used directly in the target. This applies especially to deviations in capitalization. Another advantage: corrections, messages, and examples could be written more dynamically.

Thanks a lot for your great work!

Best, Jörg

milekpl commented 9 years ago

Could you give just a simple example for me to play with? I know what you mean but cannot find a good example for a correction.

JoergBeisiegel commented 9 years ago

Hello Marcin, here is the example. I have tagged source strings. Each source string has one or more string tokens and ends with a number token.

Example: {1}EN string 1{2}

The translation should be: {1}De string 1 (EN string 1){2}

I can now run a query that matches all EN sentences with the given pattern whose German counterpart follows the German pattern but has a number mismatch. So far so good. But at the moment I cannot tell the user whether the English or the German number is wrong; I can only tell that there is a mismatch. To say which one is wrong, I would need the EN number match.

Here is my rule. Sorry if it is not very elegant; I am only just starting with LanguageTool:

    <rule lang="de" id="NON_MATCHING_ITEM_ID" name="ID stimmt nicht überein" type="numbers">
        <pattern case_sensitive='no'>
            <source lang="en">
                <token regexp='no'>{</token>
                <token regexp='yes'>\p{N}+</token>
                <token regexp='no'>}</token>
                <token skip="-1">
                    <exception regexp="yes" scope="next">\p{N}+</exception>
                </token>
                <token regexp='yes'>\p{N}+</token>
                <token regexp='no'>{</token>
                <token regexp='yes'>\p{N}+</token>
                <token regexp='no'>}</token>
            </source>
            <target>
                <marker>
                <token regexp='no'>{</token>
                <token regexp='yes'>\p{N}+</token>
                <token regexp='no'>}</token>
                <token skip="-1">
                    <exception regexp="yes" scope="next">\p{N}+</exception>
                </token>
                <token regexp='yes'>\p{N}+</token>
                <token regexp='no'>(</token>
                <token skip="-1">
                    <exception regexp="yes" scope="next">\p{N}+</exception>
                </token>
                <token regexp='yes'>(?!^<match no="4"/>$)\p{N}+</token>
                <token regexp='no'>)</token>
                <token regexp='no'>{</token>
                <token regexp='yes'>\p{N}+</token>
                <token regexp='no'>}</token>
                </marker>
            </target>
        </pattern>
        <message>ID stimmt nicht überein: '<match include_skipped="none" no="5"/>' und '<match include_skipped="none" no="8"/>' gefunden. <suggestion><match include_skipped="none" no="8"/></suggestion> ändern in <suggestion><match include_skipped="none" no="5"/></suggestion></message>
        <example type="incorrect">
            <srcExample>Select <marker>{4217}Item 2{4218}</marker> from the model tree.</srcExample>
            <trgExample correction="\5">Wählen Sie <marker>{4217}Element 3 (Item 2){4218}</marker> aus.</trgExample>
        </example>
        <example type="correct">
            <srcExample>Select <marker>{4217}Item 2{4218}</marker>.</srcExample>
            <trgExample correction="\5">Wählen Sie <marker>{4217}Element 2 (Item 2){4218}</marker>.</trgExample>
        </example>
    </rule>
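
To make the goal concrete, here is a minimal Python sketch of the check outside LanguageTool. It does not use LanguageTool's API. The regular expressions and the `check_pair` function are hypothetical names chosen for illustration, and they assume the tagged format described above. The point is that once the source number is available, the message can say which target number is wrong instead of only reporting a mismatch.

```python
import re
from typing import List, Optional

# Assumed source format: {id}text number{id}
SOURCE_RE = re.compile(r"\{(\d+)\}(.*?)(\d+)\{(\d+)\}")
# Assumed target format: {id}text number (source text number){id}
TARGET_RE = re.compile(r"\{(\d+)\}(.*?)(\d+) \((.*?)(\d+)\)\{(\d+)\}")


def check_pair(source: str, target: str) -> Optional[List[str]]:
    """Compare the numbers in a tagged target string against the source.

    Returns None if either string lacks the tagged pattern, otherwise a
    list of problems (empty when both target numbers match the source).
    """
    src = SOURCE_RE.search(source)
    trg = TARGET_RE.search(target)
    if src is None or trg is None:
        return None

    src_num = src.group(3)      # number at the end of the source string
    trg_num = trg.group(3)      # number at the end of the translated string
    quoted_num = trg.group(5)   # number inside the parenthesised source copy

    problems = []
    if trg_num != src_num:
        problems.append(f"translated number {trg_num} should be {src_num}")
    if quoted_num != src_num:
        problems.append(f"quoted source number {quoted_num} should be {src_num}")
    return problems
```

With the example sentences from the rule, `check_pair("Select {4217}Item 2{4218} from the model tree.", "Wählen Sie {4217}Element 3 (Item 2){4218} aus.")` reports that the translated number is wrong. The rule itself can only compare the two numbers inside the target.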

Don't hesitate to contact me if you need more information.

Best, Jörg