languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

Advanced synthesizer: another feature #4325

Open jaumeortola opened 3 years ago

jaumeortola commented 3 years ago

See a description of the AdvancedSynthesizerFilter here: https://dev.languagetool.org/using-asf

We need to create a new POS tag from parts of two different tokens. This is my proposal to do it.

Here a verb (in token 2) will be synthesized using person (1-3) and number (sing/pl) from the pronoun/subject (in token 1).

<filter class="org.languagetool.rules.ca.AdvancedSynthesizerFilter" args="lemmaFrom:3 lemmaSelect:V(...)..(..) postagFrom:2 postagSelect:PP(.).(.)... postagReplace:V\a1\b1\b2\a2"/>

\a1 refers to the first parenthesized group in lemmaSelect and \a2 to the second; \b1 and \b2 refer to the first and second parenthesized groups in postagSelect.
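To make the substitution concrete, here is a minimal Python sketch of how the \aN and \bN references could be resolved. It is not the LanguageTool implementation: the function name build_postag is hypothetical, and the sample tags VMIP1P00 (verb) and PP3CS000 (pronoun) are assumed purely for illustration.

```python
import re


def build_postag(lemma_tag, lemma_select, source_tag, postag_select, postag_replace):
    """Build a new POS tag from groups captured in two different tags.

    Hypothetical helper: \\aN refers to group N of lemma_select,
    \\bN refers to group N of postag_select.
    """
    a = re.fullmatch(lemma_select, lemma_tag)
    b = re.fullmatch(postag_select, source_tag)
    if a is None or b is None:
        # One of the tags does not fit its pattern, so nothing can be built.
        return None

    def substitute(ref):
        # ref.group(1) is "a" or "b"; ref.group(2) is the group number.
        source = a if ref.group(1) == "a" else b
        return source.group(int(ref.group(2)))

    return re.sub(r"\\([ab])(\d)", substitute, postag_replace)


# Assumed sample tags: a 1st person plural verb and a 3rd person singular pronoun.
result = build_postag(
    "VMIP1P00", r"V(...)..(..)",
    "PP3CS000", r"PP(.).(.)...",
    r"V\a1\b1\b2\a2",
)
print(result)  # VMIP3S00
```

With these assumed tags, the verb keeps its own mood and tense (\a1) and its trailing positions (\a2), while person and number (\b1, \b2) come from the pronoun.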

This seems general enough. The limitation is that we can take information only from two tokens.

If this approach is useful, these attributes could be added to the regular <match> element.

udomai commented 3 years ago

This looks very nice to me. We should absolutely write a little entry in dev.languagetool.org if we adopt it, with an example. But the principle is genius.

jaumeortola commented 3 years ago

It is already implemented.

Tell me some rules (in French?) where you wanted to apply this feature, and I will test them. @udomai

udomai commented 3 years ago

We could start with a simple DN_V rule for easy cases (starting with SENT_START):

L'homme trouvent le chat. → trouve