Is your feature request related to a problem? Please describe.
In the case of a contraction or agglutination, the control lists are not applied properly: the lemmas (and the POS tags, although the number of possible combinations is much lower there) are always marked as unauthorized.
Example:
The form aunquel is a contraction of the lemmas aunque and el. In our project it is tagged as aunque+el. Even if both lemmas are in the control list, an error is raised, because the combined value aunque+el is not in the list.
Describe the solution you'd like
As the delimiter for contractions is always the same, it should be possible for the engine to split the analysis on that delimiter. This would require the user to specify the delimiter somewhere (in the control list panel, I would say).
In the above example, aunque+el would be analyzed as two lemmas, aunque and el, each of which would be compared to the control list. An error would be raised only if at least one of the lemmas is not in the list. A warning would tell the user that the analysis is wrong, and could indicate which lemma or POS is missing from the control list.
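To illustrate the idea, here is a minimal sketch of the proposed check. It is not the engine's actual code: the function name check_annotation, its parameters, and the default "+" delimiter are all hypothetical, and the real delimiter would come from the user's settings.

```python
def check_annotation(value, allowed, delimiter="+"):
    """Return the parts of `value` that are missing from `allowed`.

    Hypothetical helper: `value` is a lemma or POS annotation,
    `allowed` is the control list, and `delimiter` is the
    user-configured contraction marker (assumed to be "+").
    """
    # With no delimiter configured, keep today's behavior:
    # the whole value is compared to the control list as one item.
    parts = value.split(delimiter) if delimiter else [value]
    # Every part is checked on its own against the control list.
    return [part for part in parts if part not in allowed]


allowed_lemmas = {"aunque", "el"}

# Both lemmas are in the list, so nothing is reported.
print(check_annotation("aunque+el", allowed_lemmas))  # []

# "lo" is not in the list, so the warning can name it.
print(check_annotation("aunque+lo", allowed_lemmas))  # ['lo']
```

An empty result means the annotation is valid. A non-empty result gives exactly the items that the warning could show to the user.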