sohaibafifi / languagetool

A fork of languagetool to maintain Arabic
https://languagetool.org
GNU Lesser General Public License v2.1

Suggestion match with regexp_match not working when using tashkeel #44

Open sohaibafifi opened 4 years ago

sohaibafifi commented 4 years ago

Example:

<rule id="word_use_0005_muta2akid" name="متأكد">
        <pattern>
          <token inflected="yes">متأكد</token>
        </pattern>
        <message>يفضل أن يقال:
        <suggestion><match no="1" regexp_match="متأكد" regexp_replace="متحقِّق"/></suggestion>
        <suggestion><match no="1" regexp_match="متأكد" regexp_replace="متيقِّن"/></suggestion>
متيقن أو متحقق بدلا من متأكد</message>
        <example correction="متحقِّق|متيقِّن" type="incorrect"> هل أنت <marker>متأكد</marker>؟</example>
        <!--  Wrong: هل أنتَ متأكِّد؟ -->
        <!--Correct: هل أنتَ متيقِّن؟ / هل أنتَ متحقق؟ -->
</rule>

With the sentence:

هل أنتَ مُتأكِّد أنّنا نسير في الاتّجاه الصّحيح؟

Output:

1.) Line 1, column 9, Rule ID: word_use_0005_muta2akid[4]
Message: يفضل أن يقال:
        'مُتأكِّد'
        'مُتأكِّد'
متيقن أو متحقق بدلا من متأكد
Suggestion: مُتأكِّد
Rule source: /org/languagetool/rules/ar/grammar.xml
هل أنتَ مُتأكِّد أنّنا نسير في الاتّجاه الصّحيح؟
        ^^^^^^^^                                

The problem: regexp_match does not match when the matched word contains tashkeel (diacritics), so both suggestions come back as the original word unchanged.
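The failure can be reproduced outside LanguageTool with plain regular expressions. The sketch below is only an illustration in Python, not LanguageTool's code: `strip_tashkeel` and `suggest` are hypothetical helpers, and the diacritic range U+064B to U+0652 is an assumption about which marks count as tashkeel. The strings are written as escapes so the code points are unambiguous.

```python
import re

# Assumption: tashkeel = fathatan (U+064B) through sukun (U+0652).
TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_tashkeel(word: str) -> str:
    """Hypothetical helper: remove Arabic diacritics from a word."""
    return TASHKEEL.sub("", word)


def suggest(token: str, pattern: str, replacement: str) -> str:
    """Imitate regexp_match/regexp_replace, applied to the stripped token."""
    return re.sub(pattern, replacement, strip_tashkeel(token))


token = "\u0645\u064F\u062A\u0623\u0643\u0651\u0650\u062F"        # مُتأكِّد (as in the sentence)
pattern = "\u0645\u062A\u0623\u0643\u062F"                        # متأكد (regexp_match)
replacement = "\u0645\u062A\u062D\u0642\u0651\u0650\u0642"        # متحقِّق (regexp_replace)

# The diacritics sit between the letters, so the bare pattern finds nothing
# and the token is returned unchanged, as in the output above.
unchanged = re.sub(pattern, replacement, token)

# After stripping, the pattern matches and the replacement is produced.
fixed = suggest(token, pattern, replacement)
```

Here `unchanged` equals the original token, which mirrors the "Suggestion" line in the reported output, while `fixed` equals the intended replacement.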

linuxscout commented 3 years ago

I suggest reprogramming the "case_conversion" attribute to handle stripping or ignoring tashkeel.
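For the "ignoring" option, one possible approach is to make the pattern itself tolerant of diacritics rather than stripping them from the token. The following is a rough Python sketch under the assumption that tashkeel means U+064B to U+0652; `tashkeel_tolerant` is a hypothetical name and nothing here is taken from LanguageTool's code.

```python
import re

# Assumption: any run of marks in U+064B..U+0652 may follow each letter.
OPTIONAL_TASHKEEL = "[\u064B-\u0652]*"


def tashkeel_tolerant(word: str) -> str:
    """Hypothetical helper: build a regex that matches `word` with or
    without diacritics after each of its letters."""
    return "".join(re.escape(ch) + OPTIONAL_TASHKEEL for ch in word)


bare = "\u0645\u062A\u0623\u0643\u062F"                          # متأكد
vocalized = "\u0645\u064F\u062A\u0623\u0643\u0651\u0650\u062F"   # مُتأكِّد
replacement = "\u0645\u062A\u064A\u0642\u0651\u0650\u0646"       # متيقِّن

tolerant = tashkeel_tolerant(bare)

# The tolerant pattern matches both spellings of the word.
matches_bare = re.fullmatch(tolerant, bare) is not None
matches_vocalized = re.fullmatch(tolerant, vocalized) is not None

# Replacing through the tolerant pattern yields the suggestion directly.
result = re.sub(tolerant, replacement, vocalized)
```

The trade-off is that the pattern becomes longer and harder to read in a rule file, whereas stripping keeps rules short but discards the original vocalization of the token.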

https://dev.languagetool.org/tips-and-tricks#changing-the-case-of-matched-word

linuxscout commented 3 years ago

I found a way to do this by making some changes to the code. Can you update the repository from upstream so that I can open a PR for this change? Thanks.

linuxscout commented 3 years ago

The commit:

https://github.com/linuxscout/languagetool/commit/8d0f2ea46a83333c478d6b7be12c2c2cf3812949

sohaibafifi commented 3 years ago

@linuxscout My repo is now synced with upstream.

linuxscout commented 3 years ago

To be closed

sohaibafifi commented 3 years ago

Should I include the removeTashkeel method?

linuxscout commented 3 years ago

I tried to add it; take a look at the PR. I updated core files. Perhaps there is a way to confine the changes to the Arabic module.

linuxscout commented 2 years ago

To be closed