pelagios / recogito2

Semantic Annotation Without the Pointy Brackets
Apache License 2.0

"Re-Apply" option for Arabic #663

Closed obtic-sorbonne closed 4 years ago

obtic-sorbonne commented 4 years ago

When I use:

Re-Apply / Show advanced options / require full word match (surrounded by whitespace, comma, etc.)

It works well for French but not for Arabic (RTL) because it annotates any string match, even inside words. Thanks for your help.

rsimon commented 4 years ago

Thanks for reporting. Can you provide a suitable example I can use to reproduce the problem? Ideally, a .txt file and a word or phrase suggestion?

obtic-sorbonne commented 4 years ago

Please find attached a sample file (or here) from online news. If you annotate the word فيروس (virus), the system will prompt you with:

There are 3 more un-annotated occurrences of فيروس in the text. Do you want to re-apply this annotation?

Actually, the 3 other occurrences are attached to prefixes, so they are not standalone words: الفيروس الفيروس بالفيروس

Thanks for your help! P.S. You can find the same example here: corona news.txt
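
To make the reported behaviour concrete, here is a minimal Python sketch contrasting plain substring matching with a whole-word match. It is not Recogito's actual code: the function names `find_substring` and `find_whole_word` and the sample sentence are invented for illustration, and only the word forms come from the report above. It assumes Python 3, where `\w` in `str` patterns is Unicode-aware and therefore covers Arabic letters.

```python
import re
from typing import List, Tuple


def find_substring(text: str, term: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every occurrence of `term`, even inside words."""
    return [m.span() for m in re.finditer(re.escape(term), text)]


def find_whole_word(text: str, term: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of `term` only where no word character touches it.

    The lookarounds reject a match when the character immediately before or
    after it is a letter, digit, or underscore in any script.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
    return [m.span() for m in pattern.finditer(text)]


# Hypothetical sample sentence (not taken from the attached file): one
# standalone occurrence followed by two prefixed forms.
sample = "فيروس جديد. انتشر الفيروس بسرعة، وأصيب كثيرون بالفيروس."
term = "فيروس"

substring_hits = find_substring(sample, term)
whole_word_hits = find_whole_word(sample, term)

print(len(substring_hits), "substring matches")
print(len(whole_word_hits), "whole-word matches")
```

With this sample, the substring search reports three hits, while the whole-word search keeps only the standalone فيروس. The prefixed forms are skipped because the character just before the match is a letter.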

rsimon commented 4 years ago

Hi,

just a quick heads-up that I finally got a chance to look into this (and an apology for this taking so long...). I think I can publish a fix later today. Will let you know.

Cheers, R

rsimon commented 4 years ago

The fix is now applied to production. Do let me know if the problem persists for you. Otherwise: happy annotating ;-)

obtic-sorbonne commented 4 years ago

Thank you very much, it works well!