PDF reader: RTL text copying/selection/highlighting is handled as LTR text and gets reversed

zotero / zotero

Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.

https://www.zotero.org

PDF reader: RTL text copying/selection/highlighting is handled as LTR text and gets reversed #2509

Open mrtcode opened 2 years ago

mrtcode commented 2 years ago

https://forums.zotero.org/discussion/96022/arabic-script-and-searchable-pdfs-in-beta-6-0-5
https://forums.zotero.org/discussion/98567/rtl-language-highlights-not-working-correctly

mrtcode commented 2 years ago

This doesn't work well in PDF.js either (selection is unpredictable, although the text is not reversed), so fixing it might be challenging.

helsingi commented 9 months ago

Hi, I work with a lot of Arabic text and regularly notice that the PDF reader in Zotero reads the OCRed Arabic text in reverse, i.e., as LTR instead of RTL. Is there anything I could do, or are you working on a fix? Thanks.

mrtcode commented 9 months ago

@helsingi Could you upload example PDF files here? Single page per PDF file is enough.

helsingi commented 9 months ago

> @helsingi Could you upload example PDF files here? Single page per PDF file is enough.

abbas 2016 (copy).pdf ghunaym 2013 (copy).pdf hussein 1973 (copy).pdf sharaf 1992 (copy).pdf

P.S. I am using Linux Mint 21.2 and Zotero 6.0.30. For OCR I used OCRmyPDF. The RTL recognition works well in some apps, for example when I preview the PDF in Document Viewer or Chromium, but it does not work well in Zotero and Okular.

mrtcode commented 9 months ago

@helsingi, thank you for the examples. However, it seems they work better on Zotero 7 beta, don't they?

helsingi commented 9 months ago

> @helsingi, thank you for the examples. However, it seems they work better on Zotero 7 beta, don't they?

Not really. I just installed Zotero 7 on Windows and tried searching (Ctrl+F) within a PDF attachment; unfortunately, the same issue persists. In the screenshot (Capture1) you can see that the word I searched for is the reversed (LTR) form of the actual word. When I search for the correct word, no results show.
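The search symptom described above can be reproduced in a few lines. This is only a minimal sketch, assuming the PDF text layer stores the Arabic characters in visual (left-to-right) order instead of logical order; it is not Zotero or PDF.js code, and the names `page_text` and `find` are hypothetical.

```python
# Hypothetical illustration of the symptom; not Zotero or PDF.js code.

# The Arabic word "kitab" in logical order (the order in which it is typed).
# Escapes are used so the source is unaffected by bidi rendering in editors.
logical = "\u0643\u062a\u0627\u0628"

# Assumption: the text layer holds the characters in visual, left-to-right
# order, i.e. the reverse of the logical order for a purely RTL word.
visual = logical[::-1]

# Simulated extracted page text containing the mis-ordered word.
page_text = f"abc {visual} def"


def find(query: str, text: str) -> bool:
    """Plain substring search, standing in for Ctrl+F."""
    return query in text


print(find(logical, page_text))  # searching for the correctly typed word
print(find(visual, page_text))   # searching for the reversed word
```

Under that assumption, only the reversed query matches, which is consistent with the behavior in the screenshot.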

mrtcode commented 9 months ago

@helsingi Ok, at least with those PDF files it actually doesn't work well. Reopening the issue.