mkhashoggi opened this issue 3 years ago
This is important for Hebrew as well.
Hi @mkhashoggi, thanks for the suggestion and the PR. I need some time to familiarize myself with the logic and rules of RTL languages before I can review the PR.
I did some reading about bidirectional text to understand the changes we need to make. Please correct me if I'm wrong: I have no experience with right-to-left text, so my thinking may be biased toward left-to-right.
To summarize:
So what we need is an interpreter for text lines (i.e. LTTextLine) that detects which parts of the text are left-to-right and which are right-to-left, and adds the necessary Unicode characters where needed. This is precisely what PR #516 is about.
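The detection half of that interpreter can be sketched with Python's standard `unicodedata.bidirectional`, which returns a character's Unicode bidi class. This is a minimal sketch, not the code from PR #516: the function names `char_direction` and `split_direction_runs` are hypothetical, and it deliberately treats digits, spaces and punctuation as neutral, attaching them to the preceding strong run rather than resolving them with the full Unicode rules.

```python
import unicodedata
from typing import List, Tuple

# Bidi classes of strong right-to-left letters (Hebrew is "R", Arabic is "AL").
RTL_CLASSES = {"R", "AL"}


def char_direction(ch: str) -> str:
    """Return "rtl", "ltr" or "neutral" for a single character."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in RTL_CLASSES:
        return "rtl"
    if bidi_class == "L":
        return "ltr"
    # Digits, spaces and punctuation have no strong direction of their own.
    return "neutral"


def split_direction_runs(text: str) -> List[Tuple[str, str]]:
    """Split a line of text into (direction, substring) runs.

    Simplification (hypothetical helper): a neutral character is attached to
    the run of the preceding strong character; leading neutrals join the first run.
    """
    runs: List[Tuple[str, str]] = []
    pending = ""  # neutrals seen before the first strong character
    for ch in text:
        direction = char_direction(ch)
        if direction == "neutral":
            if runs:
                last_dir, last_text = runs[-1]
                runs[-1] = (last_dir, last_text + ch)
            else:
                pending += ch
        elif runs and runs[-1][0] == direction:
            last_dir, last_text = runs[-1]
            runs[-1] = (last_dir, last_text + ch)
        else:
            runs.append((direction, pending + ch))
            pending = ""
    if pending:
        # Only reached when the text contains no strong character at all.
        runs.append(("neutral", pending))
    return runs
```

For example, `split_direction_runs("abc שלום")` returns `[("ltr", "abc "), ("rtl", "שלום")]`. Because digits are weak, `"שלום 123"` stays a single RTL run in this sketch; a complete treatment would follow the weak-type and neutral-resolution rules of Unicode's bidirectional algorithm.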
I understand that PR #516 is in progress for this request. However, I'm unsure whether the feature is currently usable. Is there an example that shows how to use it?
Hi,
If this is implemented, what is the holdup? Can anyone give an update?
Is anyone willing to work on #516? It implements this feature, but it has been inactive for a while and still needs extra work.
Hey, when will it be ready? Right now, when I try to read from a PDF, the Hebrew characters are missing.
No one is currently working on this.
I recently needed this and hacked around until I got something that worked for me. Maybe it's useful for someone: https://pypi.org/project/pdfminer.rtl
Feature request
A description of the feature you would like to have
Currently PDFMiner does not support RTL languages such as Arabic. This is the output of the current version of PDFMiner:
PDF File:
Output:
If relevant, the context that you are in. What are you trying to achieve?
What is needed is to apply the Unicode bidirectional rules so that characters in bidirectional text are reordered correctly: http://unicode.org/reports/tr9/#Explicit_Levels_and_Directions
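To make the reordering idea concrete, here is a heavily simplified sketch of rule L2 from the bidirectional algorithm (reverse runs from the highest embedding level down). It assumes the paragraph direction is right-to-left and that each extracted line arrives in visual left-to-right order, as is typical when glyphs are sorted by their x coordinate. The level assignment is only a rough approximation (reduced forms of rules W4 and N1), and explicit embeddings, mirroring (L4) and Arabic shaping are ignored. `char_levels` and `reorder_line` are hypothetical names, not part of PDFMiner or PR #516; real use would call for a complete, tested implementation of the algorithm.

```python
import unicodedata
from typing import List


def _kind(ch: str) -> str:
    """Collapse a character's bidi class into R, L, D (digit) or N (other)."""
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):
        return "R"
    if cls == "L":
        return "L"
    if cls in ("EN", "AN"):
        return "D"
    return "N"


def char_levels(text: str) -> List[int]:
    """Simplified embedding levels for a line whose base direction is RTL (level 1)."""
    kinds = [_kind(ch) for ch in text]
    # Reduced rule W4: a single separator between two digits joins the number ("3.14").
    for i in range(1, len(text) - 1):
        if (kinds[i] == "N"
                and unicodedata.bidirectional(text[i]) in ("CS", "ES")
                and kinds[i - 1] == "D" and kinds[i + 1] == "D"):
            kinds[i] = "D"
    levels: List[int] = []
    for i, kind in enumerate(kinds):
        if kind == "R":
            levels.append(1)
        elif kind in ("L", "D"):
            levels.append(2)  # LTR letters and numbers sit one level above RTL text
        else:
            # Reduced rule N1: look at the nearest non-neutral character on each
            # side; digits and the line edges count as RTL.
            left = next((kinds[j] for j in range(i - 1, -1, -1) if kinds[j] != "N"), "R")
            right = next((kinds[j] for j in range(i + 1, len(kinds)) if kinds[j] != "N"), "R")
            levels.append(2 if left == "L" and right == "L" else 1)
    return levels


def reorder_line(text: str) -> str:
    """Rule L2: reverse each level-2 run, then reverse the whole line (level >= 1)."""
    levels = char_levels(text)
    chars = list(text)
    i = 0
    while i < len(chars):
        if levels[i] >= 2:
            j = i
            while j < len(chars) and levels[j] >= 2:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = j
        else:
            i += 1
    chars.reverse()
    return "".join(chars)
```

Because these simplified levels depend only on symmetric neighbor checks, the same routine maps a visual-order line back to logical order in simple cases: `reorder_line("123 " + "שלום"[::-1])` returns `"שלום 123"`, and applying `reorder_line` twice gives back the original string. The neutral lookup is quadratic in the worst case, which is acceptable for single text lines.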
If possible, an example of what you want to achieve. Include the PDF that you are working on. Include the output that you would like to have.
The desired output should look like this: