smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Hebrew text #40

Open aaacckiy opened 10 years ago

aaacckiy commented 10 years ago

Hi. I've found a problem with parsing Hebrew text. The characters are recognized, but the meaning is lost: both the words and the letters within each word come out in reverse order. Please check it!

kobyssh commented 9 years ago

Hi, I've been at this for hours but can't get a Hebrew PDF file parsed with Hebrew characters. I've tried setting the document encoding to UTF-8 / Windows-1255, and tried converting the output with iconv(..) etc., but still no Hebrew. Can you share a working example? Thanks

k00ni commented 3 years ago

Hey, if you are still using this library, it would be great if you could try again with our latest release and get back to us. Thank you.

Generation4 commented 1 year ago

Hi, I just tried the new version with Hebrew and the letters are still in reverse order. It looks like the problem is related to the fact that Hebrew is an RTL language and the parser probably goes from left to right. If so, the solution could be to flip the direction of the parser. Please check whether this could work 🙏
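
To make the "flip the direction" idea concrete, here is a minimal post-processing sketch in Python. It is not part of PdfParser and not how the library works internally. It assumes the extracted text arrives one line at a time in visual (left-to-right) order, with Hebrew letters and words reversed but numbers and Latin words still readable. The function name `visual_to_logical` is hypothetical. It reverses each line that contains RTL characters and then restores the embedded ASCII runs that the reversal flipped.

```python
import re
import unicodedata

# ASCII words/numbers, optionally joined by common inner punctuation (e.g. "2.35").
LTR_RUN = re.compile(r"[0-9A-Za-z]+(?:[.,:/@_-][0-9A-Za-z]+)*")


def is_rtl(ch: str) -> bool:
    """True for characters with a strong right-to-left bidi class (Hebrew, Arabic)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def visual_to_logical(line: str) -> str:
    """Hypothetical helper: convert one visually ordered line to logical order.

    Assumption: the line was extracted left to right as displayed, so RTL
    letters and word order are reversed while ASCII runs are intact.
    """
    if not any(is_rtl(ch) for ch in line):
        return line  # pure LTR line, leave untouched
    flipped = line[::-1]  # fixes both letter order and word order
    # The full reversal also flipped ASCII runs; reverse those back.
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], flipped)


if __name__ == "__main__":
    logical = "שלום עולם"
    extracted = logical[::-1]  # simulate the reversed output described above
    print(visual_to_logical(extracted) == logical)
```

This is only a rough workaround. It does not implement the Unicode Bidirectional Algorithm, it treats every line that contains any RTL character as an RTL line, and it will misplace combining marks such as niqqud and mirrored punctuation such as parentheses. A proper fix would have to handle text direction during extraction.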