smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Does your package support Arabic and Persian? #651

Open mdoulabi1 opened 8 months ago

mdoulabi1 commented 8 months ago

Does your package support the Arabic and Persian languages? I tested both, but the extracted sentences come out incorrect. Please help, thanks.

k00ni commented 8 months ago

You have to be more specific if you want help.

Which version did you try: the latest master branch or a specific release?

"I tested both, but the extracted sentences come out incorrect"

Please provide example code (with actual Arabic/Persian strings) or a PDF that leads to the incorrect output. Also be more specific about what "incorrect" means: what output do you expect, and what do you actually get?

mdoulabi1 commented 8 months ago

When I read Persian/Arabic text, the letters of each word come out in reverse order. For example:
correct text: سلام من یک برنامه نویس هستم
incorrect text: مالس نم همانرب سیون متسه
In English terms, imagine that you want to read "hello" but you get "olleh".

k00ni commented 8 months ago

I believe there are already issues about this topic, for example https://github.com/smalot/pdfparser/issues/316. Summary: PDFParser currently cannot properly parse languages that are read from right to left. @GreyWyvern gave a good overview here: https://github.com/smalot/pdfparser/issues/316#issuecomment-1686461583

So my answer to your question is: no, it doesn't support these languages. That said, would it be practical for you to simply reverse the output again to get the symbols back into the correct order?
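
To make the reversal idea concrete, here is a minimal, language-agnostic sketch written in Python rather than PHP. It is not part of PdfParser. It assumes the extracted text looks like the example in this thread: letters reversed inside each word, while the words themselves stay in their original order. `reverse_each_word` is a hypothetical helper name used only for illustration.

```python
import re


def reverse_each_word(text: str) -> str:
    """Reverse the characters of every whitespace-separated word.

    Word order and the whitespace between words are left untouched.
    Assumption: the input contains only right-to-left words whose
    letters were emitted in reverse order.
    """
    # A capturing group makes re.split keep the whitespace runs,
    # so the original spacing can be restored when joining.
    parts = re.split(r"(\s+)", text)
    return "".join(p if p.isspace() else p[::-1] for p in parts)


# The garbled string reported in this thread.
garbled = "مالس نم همانرب سیون متسه"
print(reverse_each_word(garbled))

# ASCII analogue of the same problem.
print(reverse_each_word("olleh dlrow"))  # prints "hello world"
```

Reversing the whole string instead would fix the letters but flip the word order. Even the per-word version is only a rough workaround: it would also reverse digits and Latin words that were extracted correctly, and it can misplace combining marks such as Arabic diacritics, which may explain why plain reversal is hard to make reliable.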

mdoulabi1 commented 8 months ago

I tried reversing the words, but it does not work and brings a lot of challenges.