pdfminer / pdfminer.six

Community maintained fork of pdfminer - we fathom PDF
https://pdfminersix.readthedocs.io
MIT License

pdfminer can't extract text from some pdffiles but pypdf can? #841

Open ramtalentrecruit opened 1 year ago

ramtalentrecruit commented 1 year ago

Feature request

Thanks for your suggestion on improving pdfminer.six. To help us discuss and implement this request, please make sure to include the following information:

vilabho commented 1 year ago

Could you provide these PDF files here? Also, did those PDFs contain only images and no text? If so, how did you conclude that the text was extracted without OCR being used?
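In case it helps narrow this down, here is a minimal sketch for checking whether a file has an extractable text layer at all, by counting the non-whitespace characters each library returns. It assumes both pdfminer.six and pypdf are installed. The helper names (`count_chars`, `compare`) and the path `sample.pdf` are placeholders for illustration, not part of either library.

```python
def count_chars(text):
    """Count non-whitespace characters in extracted text (None counts as 0)."""
    return sum(1 for ch in (text or "") if not ch.isspace())


def extract_with_pdfminer(path):
    # High-level pdfminer.six API; imported lazily so count_chars works without it.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def extract_with_pypdf(path):
    from pypdf import PdfReader
    reader = PdfReader(path)
    # Guard with `or ""` in case a page yields no text (e.g. image-only pages).
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def compare(path):
    """Return the number of non-whitespace characters each library extracted."""
    return {
        "pdfminer.six": count_chars(extract_with_pdfminer(path)),
        "pypdf": count_chars(extract_with_pypdf(path)),
    }
```

Calling `compare("sample.pdf")` returns one count per library. If both counts are zero, the file most likely has no text layer and would need OCR. If only the pdfminer.six count is zero, that would be consistent with what this issue reports.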

mrm202 commented 1 year ago

Thanks for your response. As I said, pypdf extracted text from those files. The files contain both images and text, and the task is to extract the text, not the images. I can't provide the files here, but I will be very happy to share them by email. You can send an email here

vilabho commented 1 year ago

I have sent an email. Kindly share your files there.

mrm202 commented 1 year ago

I didn't get your email. Could you please send it again to this address? mularamiit@gmail.com

vilabho commented 1 year ago

I have sent the reply again to the email address mentioned above. Please check your Spam/Junk folder as well.