saucepleez / taskt

taskt (pronounced 'tasked' and formerly sharpRPA) is a free and open-source robotic process automation (RPA) tool built in C# and powered by the .NET Framework
http://www.taskt.net/
1.11k stars 355 forks

error reading scanned pdf #125

Closed: mervesever closed this issue 5 years ago

mervesever commented 5 years ago

Hello! How can I read a scanned PDF with PDF Extraction?

saucepleez commented 5 years ago

@merveAltili You cannot read a scanned PDF with the PDF parser, because it looks for electronic (embedded) text data and a scanned PDF typically contains only page images. You can try running OCR on the page images; otherwise, you will probably need to look at using cloud services.

We can add a specific command for "Document Analysis" if required. However, you will need to select a cloud provider and will potentially incur a charge, e.g. https://aws.amazon.com/textract/
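For anyone who wants to try the cloud route outside of taskt in the meantime, here is a minimal sketch in Python, assuming the AWS SDK (boto3) is installed and credentials are configured. It is not a taskt command, and the helper names `ocr_scanned_page` and `extract_lines` are hypothetical. It sends one scanned page image to Textract's `detect_document_text` operation and keeps the text of the `LINE` blocks in the response. Each call may be billed by AWS.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def ocr_scanned_page(image_bytes: bytes, client: Any = None) -> List[str]:
    """OCR one scanned page (e.g. PNG or JPEG bytes) with Amazon Textract.

    `client` is optional so a stub can be passed in for testing; by default
    a real Textract client is created (assumes boto3 and AWS credentials).
    """
    if client is None:
        import boto3  # third-party; imported lazily so extract_lines works without it

        client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

To use it, read a scanned page image as bytes (for example from a PNG exported from the PDF) and pass it to `ocr_scanned_page`; the result is a list of recognized text lines. Passing the client in as a parameter keeps the response parsing testable without network access.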

mervesever commented 5 years ago

Thank you so much. So, if you have a sample .xml about the database command, would you share it?

mervesever commented 5 years ago

I've done it without needing an example :) Thank you very much. I wish you good work.