tebelorg / RPA-Python

Python package for doing RPA
Apache License 2.0

OCR of scanned invoice PDF format or invoice JPEG/PNG format is possible? - yes #228

Closed emshihab008 closed 3 years ago

emshihab008 commented 3 years ago

Hi. I need to process scanned invoices in PDF or JPEG/PNG format. Can I use any RPA Python tool to extract the necessary information?

Regards, Ekram

fi-do commented 3 years ago

Hi emshihab008,

I would suggest converting your PDFs to PNGs and then extracting the content via r.read().

Hope it will help you, fi-do

emshihab008 commented 3 years ago

Hi @fi-do, I ran the code as you suggested, but I didn't get a proper result. Is this possible with RPA Python, or do I have to use a Python OCR module (opencv) for accurate results? And is it possible to snap a specific region of a PNG file for OCR?

import rpa as r
r.init(True, True)
r.wait(2)

# online pdf to png convert
r.url("https://pdf2png.com/de/")
r.click("choose.png")
r.dclick("b.png")
r.wait(10)
r.click("all_files.png")
r.rclick("zip.png")
r.click('extract.png')
# backslashes doubled so Python does not treat \U as an escape sequence
r.run('C:\\Users\\shihab\\Downloads\\pdf2png\\books')
print(r.read(100,87,889,702))
r.close()

Output:

C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py -y Paste T ——.-. F“. Move Copy Del EL ‘lwht J» 619.609 uon. E] Pam mom“ t°' t°' ' E' “‘1‘ - j w 15 ist, wcnn das Geld nur fiir cincn Compute fi.’ Clipboard Organize " l “'1‘ ‘ , f' fu dzw ‘ Sch" . r t . ”W Russo \on W n anz1g ulem rucht? man

books-1 books.pdf

kensoh commented 3 years ago

Hi Ekram,

I believe RPA for Python's OCR (Tesseract) gives results similar to what you would get from Python OCR modules, since Tesseract is the industry standard. However, your image is not properly scanned: it is deformed because the page was not scanned flat. Even commercial paid RPA tools are unlikely to get good accuracy on it.

Even with a perfect scan, for the best results I recommend using one of the 3rd-party apps in the post below - https://www.linkedin.com/posts/kensoh_rpa-tagui-activity-6780711825156775938-Wyt4

These companies have spent all their resources on getting the conversion of scanned PDFs into digital form right. Their apps take care of pre-processing such as aligning orientation, color correction, etc. So you will likely get the best results from them (almost 200 languages supported).

To snap a specific region, you can take a snapshot of your PDF viewer and use an image editor to remove all dynamic content, making it transparent. Then save it as .png. When you do r.read('frame.png'), OCR will be performed on the matching PDF viewer area on the screen, and you can process the result further. For example, for the macOS PDF viewer, you can do something like below (try downloading the image and you will see that many parts are made transparent) -

frame
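As an alternative to the transparent frame image, a region can also be read by coordinates, as in the r.read(100,87,889,702) call earlier in this thread. Below is a minimal sketch of that approach. It assumes the four numbers are the top-left and bottom-right corners of the screen region; the helper name read_region and its validation are illustrative only and not part of RPA for Python.

```python
def read_region(r, left, top, right, bottom):
    """OCR a rectangular screen region using an initialised rpa module `r`.

    Hypothetical helper: assumes r.read(x1, y1, x2, y2) takes the top-left
    and bottom-right corners of the region, as used earlier in this thread.
    """
    if right <= left or bottom <= top:
        raise ValueError("region must have positive width and height")
    return r.read(left, top, right, bottom)


def main():
    # Not called automatically: needs RPA for Python and a desktop session.
    import rpa as r
    r.init(True, False)  # same arguments as the later scripts in this thread
    print(read_region(r, 100, 87, 889, 702))
    r.close()
```

Passing the module in as `r` keeps the helper easy to try out with a stand-in object before running it against a real screen.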

emshihab008 commented 3 years ago

Hello Kensoh, Thank you for the information. I know this document is quite messy. That's why I tried OCR on it: if it works with that document, then every document can be processed. And thank you for your solution regarding snapping. I have another question. Can I read a document using r.read('apple.png'), or do I have to provide coordinates?

import rpa as r
r.init(True, False)
k = r.read('apple.png')
print(k)
r.close()

Output:

[RPA][ERROR] - TagUI process ended unexpectedly
[RPA][ERROR] - use init() before using send()

kensoh commented 3 years ago

I see. Yes, your thinking makes sense. As for your error, I have not come across it before, nor has it been raised by other users.

Can you attach the SikuliX log file here? It is in the TagUI folder at the installation location below, under src\tagui.sikuli*.log

For Windows, location of installation is %APPDATA%\tagui (%APPDATA% is usually C:\Users\Username\Appdata\Roaming)
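If it helps, here is a small sketch for locating log files under that folder from Python. The function name find_tagui_logs is made up for illustration, and because the exact log file name may differ between versions, it simply searches the whole folder for *.log files rather than assuming a specific path.

```python
import os
from pathlib import Path


def find_tagui_logs(tagui_home=None):
    """Return all .log files under the TagUI folder, newest first."""
    if tagui_home is None:
        # Windows install location quoted above: %APPDATA%\tagui
        tagui_home = Path(os.environ.get("APPDATA", "")) / "tagui"
    root = Path(tagui_home)
    if not root.is_dir():
        return []
    logs = [p for p in root.rglob("*.log") if p.is_file()]
    # Most recently modified first, so the latest run is at the top.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling find_tagui_logs() with no argument uses the %APPDATA% location; on macOS or Linux you would pass the folder explicitly.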

kensoh commented 3 years ago

It is strange because your previous example using coordinates can capture but using image fails.

Other than the logs, can you also attach the apple.png file here? I'm assuming the apple.png is in the same folder as where you run the python script.

emshihab008 commented 3 years ago

Hi Kensoh, I ran the code today again and got the below output.

[RPA][ERROR] - cannot find apple.png
Process finished with exit code 0

Please find attached the log file and the original apple.png file.

tagui.log

apple

kensoh commented 3 years ago

Hi Ekram,

When I run read() on an image that can't be found on the screen, I get similar errors too in the log like below -

[tagui] START  - listening for inputs

[tagui] INPUT  - [1] present /Users/kensoh/Desktop/apple.png
[tagui] ACTION - present /Users/kensoh/Desktop/apple.png
[tagui] OUTPUT - [1] ERROR

[tagui] INPUT  - [2] present /Users/kensoh/Desktop/apple.png
[tagui] ACTION - present /Users/kensoh/Desktop/apple.png
[tagui] OUTPUT - [2] ERROR

[tagui] INPUT  - [3] present /Users/kensoh/Desktop/apple.png
[tagui] ACTION - present /Users/kensoh/Desktop/apple.png
[tagui] OUTPUT - [3] ERROR

[tagui] INPUT  - [4] present /Users/kensoh/Desktop/apple.png
[tagui] ACTION - present /Users/kensoh/Desktop/apple.png
[tagui] OUTPUT - [4] ERROR

That is correct and expected. But in your case, are you saying that the Python program crashes?

Can you create the following file and put ocr.tag and apple.png in ~/.tagui/src folder?

ocr.tag

read apple.png to ocr_text
echo `ocr_text`

Then try running the following to run the RPA using TagUI directly.

cd ~/.tagui/src
./tagui ocr.tag

The steps above will help isolate whether the program is crashing in TagUI / SikuliX or whether something is wrong on the Python side.
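For anyone who prefers to create the test flow from Python rather than by hand, a minimal sketch follows. It writes the same two lines as the ocr.tag shown above; the helper name write_ocr_flow and its parameters are made up for illustration, and the target folder is whatever your TagUI src folder is.

```python
from pathlib import Path


def write_ocr_flow(folder, image_name="apple.png", flow_name="ocr.tag"):
    """Write the two-line TagUI OCR test flow into `folder` and return its path."""
    flow = Path(folder) / flow_name
    lines = [
        f"read {image_name} to ocr_text",  # OCR the image found on screen
        "echo `ocr_text`",                 # print the recognised text
    ]
    flow.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return flow
```

The image file still has to be copied into the same folder separately, and the flow is then run with the tagui command as described above.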

emshihab008 commented 3 years ago

Hi Ken, Initially, when I ran the code, it showed the following output: [RPA][ERROR] - TagUI process ended unexpectedly

But last time, when I ran it again, it showed: [RPA][ERROR] - cannot find apple.png

Following your instructions, I created the ocr.tag file, copied the code into it and put it in the recommended folder. ocr.tag.txt taguiocr

But I am confused about the last lines (cd ~/.tagui/src ./tagui ocr.tag). How should I write them?

kensoh commented 3 years ago

Oh sorry, I see you are on Windows. You can do the following from the command prompt -

cd c:\Users\shihab\AppData\Roaming\tagui\src
tagui ocr.tag

If the same error happens, then the issue is on the TagUI side. See if you can join the weekly call on Thursday 4-5pm (UTC+8), which should be 10-11am in Germany. Another possible cause is the image resolution on Retina displays. Let me search for the solution to that issue and paste it here.

kensoh commented 3 years ago

See these 2 issues related to Retina displays. The gist is that the captured image has a different pixel density and is much larger / smaller than it appears on the actual screen. So users change some settings to make image capture work correctly.

https://github.com/kelaberetiv/TagUI/issues/896
https://github.com/kelaberetiv/TagUI/issues/818
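One rough way to spot a size mismatch is to compare the pixel dimensions of the captured image with the screen resolution. The sketch below reads the width and height from a PNG file's header using only the standard library. The function names are made up for illustration, and the check only catches the obvious case where the template is larger than the screen; it cannot detect every scaling problem.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def fits_on_screen(image_size, screen_size):
    """True if the image is no larger than the screen in both dimensions."""
    return image_size[0] <= screen_size[0] and image_size[1] <= screen_size[1]
```

For example, fits_on_screen(png_size('apple.png'), (1920, 1080)) would be False for a screenshot that was saved at double density on a screen of that size (the resolution here is just an example value).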

emshihab008 commented 3 years ago

Hi Ken, Using the command prompt, I tried to run ocr.tag, but it shows "cannot find ocr.tag". command prompt

And I have also checked the following issues: kelaberetiv/TagUI#896 kelaberetiv/TagUI#818. But these are all about the Retina display issue. I am using Windows on an ASUS laptop with an LED display, resolution 1366 * 768.

kensoh commented 3 years ago

I see. This is a really strange issue. There doesn't seem to be anything else we can check, other than meeting on a Zoom call so that I can test directly on your computer. Would this coming Thursday at 10am Germany time work for you?

Since this issue is an upstream TagUI issue, we can meet on the weekly Zoom call to troubleshoot.

URL for the Zoom meeting is https://github.com/kelaberetiv/TagUI/issues/914

By the way, just to be sure: the image apple.png is really showing on your screen, but it is not detected, correct?

emshihab008 commented 3 years ago

Hi Ken, Yes, I can attend the Zoom meeting on Thursday at 10 am. And yes, it cannot find the PNG file.

kensoh commented 3 years ago

Ok Ekram, see you then!

kensoh commented 3 years ago

Hi Ekram, sorry for dropping from the call earlier due to an urgent company meeting.

Can you try the following?

Use the snipping tool to take a screenshot of your start button at bottom left of your screen. Save it as start.png

After that, in your script, see if you can do r.click('start.png') on the start button. If yes then the computer vision is ok.
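This check can be wrapped in a small function, sketched below. It assumes click() returns a truthy value when the image is found and clicked and a falsy value otherwise; the function name check_computer_vision and the messages are made up for illustration.

```python
def check_computer_vision(r, image="start.png"):
    """Try to click `image` on screen and report whether it was found.

    `r` is an initialised rpa module (or any object with a click() method).
    """
    found = bool(r.click(image))
    if found:
        return f"{image} was found and clicked: computer vision looks ok"
    return f"{image} was not found on screen: check the screenshot and display scaling"
```

In a real script this would be called between r.init(True, False) and r.close(), with start.png saved next to the script.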

The next point is that you are trying to use OCR to read an image on the screen. The way read() works, the image must really be on the screen. It does not open a file and perform OCR on it in the background. For that, you can use other Python tools or 3rd-party tools.

For RPA for Python and TagUI, read() with OCR only captures text that is actually visible on the screen at that moment. It usually works by you providing a window frame of the PDF viewer, with the contents deleted to be transparent, saved as a GIF. For example, what I shared above - https://github.com/tebelorg/RPA-Python/issues/228#issuecomment-810551911
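For OCR on a file in the background, one option outside RPA for Python is to call the Tesseract command-line tool directly. This is only a sketch under assumptions: it requires the tesseract executable to be installed and on PATH, the function names are made up for illustration, and accuracy on a poorly scanned page will still be limited.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build a Tesseract CLI call that prints the recognised text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image_file(image_path, lang="eng"):
    """Run OCR directly on an image file, without showing it on screen."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

For non-English documents, pass the matching language code (for example lang="deu" for German), provided that language data is installed.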

emshihab008 commented 3 years ago

Hi Ken, I am sorry for the late reply. I was a bit busy and could not concentrate on my project. Following your suggestion, I tested visual automation. It works perfectly. Currently, I am using Picasa Photo Viewer. Would it also work if I use another photo viewer?

kensoh commented 3 years ago

Hi Ekram, there are 2 things to check -

  1. make sure the PDF viewer window is really on top and visible when you are doing the read step
  2. if you are using macOS and have a high-res Retina display, check this issue with the screenshot tool

emshihab008 commented 3 years ago

Hi Ken,

  1. I am using windows.
  2. I have tried to read the documents in 4 different ways and got the following output. I have attached both the image and the PDF documents.

1. Tried to read image documents:

import rpa as r
r.init(True, False)
r.wait(2)
print(r.read('apple.png'))
r.close()

Output: C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py

3951!?

Process finished with exit code 0

import rpa as r
r.init(True, False)
r.wait(2)
r.dclick('apple.png')
r.wait(2)
print(r.read('apple.png'))
r.close()

output: C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py

g

Process finished with exit code 0

2. Tried to read pdf document

import rpa as r
r.init(True, False)
r.wait(2)
print(r.read('png2pdf.png'))
r.close()

output: C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py

Process finished with exit code 0

import rpa as r
r.init(True, False)
r.wait(2)
r.dclick('png2pdf.png')
r.wait(2)
print(r.read('png2pdf.png'))
r.close()

output: C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py

[RPA][ERROR] - cannot find png2pdf.png

Process finished with exit code 0

apple png2pdf.pdf

kensoh commented 3 years ago

To use read() to OCR a PDF, the PDF file must already be open. And once it is open, the image passed to read() should have the same resolution as the PDF viewer on the screen. For Retina displays, the image from the snapshot tool will need manual downsizing.
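The order of operations matters here: open the document first, wait for the viewer, then read what is on screen. A minimal sketch of that sequence is below. It assumes dclick() returns a truthy value when the icon is found, and it reads by coordinates as in the earlier script; the function name open_then_read and the image name in the usage note are illustrative.

```python
def open_then_read(r, icon_image, region, settle_seconds=2):
    """Open a document by double-clicking its icon, then OCR a screen region.

    `r` is an initialised rpa module; `region` is (left, top, right, bottom).
    Returns None if the icon could not be found on screen.
    """
    if not r.dclick(icon_image):
        return None  # icon not visible, so nothing was opened
    r.wait(settle_seconds)  # give the viewer time to render the page
    left, top, right, bottom = region
    return r.read(left, top, right, bottom)
```

For example, open_then_read(r, 'pdficon.png', (100, 87, 889, 702)) would double-click the icon and then OCR that region of the opened viewer.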

What I recommend next is that you try other tools like the one below, which can convert a scanned PDF to a digitalised PDF. There is a free version with a watermark, but it should hopefully work for your use case.

https://www.tracker-software.com/product/pdf-xchange-editor

After you convert your PDFs into digital PDFs, you can run RPA for Python to do the next steps. Or, if converting to digital PDF is the goal, then perhaps there is no need to use RPA tools at all. Some of these PDF tools can auto-convert new files added to a folder.

kensoh commented 3 years ago

Let me know how it goes!

emshihab008 commented 3 years ago

Thank you for your suggestion. I have converted the scanned PDF to an editable PDF using PDF-XChange Editor, but I am still facing the same problem. I have attached the editable PDF so that you can also try it. If I use coordinates, though, it works. png2pdf.pdf

import rpa as r
r.init(True, False)
r.wait(2)
r.dclick('png2pdf.png')
r.wait(2)
print(r.read('png2pdf.png'))
r.close()

png2pdf.pdf

output:
C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py
[RPA][ERROR] - cannot find png2pdf.png

Process finished with exit code 0

import rpa as r
r.init(True, False)
r.wait(2)
r.dclick('png2pdf.png')
r.wait(2)
print(r.read(42,209,1082,644))
r.close()

output:
C:\Python\Python39\python.exe C:/PzcharmProjects/Practice.py
The ship drew on and had safely passed the strait, which some volcanic shock
has made between the Calasareigne and Jams islands; had doubled Pomégue,
and approached the harbor under topsails, jib, and spanker, but so slowly and
sedately that the idlers, with that instinct which is the forerunner of evil, asked
one another what misfortune could have happened on board. However, those
experienced in navigation saw plainly that if any accident had occurred, it was
not to the vessel herself, for she bore down with all the evidence of being
skilfully handled, the anchor a-cockbill, the jib-boom guys already eased off, and
standing by the side of the pilot, who was steering the Pharaon towards the
narrow entrance of the inner port, was a young man, who, with activity and
vigilant eye, watched every motion of the ship, and repeated each direction of
the pilot.

Process finished with exit code 0

kensoh commented 3 years ago

Next time, see if you can join the weekly Zoom call again so we can look at it together. I don't think the following lines make sense. Double-clicking on the icon image opens the PDF file, but then doing OCR by looking for the PDF icon is strange.

r.dclick('png2pdf.png')
r.wait(2)
print(r.read('png2pdf.png'))

But since you already have the editable PDF, what you need to do is automate opening the PDF and then do the following

r.dclick('pdficon.png')
r.wait()
r.keyboard('[ctrl]a')
r.keyboard('[ctrl]c')
pdf_text = r.clipboard()

emshihab008 commented 3 years ago

Yeah, sure. I will attend. I also have the same question regarding double-clicking.

And regarding the editable PDF, you are right. But it was originally a non-editable PDF. I made it editable using the OCR tool of PDF-XChange Editor.