Open PanosHatz opened 6 months ago
Change two lines (258,259) in setup.py:
install_requires=[ "tensorflow", "numpy", "six~=1.15.0", "datefinder==0.7.1", "opencv-python==4.5.1.48", "pdf2image==1.14.0", "pdfplumber==0.5.27", "PyPDF2==1.27.9", "pytesseract==0.3.7", "python-dateutil==2.8.1", "PyYAML==5.4.1", "simplejson==3.17.2", "tqdm==4.59.0", "google-api-python-client", "google-cloud-vision" ])
Thank you very much, it worked!
Have you gotten this repo working successfully on Windows?
Yes. On Win 10 with miniconda.
I ran into some other problems and kind of gave up. Any idea if it works for Windows 11?
Please tell us what problems or errors you have.
Thanks a lot for the immediate response. Actually, I think I managed to make it work after a fresh reinstall. Just two questions: Can I train using a regular CPU? If my invoices are in Greek, will it work?
You can easily train the network using only the CPU. TensorFlow automatically detects which hardware it can run on.
As for the language, Tesseract OCR has only English enabled by default, so the program must be told to use Greek or English+Greek. Language data files are listed here: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
In the file InvoiceNet\invoicenet\common\util.py, line 95, change
data = pytesseract.image_to_data(img, output_type=Output.DICT)
to
data = pytesseract.image_to_data(img, lang='ell', output_type=Output.DICT)
Note that 'ell' is Tesseract's code for modern Greek, while 'grc' is Ancient Greek. For English+Greek use lang='eng+ell'.
You need to check which languages your tesseract-ocr installation supports:
"c:\Program Files\Tesseract-OCR\tesseract.exe" --list-langs
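To avoid passing a language that is not installed, the check and the OCR call can be combined. The following is only a sketch: the helper names (parse_list_langs, build_lang_arg, ocr_words) are made up for illustration and are not part of InvoiceNet. It assumes modern Greek is wanted, for which Tesseract uses the code 'ell' ('grc' is Ancient Greek), and that the --list-langs output is a header line followed by one language code per line.

```python
def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # Skip the header line, e.g. "List of available languages (3):"
    return [ln for ln in lines if not ln.lower().startswith("list of available")]


def build_lang_arg(wanted, available):
    """Join the wanted codes with '+', failing early if one is not installed."""
    missing = [code for code in wanted if code not in available]
    if missing:
        raise ValueError("Tesseract language data not installed: " + ", ".join(missing))
    return "+".join(wanted)


def ocr_words(img, lang):
    """Hypothetical wrapper around the call in util.py, with an explicit language."""
    # Imported here so the helpers above work without pytesseract installed.
    import pytesseract
    from pytesseract import Output

    return pytesseract.image_to_data(img, lang=lang, output_type=Output.DICT)
```

For example, after pasting the --list-langs output into a string called listing, build_lang_arg(["eng", "ell"], parse_list_langs(listing)) gives the value to pass as lang, or raises an error naming the missing language data.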
Hi, I tried training using only the CPU, and it took a huge amount of time. Can I somehow use Google Colab's free GPUs for this? Do I have to make any modifications to the code?
On a normal computer, processing and training on 5,000 invoices takes a few hours. Training only has to be done once; after that, the trained network works quickly.
The only Google-specific setting I see in the code is for Google OCR, in util.py, line 37:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="google_api_keys.json"
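Regarding the Colab question above: before starting a long training run, it may help to confirm whether TensorFlow actually sees a GPU. This is a minimal sketch, assuming a TensorFlow 2.x install where tf.config.list_physical_devices is available. The function names visible_gpus and describe are illustrative only, and nothing here is specific to InvoiceNet.

```python
def visible_gpus(list_devices=None):
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    if list_devices is None:
        # Imported lazily so the helper can be exercised without TensorFlow.
        import tensorflow as tf

        list_devices = tf.config.list_physical_devices
    return [device.name for device in list_devices("GPU")]


def describe(gpus):
    """Build a one-line summary of where training is expected to run."""
    if gpus:
        return "Training can use GPU: " + ", ".join(gpus)
    return "No GPU visible; TensorFlow will fall back to the CPU"
```

Running print(describe(visible_gpus())) in a Colab cell, after selecting a GPU runtime, shows whether the GPU was picked up. If it reports no GPU, training will still run, only on the CPU.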
Hi, first of all, is this project still active?
When trying to install on Windows 11 with Anaconda, I get the following error after running the pip install . command:
Can anyone help me?