invoice-x / invoice2data

Extract structured data from PDF invoices
MIT License
Gcloud is not working #180

Open suryacaprice opened 5 years ago

suryacaprice commented 5 years ago

Waiting for the operation to finish.
Traceback (most recent call last):
  File "/home/caprice/anaconda3/bin/invoice2data", line 11, in <module>
    sys.exit(main())
  File "/home/caprice/anaconda3/lib/python3.6/site-packages/invoice2data/main.py", line 166, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/home/caprice/anaconda3/lib/python3.6/site-packages/invoice2data/main.py", line 90, in extract_data
    extracted_str = input_module.to_text(invoicefile).decode('utf-8')
  File "/home/caprice/anaconda3/lib/python3.6/site-packages/invoice2data/input/gvision.py", line 79, in to_text
    json_string = result_blob.download_as_string()
AttributeError: 'NoneType' object has no attribute 'download_as_string'

m3nu commented 5 years ago

Google Vision needs a lot of setup. You need:

There are no instructions for this as of now, but it should be clear from the source code. Did you do all the setup tasks correctly before encountering this error?

suryacaprice commented 5 years ago

Hi, I have done all the configuration: I created the bucket, and the API is mapped to the project with full access.

suryacaprice commented 5 years ago

I don't think the gcloud config is the problem here. This is the line that raises the error:

    json_string = result_blob.download_as_string()
    AttributeError: 'NoneType' object has no attribute 'download_as_string'

m3nu commented 5 years ago

Then I'd check if you have a result in your bucket because this line just reads the result.

If your configuration is wrong there won't be a result in the bucket and this specific line will fail.
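The failure mode described above can be made explicit with a guard. This is only a minimal sketch, not the code in gvision.py: `read_ocr_result` is a hypothetical helper, and `bucket` is assumed to behave like a `google.cloud.storage.Bucket`, whose `get_blob()` returns `None` when the object does not exist.

```python
import json


def read_ocr_result(bucket, blob_name):
    """Return the parsed OCR result stored at blob_name.

    Hypothetical helper. `bucket` is assumed to behave like
    google.cloud.storage.Bucket: get_blob() returns None when the
    object is missing, which is what triggers the AttributeError
    in the traceback if it is not checked.
    """
    result_blob = bucket.get_blob(blob_name)
    if result_blob is None:
        # Fail with a message that points at the likely causes.
        raise FileNotFoundError(
            f"No OCR result named {blob_name!r} in the bucket; "
            "check the Vision API setup and the expected output file name."
        )
    # download_as_string() returns bytes; json.loads accepts bytes.
    return json.loads(result_blob.download_as_string())
```

With a check like this, a missing result surfaces as a readable error instead of an `AttributeError` on `None`.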

suryacaprice commented 5 years ago

Let me check the configuration again.

ananthnagan commented 5 years ago

Does this PDF have multiple pages? If so, there is a problem in gvision.py: the result file name is hardcoded to output-1-to-1.json, but for a PDF with more than one page Google Vision creates the JSON file as output-1-to-[no. of pages].json. So I have made a code change and it worked for me; please find the code below.

[image: screenshot of the code change]
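Since the screenshot is not preserved here, the following is a rough sketch of one way to avoid hardcoding the result file name, not the actual change. It assumes the `output-<first>-to-<last>.json` naming seen in the traceback later in this thread; `select_output_blobs` and the example prefix are hypothetical.

```python
import re

# Naming pattern assumed from the traceback in this thread:
# output-<first page>-to-<last page>.json
_OUTPUT_RE = re.compile(r"output-(\d+)-to-(\d+)\.json$")


def select_output_blobs(blob_names, prefix):
    """Return the result file names under prefix, ordered by first page.

    Hypothetical helper: it picks up whatever result files exist
    instead of guessing the page count up front.
    """
    matches = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        m = _OUTPUT_RE.search(name)
        if m:
            matches.append((int(m.group(1)), name))
    return [name for _, name in sorted(matches)]


names = [
    "job/output-3-to-4.json",
    "job/output-1-to-2.json",
    "other/output-1-to-1.json",
    "job/readme.txt",
]
ordered = select_output_blobs(names, "job/")
```

With a real bucket, the names could come from `[b.name for b in bucket.list_blobs(prefix=prefix)]`. Listing by prefix also means the PDF does not have to be opened locally just to count its pages.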

m3nu commented 5 years ago

So I have made a code change and it worked for me; please find the code below.

You should make a pull request for your fix. Else your improvement will never make it into the official repo and you need to maintain the change during every update.

ananthnagan commented 5 years ago

So I have made a code change and it worked for me; please find the code below.

You should make a pull request for your fix. Else your improvement will never make it into the official repo and you need to maintain the change during every update.

Created the pull request.

EtienneBerube commented 5 years ago

Hi, I am trying to make the Gvision work. I have my Google credentials JSON and am trying to figure out how to properly connect the bucket. I see that it is a default argument, but none of the API calls refer to a bucket. Where can I specify my bucket? Thanks

Venerit commented 5 years ago

Hey guys, I was running into the same issue and tried ananthnagan's fix. It works for one multi-page PDF but fails for another one. I can't really figure out what the issue is. Any ideas?

Traceback (most recent call last):
  File "/usr/local/bin/invoice2data", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/invoice2data/main.py", line 201, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/usr/local/lib/python3.6/dist-packages/invoice2data/main.py", line 82, in extract_data
    extracted_str = input_module.to_text(invoicefile).decode('utf-8')
  File "/usr/local/lib/python3.6/dist-packages/invoice2data/input/gvision.py", line 35, in to_text
    result_blob_name = result_blob_basename + '/output-1-to-'+str(PdfFileReader(open(path, "rb")).getNumPages())+'.json'
  File "/usr/local/lib/python3.6/dist-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/usr/local/lib/python3.6/dist-packages/PyPDF2/pdf.py", line 1697, in read
    line = self.readNextEndLine(stream)
  File "/usr/local/lib/python3.6/dist-packages/PyPDF2/pdf.py", line 1938, in readNextEndLine
    x = stream.read(1)

ananthnagan commented 5 years ago

Hi, I am trying to make the Gvision work. I have my Google credentials JSON and am trying to figure out how to properly connect the bucket. I see that it is a default argument, but none of the API calls refer to a bucket. Where can I specify my bucket? Thanks

It's at the top of gvision.py; you can set your bucket name there.

[image: screenshot]

m3nu commented 5 years ago

Right. When you integrate the lib in your own script, you can pass your bucket as an optional keyword argument, as shown by @ananthnagan above.

EtienneBerube commented 5 years ago

This might work for a local solution, but if the code runs in a Docker container that does a pip install, the changes would be overwritten. @ananthnagan seems to get the bucket from the environment variables, which would be a good alternative. Would a PR for this be justified?
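Reading the bucket from the environment could look roughly like this. It is a sketch only: the variable name `GVISION_BUCKET`, the default bucket name, and `resolve_bucket_name` are all hypothetical and not taken from invoice2data.

```python
import os

ENV_VAR = "GVISION_BUCKET"            # hypothetical variable name
DEFAULT_BUCKET = "my-default-bucket"  # hypothetical fallback


def resolve_bucket_name(explicit=None, environ=None):
    """Pick the bucket: explicit argument, then environment, then default."""
    environ = os.environ if environ is None else environ
    return explicit or environ.get(ENV_VAR) or DEFAULT_BUCKET
```

An explicit keyword argument still wins, so existing callers keep working, while a container only needs the environment variable set and no edits to the installed package.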

EtienneBerube commented 5 years ago

A PR has been created for @ananthnagan's fix: https://github.com/invoice-x/invoice2data/pull/241

bosd commented 1 year ago

I'm starting to look into gvision. As there are no instructions, can someone point me to the steps needed to make it work? @rmilecki, have you looked into the gvision input module?

rmilecki commented 1 year ago

@bosd: I have zero experience with OCR inputs