watson-developer-cloud / python-sdk

:snake: Client library to use the IBM Watson services in Python and available in pip as watson-developer-cloud
https://pypi.org/project/ibm-watson/
Apache License 2.0

VisualRecognitionV3.classify on local zip file #194

Closed daavoo closed 7 years ago

daavoo commented 7 years ago

In:

https://github.com/watson-developer-cloud/python-sdk/blob/master/watson_developer_cloud/visual_recognition_v3.py#L139

it says that you can pass a zip file as an argument. However, when I run the following code locally:

vr = VisualRecognitionV3(api_key=MY_API_KEY, version='2016-05-20')
with open("husky.zip", "rb") as f:
    vr.classify(images_file=f, classifier_ids="dogs_634305779")

The following error is raised:

Traceback (most recent call last):

  File "<ipython-input-129-8f7875a2553a>", line 2, in <module>
    vr.classify(images_file=f, classifier_ids="dogs_634305779")

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/watson_developer_cloud/visual_recognition_v3.py", line 156, in classify
    params)

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/watson_developer_cloud/visual_recognition_v3.py", line 133, in _image_call
    accept_json=True)

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/watson_developer_cloud/watson_developer_cloud_service.py", line 298, in request
    **kwargs)

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)

  File "/home/daviddelaiglesia/miniconda2/envs/pytorch/lib/python3.6/site-packages/requests/adapters.py", line 473, in send
    raise ConnectionError(err, request=request)

ConnectionError: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))

I notice that the tests don't cover this functionality:

https://github.com/watson-developer-cloud/python-sdk/blob/master/test/test_visual_recognition_v3.py#L116

Is it possible to use classify with a local zip file?

Thank you very much.

kognate commented 7 years ago

It should be, but let me look into this.

kognate commented 7 years ago

It works in Python 2.7 but not in 3.6. I think I know what the problem is, so I'll have either a fix soon or an explanation of why my theory was wrong.

kognate commented 7 years ago

My theory was wrong, and I'm getting a data encoding error when I try to post with 3.7.

daavoo commented 7 years ago

Thanks for the quick response!

I'm currently unzipping the file locally and calling classify on one image at a time as a workaround.

Looking forward to updates :)
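
The workaround mentioned above (unzip locally, classify one image at a time) could be sketched roughly as follows. This is only an illustration: `classify_each_image` and the `classify_one` callback are hypothetical names, not part of the SDK, and the extension list is an assumption. In practice the callback would wrap something like `vr.classify(images_file=...)` from the snippet earlier in the thread.

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumption: formats of interest


def classify_each_image(zip_path, classify_one):
    """Call classify_one(name, file_obj) for every image inside a zip archive.

    classify_one is a hypothetical callback; it might wrap
    vr.classify(images_file=file_obj, ...) from earlier in the thread.
    Returns a dict mapping member name to whatever the callback returned.
    """
    results = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(IMAGE_EXTENSIONS):
                continue
            # Read the member into memory so the callback gets a seekable file object.
            data = io.BytesIO(archive.read(info))
            data.name = info.filename  # lets multipart encoders pick up a filename
            results[info.filename] = classify_one(info.filename, data)
    return results
```

Keeping the SDK call behind a callback means the zip handling can be tried without spending API transactions, for example with `lambda name, f: len(f.read())`.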

UmanShahzad commented 7 years ago

Since I'm going to be using VR3 very soon and possibly classifying in zip files, I'll also be looking into this as I work. Any progress on the issue will be appreciated so I don't duplicate effort.

jsstylos commented 7 years ago

I'm having a hard time reproducing this. I updated the integration test to use a zip and it worked on Python 2.7, 3.4 and 3.5, but then I bumped into the transaction limit for the day and am waiting to resolve that.

Are there specific zip files that fail and others that don't, or environmental factors?

daavoo commented 7 years ago

So, I used the following script to test this locally:

import os
from watson_developer_cloud import VisualRecognitionV3

def test_classify(vr, data):
    """Classify one file and append the outcome (or the error) to responses.txt."""
    with open(data, "rb") as f:
        try:
            vr.classify(images_file=f, classifier_ids="dogs_634305779")
            with open("responses.txt", "a") as responses:
                responses.write("{}: ALL GOOD\n".format(f.name))

        except Exception as e:
            # Record the failure instead of aborting, so every file gets tested.
            with open("responses.txt", "a") as responses:
                responses.write("{}: {}\n".format(f.name, e))

if __name__ == '__main__':

    import argparse
    ap = argparse.ArgumentParser()
    ap.add_argument("api_key")
    args = vars(ap.parse_args())

    vr = VisualRecognitionV3(api_key=args["api_key"], version='2016-05-20')

    for f in os.listdir(os.getcwd()):
        if f.split(".")[-1] in {"jpg", "zip"}:
            print("Testing on: {}".format(f))
            test_classify(vr, f)

And I created this gist to share the results with you.

I'm using conda to create the isolated environments. Package details are in the gist.

So, based on my local results, it looks like zip files with more than about 10 images inside are the problem. Maybe something is wrong with large HTTP headers?

Can you reproduce these results?

jsstylos commented 7 years ago

OK, I downloaded the beagle.zip file and was able to reproduce at least one problem. The response was a 413 - Request Entity Too Large, which was not being displayed by the Python SDK because the response was in HTML instead of JSON:

<html>
<head><title>413 Request Entity Too Large</title></head>
<body bgcolor="white">
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>
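
To illustrate how an error like this could be surfaced instead of swallowed, here is a minimal sketch of a helper that falls back to the raw body when the response is not JSON. It is not the SDK's actual error handling: `describe_error` is a hypothetical name, and the JSON keys it checks are an assumption about where a service might put its message.

```python
import json


def describe_error(status_code, body, content_type=""):
    """Build a readable message from an error response, whether JSON or not."""
    if "json" in content_type.lower():
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None
        if isinstance(payload, dict):
            # Assumption: the service puts its message under one of these keys.
            for key in ("error", "description", "message"):
                if key in payload:
                    return "HTTP {}: {}".format(status_code, payload[key])
    # Fall back to the raw body (e.g. an nginx HTML page), truncated for display.
    snippet = " ".join(body.split())[:200]
    return "HTTP {}: non-JSON response: {}".format(status_code, snippet)
```

With the nginx page above, a call such as `describe_error(413, html_body, "text/html")` would at least report the status code and the page title text rather than failing silently.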

This should get fixed in the next release of the Visual Recognition service (timeline pending, but "soon"ish), but in the meantime the issue seems to be specific to large zip files (possibly just zip files with image files that exceed the supported size), so the workaround for this particular issue would be to use smaller image files in the zip.

When you try to use one of the large images from beagle.zip on its own, for instance Beagle-Rude.JPG, you get a more sensible error response:

        "description": "Image size limit exceeded (4129839 bytes > 2097152 bytes [2 MiB]).",
        "error_id": "input_error"
      },

When I resized all of the images in beagle.zip down to a long side of 600px, the request worked.

So, the service should more gracefully handle large inputs, and the SDK should relay the service's error messages, but in the meantime the solution is to use smaller images.
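
Until the service reports this more clearly, one option is to check a zip locally before uploading it. The sketch below lists members whose uncompressed size exceeds the 2 MiB per-image limit quoted in the error message above. `oversized_members` is a hypothetical helper, and whether the per-image limit fully explains the 413 on zip uploads is an assumption; the service may also cap the total request size.

```python
import zipfile

SIZE_LIMIT_BYTES = 2 * 1024 * 1024  # 2 MiB, taken from the error message above


def oversized_members(zip_path, limit=SIZE_LIMIT_BYTES):
    """Return (name, uncompressed_size) for each zip member larger than limit."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.file_size > limit
        ]
```

An empty list from `oversized_members("beagle.zip")` would suggest every image is within the limit; any entries returned are candidates for resizing before the upload.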

daavoo commented 7 years ago

Thanks @jsstylos. Copy that: use images < 2 MiB.

I suppose that I should close this.