jina-ai / executors

internal-only
Apache License 2.0

ImageNormalizer: FileNotFoundError(2, 'No such file or directory') #49

Open alexcg1 opened 3 years ago

alexcg1 commented 3 years ago

ImageNormalizer can't seem to find the files, even though they clearly exist. Perhaps it's looking inside its own Docker container instead of the current directory?

My code

from jina import Flow
from jina.types.document.generators import from_files

docs = from_files("data/**/*.png")

flow = Flow(protocol="http", port_expose=4352).add(
    uses="jinahub+docker://ImageNormalizer",
    name="crafter",
)

with flow:
    flow.post(on="/index", inputs=docs)
    flow.block()

Error log

        crafter@55310[L]:ready and listening
        gateway@55310[L]:ready and listening
        crafter@55334[I]:
        crafter@55334[I]:▶️  /usr/local/bin/jina executor --uses config.yml --name crafter --workspace /home/alexcg/tmp/jina/clip-hub-test --identity 6ec761b1-b48b-42f2-8657-f1ea972048ed --workspace-id cd6d14a4-3b1a-41a6-b1bc-00f2fae1f0ad --zmq-identity 45d1bad1-9106-4370-b915-e417edd4d518 --port-ctrl 36367 --port-in 38271 --port-out 59533 --hosts-in-connect --socket-in ROUTER_BIND --socket-out ROUTER_BIND --num-part 1 --dynamic-routing-out --dynamic-routing-in --port-expose 4352 --upload-files --noblock-on-start --runs-in-docker
        crafter@55334[I]:🔧️                            cli = executor                      
        crafter@55334[I]:ctrl-with-ipc = False
        crafter@55334[I]:daemon = False
        crafter@55334[I]:disable-remote = False
        crafter@55334[I]:docker-kwargs = None
        crafter@55334[I]:dynamic-routing = True
        crafter@55334[I]:🔧️             dynamic-routing-in = True                          
        crafter@55334[I]:🔧️            dynamic-routing-out = True                          
        crafter@55334[I]:entrypoint = None
        crafter@55334[I]:env = None
        crafter@55334[I]:expose-public = False
        crafter@55334[I]:external = False
        crafter@55334[I]:gpus = None
        crafter@55334[I]:host = 0.0.0.0
        crafter@55334[I]:host-in = 0.0.0.0
        crafter@55334[I]:host-out = 0.0.0.0
        crafter@55334[I]:🔧️               hosts-in-connect = []                            
        crafter@55334[I]:🔧️                       identity = 6ec761b1-b48b-42f2-8657-f1ea97
        crafter@55334[I]:log-config = /usr/local/lib/python3.7/site-
        crafter@55334[I]:memory-hwm = -1
        crafter@55334[I]:🔧️                           name = crafter                       
        crafter@55334[I]:🔧️               noblock-on-start = True                          
        crafter@55334[I]:🔧️                       num-part = 1                             
        crafter@55334[I]:on-error-strategy = IGNORE
        crafter@55334[I]:parallel = 1
        crafter@55334[I]:pea-id = 0
        crafter@55334[I]:pea-role = SINGLETON
        crafter@55334[I]:peas-hosts = None
        crafter@55334[I]:pod-role = None
        crafter@55334[I]:polling = ANY
        crafter@55334[I]:🔧️                      port-ctrl = 36367                         
        crafter@55334[I]:🔧️                    port-expose = 4352                          
        crafter@55334[I]:🔧️                        port-in = 38271                         
        crafter@55334[I]:🔧️                       port-out = 59533                         
        crafter@55334[I]:proxy = False
        crafter@55334[I]:pull-latest = False
        crafter@55334[I]:py-modules = None
        crafter@55334[I]:quiet = False
        crafter@55334[I]:quiet-error = False
        crafter@55334[I]:quiet-remote-logs = False
        crafter@55334[I]:replicas = 1
        crafter@55334[I]:🔧️                 runs-in-docker = True                          
        crafter@55334[I]:runtime-backend = PROCESS
        crafter@55334[I]:runtime-cls = ZEDRuntime
        crafter@55334[I]:scheduling = LOAD_BALANCE
        crafter@55334[I]:🔧️                      socket-in = ROUTER_BIND                   
        crafter@55334[I]:🔧️                     socket-out = ROUTER_BIND                   
        crafter@55334[I]:ssh-keyfile = None
        crafter@55334[I]:ssh-password = None
        crafter@55334[I]:ssh-server = None
        crafter@55334[I]:timeout-ctrl = 5000
        crafter@55334[I]:timeout-ready = 600000
        crafter@55334[I]:🔧️                   upload-files = []                            
        crafter@55334[I]:🔧️                           uses = config.yml                    
        crafter@55334[I]:uses-after = None
        crafter@55334[I]:uses-before = None
        crafter@55334[I]:uses-metas = None
        crafter@55334[I]:uses-with = None
        crafter@55334[I]:volumes = None
        crafter@55334[I]:🔧️                      workspace = /home/alexcg/tmp/jina/clip-hub
        crafter@55334[I]:🔧️                   workspace-id = cd6d14a4-3b1a-41a6-b1bc-00f2fa
        crafter@55334[I]:🔧️                   zmq-identity = 45d1bad1-9106-4370-b915-e417ed
        crafter@55334[I]:
        crafter@55334[I]:           JINA@ 1[W]:You are using Jina version 2.0.14, however version 2.0.15 is available. You should consider upgrading via the "pip install --upgrade jina" command.
           Flow@55310[I]:🎉 Flow is ready to use!
    🔗 Protocol:         HTTP
    🏠 Local access: 0.0.0.0:4352
    🔒 Private network:  192.168.3.16:4352
    🌐 Public address:   94.135.231.132:4352
    💬 Swagger UI:       http://localhost:4352/docs
    📚 Redoc:        http://localhost:4352/redoc
        crafter@55334[I]:        crafter@17[E]:FileNotFoundError(2, 'No such file or directory')
        crafter@55334[I]:add "--quiet-error" to suppress the exception details
        crafter@55334[I]:Traceback (most recent call last):
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 313, in _msg_callback
        crafter@55334[I]:processed_msg = self._callback(msg)
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 299, in _callback
        crafter@55334[I]:self._pre_hook(msg)._handle()._post_hook(msg)
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 251, in _handle
        crafter@55334[I]:groundtruths_matrix=self.groundtruths_matrix,
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/__init__.py", line 190, in __call__
        crafter@55334[I]:self, **kwargs
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
        crafter@55334[I]:return fn(*args, **kwargs)
        crafter@55334[I]:File "/image_normalizer/normalizer.py", line 56, in craft
        crafter@55334[I]:self._convert_image_to_blob(doc)
        crafter@55334[I]:File "/image_normalizer/normalizer.py", line 67, in _convert_image_to_blob
        crafter@55334[I]:doc.convert_image_uri_to_blob()
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/types/document/__init__.py", line 1044, in convert_image_uri_to_blob
        crafter@55334[I]:(uri_prefix + self.uri) if uri_prefix else self.uri, color_axis
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/types/document/converters.py", line 92, in to_image_blob
        crafter@55334[I]:raw_img = Image.open(source).convert('RGB')
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/PIL/Image.py", line 2912, in open
        crafter@55334[I]:fp = builtins.open(filename, "rb")
        crafter@55334[I]:FileNotFoundError: [Errno 2] No such file or directory: 'data/44.png'
        crafter@55334[I]:        crafter@17[E]:FileNotFoundError(2, 'No such file or directory')
        crafter@55334[I]:add "--quiet-error" to suppress the exception details
        crafter@55334[I]:Traceback (most recent call last):
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 313, in _msg_callback
        crafter@55334[I]:processed_msg = self._callback(msg)
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 299, in _callback
        crafter@55334[I]:self._pre_hook(msg)._handle()._post_hook(msg)
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 251, in _handle
        crafter@55334[I]:groundtruths_matrix=self.groundtruths_matrix,
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/__init__.py", line 190, in __call__
        crafter@55334[I]:self, **kwargs
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
        crafter@55334[I]:return fn(*args, **kwargs)
        crafter@55334[I]:File "/image_normalizer/normalizer.py", line 56, in craft
        crafter@55334[I]:self._convert_image_to_blob(doc)
        crafter@55334[I]:File "/image_normalizer/normalizer.py", line 67, in _convert_image_to_blob
        crafter@55334[I]:doc.convert_image_uri_to_blob()
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/types/document/__init__.py", line 1044, in convert_image_uri_to_blob
        crafter@55334[I]:(uri_prefix + self.uri) if uri_prefix else self.uri, color_axis
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/jina/types/document/converters.py", line 92, in to_image_blob
        crafter@55334[I]:raw_img = Image.open(source).convert('RGB')
        crafter@55334[I]:File "/usr/local/lib/python3.7/site-packages/PIL/Image.py", line 2912, in open
        crafter@55334[I]:fp = builtins.open(filename, "rb")
        crafter@55334[I]:FileNotFoundError: [Errno 2] No such file or directory: 'data/109.png'

I've tried passing options via uses_with, but nothing changed. Note that the program above is the simplest possible version; my initial attempts had a lot more Executors.

jina -v

2.0.10

alexcg1 commented 3 years ago

It seems to work fine if I remove +docker, i.e. use jinahub://ImageNormalizer.

tadejsv commented 3 years ago

I think this is a problem with Jina core. For some reason, the Flow delegates loading the documents to the first node (running in Docker) instead of loading them itself at the gateway. I'll try to examine this more closely.

tadejsv commented 3 years ago

@alexcg1, this modification of your script should work (replace the path):

from jina import Flow
from jina.types.document.generators import from_files

docs = from_files("data/**/*.png")

flow = Flow(protocol="http", port_expose=4352).add(
    uses="jinahub+docker://ImageNormalizer",
    name="crafter",
    volumes="/home/tadej/projects/workshops/data:/image_normalizer/data",  # host_dir:container_dir; replace the host path with your own data directory
)

with flow:
    flow.post(on="/index", inputs=docs, on_done=print)
    flow.block()

I'm not sure it is fully working, though: it runs without crashing, but on_done doesn't seem to get called.
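
Why the volume mapping helps can be sketched with plain path arithmetic. This is a hypothetical, stdlib-only model, not Jina or Docker code: it assumes the Executor's working directory inside the container is /image_normalizer (suggested by the traceback path /image_normalizer/normalizer.py) and that a relative Document uri such as data/44.png is resolved against it. The helper names are illustrative.

```python
import posixpath
from typing import Optional, Tuple

# Assumption: working directory of the Executor inside the container,
# inferred from the traceback path /image_normalizer/normalizer.py.
CONTAINER_CWD = "/image_normalizer"


def parse_volume(spec: str) -> Tuple[str, str]:
    """Split a Docker-style 'host_path:container_path' volume spec (POSIX paths only)."""
    host, container = spec.split(":", 1)
    return host, container


def resolve_in_container(uri: str, cwd: str = CONTAINER_CWD) -> str:
    """Resolve a (possibly relative) uri the way a process running in `cwd` would."""
    return posixpath.normpath(posixpath.join(cwd, uri))


def host_path_for(container_path: str, volume_spec: str) -> Optional[str]:
    """Return the host file backing `container_path`, or None if the mount doesn't cover it."""
    host, container = parse_volume(volume_spec)
    rel = posixpath.relpath(container_path, container)
    if rel == ".." or rel.startswith("../"):
        return None  # path lies outside the mounted directory
    return posixpath.normpath(posixpath.join(host, rel))


spec = "/home/tadej/projects/workshops/data:/image_normalizer/data"
inside = resolve_in_container("data/44.png")
print(inside)                       # /image_normalizer/data/44.png
print(host_path_for(inside, spec))  # /home/tadej/projects/workshops/data/44.png
```

Without a matching mount, host_path_for returns None, which corresponds to the FileNotFoundError in the original log.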

The more general problem is this: if a document contains an image URI (which is the case with from_files), ImageNormalizer tries to load the image from that URI. Here this is really unexpected: I expected from_files to load the images itself. We might need to reconsider the role of the URI, as this does not scale well beyond a local setting.

JoanFM commented 3 years ago

The point is that you should be aware that the path is relative, and the Executor resolves relative paths from a different location. The best way is to do the loading from uri to blob yourself.
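
The "different relative view" can be reproduced with the standard library alone. This is a hypothetical illustration, not Jina code: it assumes, as the traceback suggests ('data/44.png'), that each Document uri keeps the relative path exactly as the glob matched it, and it models the client and the Dockerized Executor as two different working directories.

```python
import glob
import os
import tempfile
from typing import List, Tuple


def make_project(root: str) -> None:
    """Create a tiny stand-in for the reporter's layout: <root>/data/44.png."""
    os.makedirs(os.path.join(root, "data"), exist_ok=True)
    with open(os.path.join(root, "data", "44.png"), "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")  # PNG signature only; never decoded here


def relative_uris(root: str, pattern: str) -> List[str]:
    """Glob under `root` and return matches relative to it, e.g. 'data/44.png'."""
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(m, root) for m in matches)


def demo() -> List[Tuple[str, bool, bool]]:
    """For each uri: does it resolve from the client's cwd, and from the Executor's cwd?"""
    with tempfile.TemporaryDirectory() as client_cwd, tempfile.TemporaryDirectory() as executor_cwd:
        make_project(client_cwd)  # only the client's directory contains data/
        return [
            (uri,
             os.path.exists(os.path.join(client_cwd, uri)),
             os.path.exists(os.path.join(executor_cwd, uri)))
            for uri in relative_uris(client_cwd, "data/**/*.png")
        ]


print(demo())  # on POSIX: [('data/44.png', True, False)]
```

The same relative string is valid on the client and invalid inside the container, which is why loading the content on the client side avoids the problem.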

JoanFM commented 3 years ago

So let's open the issue on ImageNormalizer instead, but I do not think this is a bug.

tadejsv commented 3 years ago

I'll move this back to executors then, but in my opinion the fact that we have a URI is not really useful. Why not simply require users to load images into blobs? Using Image.open or similar is definitely something they are familiar with, and it would prevent situations like this one, where loading happens under the hood without users being aware of it.

JoanFM commented 3 years ago

In theory, the Executor should also support receiving blobs directly.

tadejsv commented 3 years ago

I agree. I was just saying that when using from_files, users will expect the actual images to be loaded immediately; that's exactly why this issue arose. I think a general solution would be to drop the URI completely. We could have a load_images helper function that creates docs with blobs, to use instead of from_files, which I think tries to do too much.
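
A minimal sketch of what such an eager loader might look like, assuming nothing about Jina's API: load_images and LoadedImage here are hypothetical names, and the sketch reads raw file bytes rather than decoding pixels into an array (that step would need something like PIL and is left out to keep it stdlib-only). The point is only that all filesystem access happens on the client.

```python
import glob
import mimetypes
import os
import tempfile
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LoadedImage:
    """Hypothetical stand-in for a Document whose content is already in memory."""
    uri: str        # kept for reference only; nothing downstream needs to reopen it
    content: bytes  # raw file bytes read on the client side
    mime_type: str  # guessed from the file extension, '' if unknown


def load_images(pattern: str) -> Iterator[LoadedImage]:
    """Eagerly read every file matching `pattern` (recursive glob)."""
    for path in sorted(glob.glob(pattern, recursive=True)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match
        mime, _ = mimetypes.guess_type(path)
        with open(path, "rb") as f:
            yield LoadedImage(uri=path, content=f.read(), mime_type=mime or "")


# Usage: build a throwaway directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))
    with open(os.path.join(tmp, "data", "44.png"), "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
    loaded = list(load_images(os.path.join(tmp, "data", "**", "*.png")))

print([(os.path.basename(d.uri), d.mime_type, len(d.content)) for d in loaded])
# [('44.png', 'image/png', 8)]
```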

Also, the normalizer filters images using

        filtered_docs = DocumentArray(
            list(filter(lambda d: 'image/' in d.mime_type, docs))
        )

which will actually ignore the documents that were created with Document(blob=img_blob), as the mime type does not get set automatically. I guess this is something that should be fixed?
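
The effect of that filter can be reproduced without Jina. In this sketch Doc is a hypothetical stand-in for the Jina Document; its empty default mime_type mirrors the behaviour described above (the mime type is not set automatically when only blob is given), and the predicate is the one from the quoted snippet.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Jina Document: only the two fields the filter looks at."""
    mime_type: str = ""         # assumption: stays empty when only blob is provided
    blob: Optional[Any] = None


def image_filter_by_mime(docs: List[Doc]) -> List[Doc]:
    """Same predicate as the quoted ImageNormalizer code, applied to a plain list."""
    return list(filter(lambda d: "image/" in d.mime_type, docs))


from_uri = Doc(mime_type="image/png")  # e.g. a document built from a .png path
from_blob = Doc(blob=[[0, 0, 0]])      # e.g. Document(blob=img_blob): no mime type
kept = image_filter_by_mime([from_uri, from_blob])
print(len(kept))  # 1: the blob-only document is silently dropped
```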

JoanFM commented 3 years ago

I agree that from_files should load them directly, although putting the path into the URI is more flexible and gives the Executor more control.

And the URI is powerful not only for local paths but also for remote URIs.
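
As an illustration of why a URI can cover more than local files, an Executor could branch on the URI scheme before deciding how to fetch content. uri_kind below is a hypothetical helper built on urllib.parse.urlparse, not part of Jina or ImageNormalizer, and it assumes POSIX-style paths.

```python
from urllib.parse import urlparse


def uri_kind(uri: str) -> str:
    """Classify a Document-style uri by how an Executor would have to fetch it."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("http", "https"):
        return "remote"          # fetched over the network; same result inside Docker
    if scheme == "data":
        return "inline"          # content is embedded in the uri itself
    if scheme == "file":
        return "local-absolute"  # still depends on the Executor's filesystem
    if scheme == "":
        # Plain paths: relative ones depend on the Executor's working directory.
        return "local-absolute" if uri.startswith("/") else "local-relative"
    return "unknown"


for u in ["data/44.png", "/home/alexcg/tmp/44.png", "https://example.com/cat.png"]:
    print(u, uri_kind(u))
```

Only the "local-relative" case is sensitive to where the Executor runs, which is exactly the case hit in this issue.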

JoanFM commented 3 years ago

Yes, the mime-type filtering should be fixed. We do not want to rely on the mime type: if blob is set, we should assume the Executor can work with it. That is much less dangerous.
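
The proposed rule (trust blob when it is set, and use the mime type only as a fallback) could look like the sketch below. It uses a hypothetical Doc stand-in rather than the real Jina Document, and the function names are illustrative, not the ImageNormalizer's actual code.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Jina Document, reduced to the two fields the filter reads."""
    mime_type: str = ""
    blob: Optional[Any] = None


def is_processable_image(d: Doc) -> bool:
    """Trust a set blob; only fall back to the mime type when there is no blob."""
    # `is not None` rather than truthiness: bool() of a multi-element numpy array raises.
    if d.blob is not None:
        return True
    return "image/" in d.mime_type


def filter_images(docs: List[Doc]) -> List[Doc]:
    return [d for d in docs if is_processable_image(d)]


docs = [Doc(mime_type="image/png"), Doc(blob=[[0, 0, 0]]), Doc(mime_type="text/plain")]
print(len(filter_images(docs)))  # 2: the blob-only doc is kept, the text doc is dropped
```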