Open alexcg1 opened 3 years ago
Seems to work okay if I remove +docker
I think this is a problem with Jina core. For some reason, the Flow transfers loading of the document to the first node (running on docker), instead of doing it itself/at the gateway. I'll try to examine this more closely.
@alexcg1, this modification of your script should work (replace the path):
from jina import Flow
from jina.types.document.generators import from_files
docs = from_files("data/**/*.png")
flow = Flow(protocol="http", port_expose=4352).add(
    uses="jinahub+docker://ImageNormalizer",
    name="crafter",
    volumes="/home/tadej/projects/workshops/data:/image_normalizer/data",
)
with flow:
    flow.post(on="/index", inputs=docs, on_done=print)
    flow.block()
Not sure if it is working though - it runs without crashing, but the on_done callback doesn't seem to get called?
The more general problem is this: if a document contains an image URI (which is the case with from_files), then the image normalizer tries to load this image. In this case this is really unexpected - I expected from_files to load the images itself. We might need to reconsider the role of the URI, as this does not scale well beyond a local setting.
The point is that you should know that the path is relative and that the executor resolves relative paths from its own location. The best way is to do the loading yourself, from uri to blob.
So please, let's open the issue on the ImageNormalizer, but I do not think this is a bug
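To make the "different relative view" concrete, here is a minimal sketch of how a host path maps to the path seen inside the container, given the "host_dir:container_dir" string passed to volumes above. The helper to_container_path and the file name cats/1.png are hypothetical and only for illustration; nothing here is part of Jina.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, volume: str) -> str:
    """Translate a host path into the path visible inside the container.

    `volume` uses the "host_dir:container_dir" form.
    Hypothetical helper for illustration, not a Jina API.
    """
    host_dir, container_dir = volume.split(":", 1)
    # relative_to raises ValueError if host_path lies outside the mounted directory
    relative = PurePosixPath(host_path).relative_to(host_dir)
    return str(PurePosixPath(container_dir) / relative)


volume = "/home/tadej/projects/workshops/data:/image_normalizer/data"
# The file name below is made up for the example.
print(to_container_path("/home/tadej/projects/workshops/data/cats/1.png", volume))
# prints /image_normalizer/data/cats/1.png
```

A URI such as data/cats/1.png produced on the client only resolves inside the executor if the executor's working directory happens to line up with the mount, which is why loading the content on the client side avoids the problem.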
I'll move this back to executors then - but in my opinion the fact that we have URI is not really useful. Why not simply force users to load images into blobs? Just using Image.open or similar is definitely something they are familiar with, and will prevent these kinds of situations where this is used under the hood and they are not aware of it.
In theory the option of providing directly blobs should be supported in the executor.
I agree. I was just saying that when using from_files the users will expect the actual images to be loaded immediately - that's the exact reason this issue arose. And I think a general solution would be to kill the URL completely. We could have a load_images helper function to load docs with blobs, instead of from_files, which I think tries to do too much.
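A rough sketch of what such a load_images helper could look like, assuming it should read file content on the client instead of passing paths along. It uses only the standard library; LoadedImage is a made-up stand-in for a document and not a Jina class, and a real helper would presumably decode the bytes into an image array as well.

```python
import glob
import mimetypes
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class LoadedImage:
    """Stand-in for a document whose content is already loaded (not a Jina class)."""
    buffer: bytes
    mime_type: Optional[str]


def load_images(pattern: str) -> Iterator[LoadedImage]:
    """Yield one LoadedImage per file matching `pattern`, reading the bytes on the client."""
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, "rb") as f:
            data = f.read()
        # Guess the mime type from the file extension so later filters can rely on it.
        mime_type, _ = mimetypes.guess_type(path)
        yield LoadedImage(buffer=data, mime_type=mime_type)
```

Called as load_images("data/**/*.png"), this sends file content rather than paths, so the executor never has to resolve anything on its own filesystem.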
Also, the normalizer filters images using
filtered_docs = DocumentArray(
    list(filter(lambda d: 'image/' in d.mime_type, docs))
)
which will actually ignore the documents that were created with Document(blob=img_blob), as the mime type does not get set automatically. I guess this is something that should be fixed?
I agree that from_files should load them directly, although putting the path into the URI is more flexible and puts more power into the Executor.
And the uri is powerful not only for local paths but also for remote URIs.
Yes, the mime type filtering should be fixed. We do not want to rely on the mime type. If blob is set, we assume it can work. It is way less dangerous.
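One possible shape for that fix, sketched with a minimal stand-in class rather than the real Document so it runs on its own. Doc and select_image_docs are hypothetical names; the actual executor code may end up looking different.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Minimal stand-in for a document; not the Jina Document class."""
    blob: Optional[object] = None
    mime_type: str = ""


def select_image_docs(docs: List[Doc]) -> List[Doc]:
    """Keep docs that already carry a blob, or whose mime type marks them as images."""
    return [d for d in docs if d.blob is not None or 'image/' in d.mime_type]


docs = [
    Doc(blob=[[0, 0], [0, 0]]),   # blob set, mime type empty: kept
    Doc(mime_type="image/png"),   # only a mime type (the URI case): kept
    Doc(mime_type="text/plain"),  # neither: dropped
]
print(len(select_image_docs(docs)))  # prints 2
```

Checking the blob first means documents built with Document(blob=img_blob) are no longer dropped silently, while URI-based documents still pass through the mime type check.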
ImageNormalizer can't seem to find the files despite them quite clearly existing. Perhaps it's looking in its own Docker container instead of the current dir?
My code
Error log
I've tried adding options with uses_with, but no change. Note the above program is the simplest version possible. My initial efforts had a lot more Executors.
jina -v
2.0.10