intake / intake-xarray

Intake plugin for xarray
https://intake-xarray.readthedocs.io/
BSD 2-Clause "Simplified" License

Compatibility with the dropbox protocol #60

Closed: MarineChap closed this issue 4 years ago

MarineChap commented 4 years ago

Hi, I am trying to use this module with the dropbox implementation, but it fails with the error SyntaxError: not a JPEG file (full traceback below).

Do you know whether this comes from the dropboxdrivefs implementation not providing some information that is needed, or from another problem unrelated to the fsspec implementation being used?

Thank you for your help.

How to reproduce it:

import intake

image = intake.open_xarray_image("dropbox://Path/name_file.jpg", storage_options={"token": "******"})
image.to_dask()

Error:

  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/intake/source/base.py", line 117, in _load_metadata
    self._schema = self._get_schema()
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/intake_xarray/image.py", line 354, in _get_schema
    self._open_dataset()
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/intake_xarray/image.py", line 341, in _open_dataset
    self._ds = reader(files[0], self.chunks, **self._kwargs)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/intake_xarray/image.py", line 151, in reader
    array = imread(f)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/skimage/io/_io.py", line 61, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/imageio/core/functions.py", line 264, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/imageio/core/format.py", line 164, in get_reader
    return self.Reader(self, request)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/imageio/core/format.py", line 214, in __init__
    self._open(**self.request.kwargs.copy())
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 431, in _open
    return PillowFormat.Reader._open(self, pilmode=pilmode, as_gray=as_gray)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 127, in _open
    self._im = factory(self._fp, "")
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/PIL/JpegImagePlugin.py", line 782, in jpeg_factory
    im = JpegImageFile(fp, filename)
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/PIL/ImageFile.py", line 107, in __init__
    self._open()
  File "/home/chaput/miniconda3/envs/fkdev/lib/python3.7/site-packages/PIL/JpegImagePlugin.py", line 346, in _open
    raise SyntaxError("not a JPEG file")
SyntaxError: not a JPEG file

martindurant commented 4 years ago

Well, I suppose you are reduced to replicating how the image package loads the file, and checking how the result differs from reading the same file locally. For instance, it might be trying to read the file footer first, but seek() isn't working as expected. I'm afraid I don't know the structure of JPEG files, so I can only wish you good luck debugging!
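A minimal sketch of that debugging approach, assuming the remote file can be opened as a binary file-like object (for example inside with fsspec.open(url, "rb", **storage_options) as f:). It checks whether the stream starts with the JPEG Start Of Image marker (bytes FF D8) and whether seek() actually moves the read position. The function name and the returned keys are hypothetical, chosen only for illustration.

```python
import io

# Every JPEG file begins with the two-byte Start Of Image (SOI) marker FF D8.
JPEG_SOI = b"\xff\xd8"


def diagnose_image_stream(f):
    """Run simple checks on a binary file-like object opened in "rb" mode.

    Hypothetical helper: compare its result for the remote stream with its
    result for a local copy of the same file.
    """
    head = f.read(2)            # the first bytes a JPEG decoder inspects
    f.seek(0)                   # rewind, as decoders often do after sniffing
    rewound = f.read(2)         # identical to head only if seek(0) worked
    f.seek(0, io.SEEK_END)      # jump to the end, like a footer-first reader
    size = f.tell()             # a broken seek shows up as a wrong size here
    f.seek(0)                   # leave the stream rewound for the caller
    return {
        "starts_with_soi": head == JPEG_SOI,
        "seek_back_ok": rewound == head,
        "size": size,
    }
```

Calling it once on the remote stream and once on open(local_path, "rb") for a local copy of the same image, then comparing the two results, shows where the remote read diverges from the local one.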

jsignell commented 4 years ago

I don't think dropbox is a supported storage location for the image plugin. Whether a particular location is supported depends on the io function itself. In this case that is coming from skimage.

MarineChap commented 4 years ago

OK, I will definitely look into it; maybe I can manage a workaround somewhere in the dropbox interface. Thank you for the clue. I cannot find documentation clearly stating which storage locations skimage.io supports. What surprises me is that it works for s3 but not with the http protocol? Also, I thought the whole point of using fsspec was to decouple the location of the data from how it is read, no?

I will look into it, but if you have an idea of what makes the difference in this case, I would be interested to know. :)

Edit: does intake-xarray not work with the http protocol? I ask because the dropbox interface is based on the fsspec http interface.

MarineChap commented 4 years ago

It is fixed. It was a really weird error that I still have not been able to track down in the sub-sub-modules of Pillow, etc. But at least I found a good workaround, which I have implemented in the dropboxdrivefs module, finally making it compatible with your module and with intake in general. Thank you! Your comment put me on the path to the fix.

martindurant commented 4 years ago

(Also, in cases where the target function only accepts local files, you can always use the caching filesystem to pull the file from whichever backend.)
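The idea behind the caching filesystem can be sketched with the standard library alone: copy the remote bytes to a local file once, then hand the local path to a reader such as skimage.io.imread that only accepts local files. This is a simplified illustration, not fsspec's implementation; cache_to_local and its parameters are hypothetical, and open_remote stands for any callable returning a binary file-like object (for example, one wrapping fsspec.open).

```python
import hashlib
import os
import shutil
import tempfile


def cache_to_local(open_remote, key, cache_dir=None):
    """Copy a remote file to local disk once and return the local path.

    Hypothetical helper. open_remote: zero-argument callable returning a
    binary file-like object. key: identifies the file, typically its URL.
    """
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "remote-cache")
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the key so any URL becomes a safe filename, but keep the
    # extension, since some readers pick a decoder based on it.
    ext = os.path.splitext(key)[1]
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ext
    local_path = os.path.join(cache_dir, name)
    if not os.path.exists(local_path):
        tmp_path = local_path + ".part"
        with open_remote() as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Rename only after the copy finishes, so a partial download is
        # never mistaken for a cached file.
        os.replace(tmp_path, local_path)
    return local_path
```

fsspec provides this behaviour itself through chained URLs such as simplecache::dropbox://Path/name_file.jpg (or filecache:: for a persistent cache), and fsspec.open_local can resolve such a URL to a local path, with the backend's options passed under the protocol name, e.g. dropbox={"token": "******"}. Whether intake-xarray's image source accepts a chained URL directly is not confirmed in this thread.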