intake / intake_geopandas

An intake plugin for loading datasets with geopandas
BSD 2-Clause "Simplified" License

[WIP] old caching #16

Closed aaronspring closed 4 years ago

aaronspring commented 4 years ago

I got the old caching working with shapefiles. I hope the tests really verify that the local source is used; at least the print statement confirms it.

closes #15

Is this how you would tackle the problem, @ian-r-rose @martindurant? I didn't understand everything from your discussion in #15.

aaronspring commented 4 years ago

The new caching is also designed to work with this case, where a zip file contains more than one shapefile:

import fsspec as fs
import geopandas as gpd

# cache the zip to disk and read gadm36_ALA_1.shp from it
with fs.open_files(f'zip://gadm36_ALA_1.shp::simplecache:://biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip', target_protocol='https',simplecache={'cache_storage': 'ALA2','same_names':True}) as f:
    gdf = gpd.read_file(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 248, in open_files
    protocol=protocol,
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 523, in get_fs_token_paths
    fs = cls(**options)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 54, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/zip.py", line 53, in __init__
    self.fo = fo.__enter__()  # the whole instance is a context
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 100, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
    return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
    return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 482, in _open
    with self.fs._open(path, **kwargs) as f, open(fn, "wb") as f2:
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 109, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 187, in __init__
    self._open()
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 192, in _open
    self.f = open(self.path, mode=self.mode)
IsADirectoryError: [Errno 21] Is a directory: '/Users/aaron.spring/Coding/intake_geopandas'

Shouldn't this work? https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally

martindurant commented 4 years ago

Almost right :)

Also, open_files returns a list of two OpenFile objects. You need to use a with block for each in turn or, after https://github.com/intake/filesystem_spec/pull/358, you can use a single with context that gives multiple open file-likes.
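
To illustrate the point about open_files returning a list, here is a minimal sketch. It assumes a recent fsspec and uses two small local files (hypothetical stand-ins, not the GADM archive) so that it runs without network access. Each OpenFile is entered with its own with block, and the real file handle only exists inside that block.

```python
import os
import tempfile

import fsspec

# Hypothetical stand-in data: two small local files instead of the remote zip.
tmpdir = tempfile.mkdtemp()
for name, payload in [("a.txt", b"first"), ("b.txt", b"second")]:
    with open(os.path.join(tmpdir, name), "wb") as fh:
        fh.write(payload)

# A glob pattern makes open_files return one OpenFile per matching path.
files = fsspec.open_files(os.path.join(tmpdir, "*.txt"), mode="rb")

# Enter each OpenFile in turn; nothing is actually opened before the with block.
contents = []
for open_file in files:
    with open_file as f:
        contents.append(f.read())
```

The same pattern should apply to a chained zip URL: index or iterate the returned list, then use with on the individual OpenFile, as in files[0].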

aaronspring commented 4 years ago

I still get the same error. The path in the error message is the directory where I started Python. The error is raised even before entering the context manager:

# also fs.open(...
fs.open_files('zip://gadm36_ALA_1.shp::simplecache::https//biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip',simplecache={'cache_storage': 'ALA','same_names':True})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 378, in open
    **kwargs
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 248, in open_files
    protocol=protocol,
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 523, in get_fs_token_paths
    fs = cls(**options)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 54, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/zip.py", line 53, in __init__
    self.fo = fo.__enter__()  # the whole instance is a context
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 100, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
    return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
    **kwargs
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
    return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 482, in _open
    with self.fs._open(path, **kwargs) as f, open(fn, "wb") as f2:
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 109, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 187, in __init__
    self._open()
  File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 192, in _open
    self.f = open(self.path, mode=self.mode)
IsADirectoryError: [Errno 21] Is a directory: '/Users/aaron.spring/Coding/intake_geopandas'

martindurant commented 4 years ago

It's working OK here. Perhaps you need fsspec master?

In [12]: files = fs.open_files(f'zip://gadm36_ALA_1.shp::simplecache::http://biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip', simplecache={'same_names':True, 'cache_storage': 'cache'}
    ...: )

In [13]: with files[0] as f:
    ...:     print(f.read(1))
    ...:
b'\x00'
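
The same chaining pattern can be tried offline. The sketch below is an assumption-laden illustration, not the code from this PR: it builds a small local zip with two members (hypothetical names), uses file:// in place of the http:// URL, and reads one member through zip:: and simplecache::. It assumes a recent fsspec in which URL chaining and the simplecache options cache_storage and same_names are available.

```python
import os
import tempfile
import zipfile

import fsspec

# Hypothetical stand-in for the remote archive: a local zip with two members.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "archive.zip")
cache_dir = os.path.join(workdir, "cache")

with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("first.txt", b"alpha")
    zf.writestr("second.txt", b"beta")

# Chain: pick one member of the zip, which is itself fetched through simplecache.
# Options for the caching layer are passed under the "simplecache" keyword.
url = "zip://second.txt::simplecache::file://" + zip_path
open_file = fsspec.open(
    url,
    mode="rb",
    simplecache={"cache_storage": cache_dir, "same_names": True},
)

with open_file as f:
    data = f.read()
```

Each protocol in the chain carries its full prefix (zip://, file://), and only the caching layer is written as a bare simplecache:: segment.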

aaronspring commented 4 years ago

Thanks for all the feedback. I will start from scratch; I first need to understand the basics completely.