aaronspring closed this 4 years ago
The new caching is designed to also work with this case, where a zip file contains more than one shapefile:
import fsspec as fs
import geopandas as gpd
# cache zip to dask and take 1.shp
with fs.open_files(f'zip://gadm36_ALA_1.shp::simplecache:://biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip', target_protocol='https',simplecache={'cache_storage': 'ALA2','same_names':True}) as f:
    gdf = gpd.read_file(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 248, in open_files
protocol=protocol,
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 523, in get_fs_token_paths
fs = cls(**options)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 54, in __call__
obj = super().__call__(*args, **kwargs)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/zip.py", line 53, in __init__
self.fo = fo.__enter__() # the whole instance is a context
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 100, in __enter__
f = self.fs.open(self.path, mode=mode)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
**kwargs
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 482, in _open
with self.fs._open(path, **kwargs) as f, open(fn, "wb") as f2:
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 109, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 187, in __init__
self._open()
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 192, in _open
self.f = open(self.path, mode=self.mode)
IsADirectoryError: [Errno 21] Is a directory: '/Users/aaron.spring/Coding/intake_geopandas'
Shouldn't this work? I followed https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining and https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
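Since an archive can hold several shapefiles, it can help to list its members before choosing the one to name in the `zip://` part of the URL. This is a minimal standard-library sketch, not fsspec code. The archive is built in memory and the member names are made up for illustration.

```python
import io
import zipfile

# Build a small in-memory archive; the member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("demo_0.shp", "demo_0.dbf", "demo_1.shp", "demo_1.dbf"):
        zf.writestr(name, b"")

# List only the shapefile members so one of them can be picked by name.
with zipfile.ZipFile(buffer) as zf:
    shapefiles = sorted(n for n in zf.namelist() if n.endswith(".shp"))

print(shapefiles)
```

For a real download, the same `namelist()` call should work on the cached zip file on disk.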
Almost right :)
fs.open_files(
'zip://gadm36_ALA_1.shp::simplecache:://biogeo.ucdavis.edu'
'/data/gadm3.6/shp/gadm36_ALA_shp.zip',
simplecache={'cache_storage': 'ALA2','same_names':True, 'target_protocol': 'https'}
)
Or drop target_protocol and include the protocol in the URL:
'zip://gadm36_ALA_1.shp::simplecache::https://biogeo.ucdavis.edu'
'/data/gadm3.6/shp/gadm36_ALA_shp.zip'
Also, open_files returns a list of two OpenFile objects. You need to use `with` on each of them in turn or, after https://github.com/intake/filesystem_spec/pull/358 , you can use a single `with` context that gives you multiple open file-likes.
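To make the chaining rule concrete: each `::`-separated segment of the URL names one layer, and the last segment needs its own protocol (or a `target_protocol` passed to the layer before it). The sketch below is a simplified illustration and not fsspec's actual parser. The helper `split_chain` and the `KNOWN` set are hypothetical, and it assumes that a segment without `://` that is not a bare protocol name is treated as a local path. That assumption is consistent with the `IsADirectoryError` raised from `local.py` in the traceback.

```python
# Simplified illustration only; this is NOT fsspec's real parsing code.
KNOWN = {"zip", "simplecache", "http", "https", "file"}  # hypothetical subset


def split_chain(url, default_protocol="file"):
    """Split a chained URL on '::' and guess each layer's protocol."""
    layers = []
    for segment in url.split("::"):
        if "://" in segment:
            protocol, path = segment.split("://", 1)
        elif segment in KNOWN:
            # A bare protocol name such as 'simplecache' carries no path.
            protocol, path = segment, ""
        else:
            # Assumption: no '://' means no protocol, so fall back to local files.
            protocol, path = default_protocol, segment
        layers.append((protocol, path))
    return layers


tail = "biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip"
good = split_chain("zip://gadm36_ALA_1.shp::simplecache::https://" + tail)
bad = split_chain("zip://gadm36_ALA_1.shp::simplecache::https//" + tail)

print(good[-1][0])  # protocol of the innermost layer with 'https://'
print(bad[-1][0])   # protocol of the innermost layer with 'https//'
```

Under these assumptions, the URL with `https://` resolves its last layer to `https`, while a URL written as `https//` (missing the colon) falls back to the local filesystem.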
I still get the same error. The path in the error message is the directory where I started Python. It fails even before entering the context manager:
# also fs.open(...
fs.open_files('zip://gadm36_ALA_1.shp::simplecache::https//biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip',simplecache={'cache_storage': 'ALA','same_names':True})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 378, in open
**kwargs
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 248, in open_files
protocol=protocol,
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 523, in get_fs_token_paths
fs = cls(**options)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 54, in __call__
obj = super().__call__(*args, **kwargs)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/zip.py", line 53, in __init__
self.fo = fo.__enter__() # the whole instance is a context
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/core.py", line 100, in __enter__
f = self.fs.open(self.path, mode=mode)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/spec.py", line 775, in open
**kwargs
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 316, in <lambda>
return lambda *args, **kw: getattr(type(self), item)(self, *args, **kw)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/cached.py", line 482, in _open
with self.fs._open(path, **kwargs) as f, open(fn, "wb") as f2:
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 109, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 187, in __init__
self._open()
File "/Users/aaron.spring/anaconda3/envs/xr/lib/python3.7/site-packages/fsspec/implementations/local.py", line 192, in _open
self.f = open(self.path, mode=self.mode)
IsADirectoryError: [Errno 21] Is a directory: '/Users/aaron.spring/Coding/intake_geopandas'
It's working OK here. Perhaps you need fsspec master?
In [12]: files = fs.open_files(f'zip://gadm36_ALA_1.shp::simplecache::http://biogeo.ucdavis.edu/data/gadm3.6/shp/gadm36_ALA_shp.zip', simplecache={'same_names':True, 'cache_storage': 'cache'}
    ...: )
In [13]: with files[0] as f:
    ...:     print(f.read(1))
    ...:
b'\x00'
Thanks for all the feedback. I will start from scratch, since I first need to understand the basics properly.
I got the old caching working with shapefiles. I hope the tests really check that the local source is used. At least the print statement confirms this.
closes #15
Is this how you would tackle the problem, @ian-r-rose @martindurant? I didn't understand everything from your discussion in #15.