Status of reading a zipped shape file and caching

cwerner commented 3 years ago

Hi all.

I'm trying to read Natural Earth shapefiles using intake_geopandas, but I'm unsure whether this is currently possible. I tried to follow the recent commits and the tests...

Could anyone share a minimal example of an intake catalog that reads them correctly?

This is my catalog section:

plugins:
  source:
    - module: intake_geopandas

sources:
  admin_new:
    name: 'ADMIN_NEW'
    description: 'New country and region layer'
    driver: intake_geopandas.geopandas.ShapefileSource
    parameters:
      res:
        default: 10
        allowed: [10, 50, 110]
        description: 'Resolution (10, 50 or 110).'
        type: int
    args:
      urlpath: 'simplecache::zip://*::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/{{res}}m/cultural/ne_{{res}}m_admin_0_countries.zip'

      storage_options:
        simplecache:
          cache_storage: '.cache'
          same_names: true

However, this fails... Is the syntax correct?

Error (last lines):

fiona/ogrext.pyx in fiona.ogrext.Session.start()

fiona/_shim.pyx in fiona._shim.gdal_open_vector()

DriverError: '/vsizip/simplecache::zip://*::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip' does not exist in the file system, and is not recognized as a supported dataset name.
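
Reading the path in that error literally: it is the GDAL `/vsizip/` prefix followed by the whole chained URL, unchanged. A minimal sketch that splits the chain on `::` makes the three layers visible. The helper name `split_chain` is hypothetical, and the conclusion that the chain reached GDAL without being resolved first is an inference from the message, not something confirmed here.

```python
from typing import List

# Path copied from the DriverError above.
failing_path = (
    "/vsizip/simplecache::zip://*::"
    "https://www.naturalearthdata.com/http//www.naturalearthdata.com/"
    "download/10m/cultural/ne_10m_admin_0_countries.zip"
)

GDAL_ZIP_PREFIX = "/vsizip/"


def split_chain(path: str) -> List[str]:
    """Drop the GDAL zip prefix, then split the chained URL on '::'.

    Hypothetical helper for illustration only; it is not part of
    intake, fsspec, or fiona.
    """
    if path.startswith(GDAL_ZIP_PREFIX):
        path = path[len(GDAL_ZIP_PREFIX):]
    return path.split("::")


layers = split_chain(failing_path)
for layer in layers:
    print(layer)
```

The outermost layer is the cache, the middle one the zip member selector, and the innermost one the remote file.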

Cheers, C

cwerner commented 3 years ago

Solved it 🤷

I switched the driver to geopandasfile and added use_fsspec = True to the config. I had tried use_fsspec before, but it failed, apparently because of the wrong driver selection (intake_geopandas.geopandas.ShapefileSource).

Working URL:

urlpath: 'simplecache::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/{{res}}m/cultural/ne_{{res}}m_admin_0_countries.zip'
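
To check what the `{{res}}` placeholder expands to for each allowed resolution, a rough sketch follows. Plain string replacement stands in for the template expansion that intake performs, and `render_urlpath` and `ALLOWED_RES` are hypothetical names mirroring the `res` parameter from the catalog above.

```python
# Template copied from the working urlpath above.
URL_TEMPLATE = (
    "simplecache::https://www.naturalearthdata.com/http//www.naturalearthdata.com/"
    "download/{{res}}m/cultural/ne_{{res}}m_admin_0_countries.zip"
)

# Mirrors the `allowed` list of the `res` parameter in the catalog.
ALLOWED_RES = (10, 50, 110)


def render_urlpath(res: int = 10) -> str:
    """Expand the {{res}} placeholder for one allowed resolution.

    Hypothetical stand-in: simple replacement, not intake's own templating.
    """
    if res not in ALLOWED_RES:
        raise ValueError(f"res must be one of {ALLOWED_RES}, got {res}")
    return URL_TEMPLATE.replace("{{res}}", str(res))


for res in ALLOWED_RES:
    print(render_urlpath(res))
```

For `res=10` the part after `simplecache::` is the same remote URL that appears in the earlier DriverError.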

martindurant commented 3 years ago

Wondering, did it also work with the "zip://" part?

cwerner commented 3 years ago

I just tried:

URL:

      urlpath: 'simplecache::zip://*::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/{{res}}m/cultural/ne_{{res}}m_admin_0_countries.zip'

Result:

Traceback (most recent call last):
  File "regions_inheritance.py", line 198, in <module>
    df = catalog.admin_new(res=Rmap[cfg.res]).read()
  File "/Users/werner-ch/.pyenv/versions/3.8.0/envs/ldndctools/lib/python3.8/site-packages/intake_geopandas/geopandas.py", line 49, in read
    self._get_schema()
  File "/Users/werner-ch/.pyenv/versions/3.8.0/envs/ldndctools/lib/python3.8/site-packages/intake_geopandas/geopandas.py", line 32, in _get_schema
    self._open_dataset()
  File "/Users/werner-ch/.pyenv/versions/3.8.0/envs/ldndctools/lib/python3.8/site-packages/intake_geopandas/geopandas.py", line 120, in _open_dataset
    f = self._resolve_single_file(f) if len(f) > 1 else f[0]
  File "/Users/werner-ch/.pyenv/versions/3.8.0/envs/ldndctools/lib/python3.8/site-packages/intake_geopandas/geopandas.py", line 133, in _resolve_single_file
    raise NotImplementedError(
NotImplementedError: Opening multiple files is not supported by this driver

However, this seems to work:

      urlpath: 'simplecache::zip://::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/{{res}}m/cultural/ne_{{res}}m_admin_0_countries.zip'

martindurant commented 3 years ago

Yeah, OK: zip should pick the first available file if none is given, but it will only extract one file in a URL like this. Note to self: see how hard it would be to make the glob work.
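
A small standard-library sketch of why `zip://*` and `zip://` behave differently here. It only mimics the behaviour described above and is not fsspec's implementation; the member names are hypothetical and the archive is built in memory rather than downloaded.

```python
import fnmatch
import io
import zipfile
from typing import List

# Hypothetical member names; a shapefile is normally a bundle of sibling files.
MEMBERS = [
    "ne_10m_admin_0_countries.shp",
    "ne_10m_admin_0_countries.shx",
    "ne_10m_admin_0_countries.dbf",
    "ne_10m_admin_0_countries.prj",
]


def build_archive() -> bytes:
    """Create an in-memory zip with empty placeholder members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in MEMBERS:
            archive.writestr(name, b"")
    return buffer.getvalue()


def select_members(archive_bytes: bytes, pattern: str) -> List[str]:
    """Return members matching a glob, or only the first member if no pattern is given."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        names = archive.namelist()
    if not pattern:
        return names[:1]
    return fnmatch.filter(names, pattern)


archive_bytes = build_archive()
glob_hits = select_members(archive_bytes, "*")
default_hit = select_members(archive_bytes, "")
print(len(glob_hits))
print(default_hit)
```

With the glob, every member matches, which is the multiple-files case the driver rejects; with an empty path only one member is selected.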

intake / intake_geopandas

Status of reading a zipped shape file and caching #24