intake / intake_geopandas

An intake plugin for loading datasets with geopandas
BSD 2-Clause "Simplified" License

caching #17

Closed aaronspring closed 3 years ago

aaronspring commented 4 years ago

closes #15 similar to #16

prototyping gist: https://gist.github.com/aaronspring/3a865b991f40fa7b3afef102df0717f8

Background:

Given those requirements, I decided on the following caching strategy:

  1. Cache with fsspec.open_local, requiring same_names=True so that files are saved to disk under their proper names; warn when same_names is not given.
  2. Pass the local path to geopandas.read_file, modifying the local URL to zip://...file.zip or choosing file.shp.
  3. Bypass fsspec when there is no cache:: in the URL.
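The three steps above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (uses_cache, to_geopandas_path, read_cached) are hypothetical, and it assumes a single remote file and a chained fsspec URL such as simplecache::https://... whose options are passed per protocol, e.g. {"simplecache": {"same_names": True}}.

```python
import warnings


def uses_cache(urlpath):
    """Return True when the URL chains a caching protocol, e.g. simplecache::."""
    return "cache::" in urlpath


def to_geopandas_path(local_path):
    """Prefix local zip archives with zip:// so geopandas can read inside them."""
    if local_path.endswith(".zip"):
        return "zip://" + local_path
    return local_path


def read_cached(urlpath, storage_options=None, **kwargs):
    """Hypothetical loader: cache via fsspec if requested, else read directly."""
    import geopandas  # imported lazily so the helpers above work without it

    # Step 3: no cache:: in the URL, so bypass fsspec entirely.
    if not uses_cache(urlpath):
        return geopandas.read_file(urlpath, **kwargs)

    import fsspec

    storage_options = storage_options or {}
    same_names = any(
        isinstance(opts, dict) and opts.get("same_names")
        for opts in storage_options.values()
    )
    if not same_names:
        # Without same_names=True the cached file gets a hashed name,
        # so the .zip/.shp extension is lost on disk.
        warnings.warn("same_names=True was not given; cached file names may be hashed")

    # Step 1: download into the local cache and get the local path back
    # (assumes a single file, for which open_local returns a string).
    local_path = fsspec.open_local(urlpath, **storage_options)

    # Step 2: hand the (possibly zip://-prefixed) local path to geopandas.
    return geopandas.read_file(to_geopandas_path(local_path), **kwargs)
```

Choosing a specific file.shp from a cached directory, as mentioned in step 2, is left out of this sketch.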

That is, I didn't see a way to both cache and use a use_fsspec keyword. @ian-r-rose @martindurant

aaronspring commented 4 years ago

Also test with https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip. Extracted here (difficult to cache all of the files): https://github.com/geopandas/geopandas/tree/master/geopandas/datasets/naturalearth_lowres

aaronspring commented 4 years ago

This PR works for me. Do you have test cases that do not work with this implementation? @ian-r-rose

aaronspring commented 4 years ago

I will wait for your PR then.

ian-r-rose commented 4 years ago

I don't think we have to wait for the linked PR, I just think that once it gets through review and is published it will allow us to simplify some of the work here.

In the meantime, if we can ensure that this works for GeoJSON files, as well as files hosted in cloud storage like S3, then I would be happy to get this in sooner, and we can revisit again when https://github.com/geopandas/geopandas/pull/1535 is available.

aaronspring commented 4 years ago

Waiting for https://github.com/geopandas/geopandas/pull/1535 to get merged.

ian-r-rose commented 4 years ago

@aaronspring Now that the above is merged, want to re-enable those tests and point this repo at the unreleased version? Once a new version of geopandas is released we can point to the published version and cut a release here.

aaronspring commented 4 years ago

With CI (#18) we could test independently. It works on my machine.

ian-r-rose commented 4 years ago

I'm happy with this. Did you want to install from geopandas master until there is a release @aaronspring?

aaronspring commented 4 years ago

Tried this https://stackoverflow.com/questions/16584552/how-to-state-in-requirements-txt-a-direct-github-source

aaronspring commented 4 years ago

> I'm happy with this. Did you want to install from geopandas master until there is a release @aaronspring?

I would like to merge this and use geopandas master, but I have not succeeded.

ian-r-rose commented 4 years ago

I think it's okay to go back to skipping the tests that rely on geopandas master. We can keep an eye on that repo and publish a release here as soon as it is available. For people who need caching, we can point them at unpublished versions in the meantime.

aaronspring commented 4 years ago

I think I will skip the tests if the geopandas version is <= 0.8.0.
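A version gate like this could look roughly as follows. This is only a sketch, assuming the intended threshold is geopandas 0.8.0; the helper names release_tuple and needs_skip are hypothetical, and only the leading numeric release segment is compared, so a development build reporting something like 0.8.0+N.g<hash> would still be treated as 0.8.0.

```python
import re


def release_tuple(version):
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def needs_skip(version, threshold=(0, 8, 0)):
    """Return True when the version is at or below the assumed threshold."""
    return release_tuple(version) <= threshold
```

In a test module this could be combined with pytest, for example pytest.mark.skipif(needs_skip(geopandas.__version__), reason="requires a newer geopandas").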

aaronspring commented 3 years ago

I don't understand why this fails. It's not a testing problem; the whole setup isn't really starting. I have never worked with conda-verify. Any idea, @martindurant?

ian-r-rose commented 3 years ago

All green, thanks @aaronspring!

aaronspring commented 3 years ago

Thanks for co-creating this.