informatics-lab / covid19-examples-and-docs

Documentation and examples for the Met Office Informatics Lab COVID-19 response platform and data.

reading shapefiles from cloud #20

Closed aaronspring closed 4 years ago

aaronspring commented 4 years ago

trying to fix in https://github.com/informatics-lab/covid19-examples-and-docs/pull/18

aaronspring commented 4 years ago

@tam203 I could reproduce your error. where did you get the shapefiles from? any idea how to read them into geopandas?

tam203 commented 4 years ago

Hi @aaronspring .

So the data I used is on the public data set we've pushed up. Here is the (awful) index page.

here is the data file I used: global_daily_precip_max_20200106.nc

and here is the shapefile: Counties_and_Unitary_Authorities_April_2019_Boundaries_EW_BUC.shp

aaronspring commented 4 years ago

I found these. I am looking for a nice way to read them into geopandas.

Normally I would just download them manually from the internet. When I did that, I got reasonable column names. What I am looking for here is a way to download them into the notebook environment (at runtime, not via git) and open them.

aaronspring commented 4 years ago

Or where did you download these shapefiles from? Are they for internal use only, or did you get them from another resource on the internet?

aaronspring commented 4 years ago

what I tried:

from io import BytesIO
from azure.storage.blob import BlobClient
import geopandas

url='https://metdatasa.blob.core.windows.net/covid19-response/shapefiles/England/Counties_and_Unitary_Authorities_April_2019_Boundaries_EW_BUC.shp'
file_data = BytesIO(BlobClient.from_blob_url(url).download_blob().readall())
geopandas.read_file(url)
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
fiona/_shim.pyx in fiona._shim.gdal_open_vector()

fiona/_err.pyx in fiona._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsimem/f6ede516e91146f38d179a39f4036dcc' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

DriverError                               Traceback (most recent call last)
<ipython-input-14-f6bf670bb6cb> in <module>
----> 1 geopandas.read_file(url)

/srv/conda/envs/notebook/lib/python3.7/site-packages/geopandas/io/file.py in read_file(filename, bbox, mask, rows, **kwargs)
     87 
     88     with fiona_env():
---> 89         with reader(path_or_bytes, **kwargs) as features:
     90 
     91             # In a future Fiona release the crs attribute of features will

/srv/conda/envs/notebook/lib/python3.7/site-packages/fiona/collection.py in __init__(self, bytesbuf, **kwds)
    537         # Instantiate the parent class.
    538         super(BytesCollection, self).__init__(self.virtual_file, vsi=filetype,
--> 539                                               encoding='utf-8', **kwds)
    540 
    541     def close(self):

/srv/conda/envs/notebook/lib/python3.7/site-packages/fiona/collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
    152             if self.mode == 'r':
    153                 self.session = Session()
--> 154                 self.session.start(self, **kwargs)
    155             elif self.mode in ('a', 'w'):
    156                 self.session = WritingSession()

fiona/ogrext.pyx in fiona.ogrext.Session.start()

fiona/_shim.pyx in fiona._shim.gdal_open_vector()

DriverError: '/vsimem/f6ede516e91146f38d179a39f4036dcc' not recognized as a supported file format.

aaronspring commented 4 years ago

OK, there seems to be no bug. Before, I had just downloaded a different shapefile, shapefile = '/Users/aaron.spring/Downloads/UK_covid_reporting_regions.shp', and I don't know where from.

while the wget extension is still not very nice, at least it works.

tam203 commented 4 years ago

@aaronspring just seen the above. I'll digest but feel free to ignore this.

I can't remember the original source. @kaedonkers might know.

We've uploaded these and other shapefiles to an Azure blob container. As I say, this is the index page, and the README for the data set is here.

I don't think that this is what you are asking @aaronspring but this notebook has some examples of working with things in the blob store (but you can also just use urllib or whatever) -
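
For the plain-urllib route mentioned above, a minimal sketch could look like the following. It assumes the file is publicly readable over HTTPS without credentials; `download_public_file` and its `opener` parameter are hypothetical names introduced here for illustration, not part of the platform.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def download_public_file(url, dest_dir=".", opener=urllib.request.urlopen):
    """Stream a publicly readable URL into dest_dir and return the local path.

    `opener` is a hypothetical hook (defaults to urllib.request.urlopen) so the
    function can be exercised without network access.
    """
    # Use the last component of the URL path as the local file name.
    filename = PurePosixPath(urlsplit(url).path).name
    target = Path(dest_dir) / filename
    # Copy in chunks rather than loading the whole file into memory.
    with opener(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling it once per file (for a shapefile, once each for the `.shp`, `.shx` and `.dbf`) leaves local copies that can be opened as usual.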

kaedonkers commented 4 years ago

@aaronspring The UK shapefile is a manually curated one for all the COVID reporting regions in the UK. Here is a link to download it from our provision on Azure. The other country shapefiles are from https://gadm.org/download_country_v3.html

Does that answer the questions you were asking?

tam203 commented 4 years ago

@aaronspring I'm not sure I'm providing the clarity to help you help us. Shall we hop on a call to discuss? If you drop an email to covid19@informaticslab.co.uk we can arrange something.

The work you've done looks really exciting; I just want to make sure we can get the most out of it.

aaronspring commented 4 years ago

@aaronspring The UK shapefile is a manually curated one for all the COVID reporting regions in the UK. Here is a link to download it from our provision on Azure. The other country shapefiles are from https://gadm.org/download_country_v3.html

Does that answer the questions you were asking?

I am looking for the source of https://metdatasa.blob.core.windows.net/covid19-response/shapefiles/UK/UK_covid_reporting_regions.shp, or a way to download these files into the Binder local environment, because geopandas needs the .shp, .shx and .dbf files together to read a shapefile. The current way is quite manual, and I hope to wrap opening a shapefile from Azure blob storage in a clean function.
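
Since the three files share a stem and differ only in extension, the sibling URLs can be derived from the `.shp` URL. This is a rough sketch, assuming the sidecar files sit next to the `.shp` in the same container path; `sidecar_urls` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit

# The three files geopandas needs side by side to read a shapefile.
SIDECAR_EXTENSIONS = (".shp", ".shx", ".dbf")


def sidecar_urls(shp_url, extensions=SIDECAR_EXTENSIONS):
    """Return the URLs of the files that share the .shp file's stem."""
    parts = urlsplit(shp_url)
    path = PurePosixPath(parts.path)
    # Swap only the final suffix; scheme, host and directory stay unchanged.
    return [
        urlunsplit(parts._replace(path=str(path.with_suffix(ext))))
        for ext in extensions
    ]
```

Each returned URL can then be downloaded into the same local directory before pointing geopandas at the `.shp`.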

aaronspring commented 4 years ago

Found a quick solution with intermediate files:


from azure.storage.blob import BlobClient

base = 'https://metdatasa.blob.core.windows.net/covid19-response/shapefiles/England'
name = 'Counties_and_Unitary_Authorities_April_2019_Boundaries_EW_BUC'
# geopandas needs the .shp, .shx and .dbf files side by side, so fetch all three.
for ending in ['shx', 'dbf', 'shp']:
    filename = f"{name}.{ending}"
    url = f'{base}/{filename}'
    with open(filename, "wb") as f:
        print(f'Download {url} to {filename}')
        # Stream the public blob straight into the local file.
        data = BlobClient.from_blob_url(url).download_blob()
        data.readinto(f)
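
The same loop could be wrapped in a function that keeps the intermediate files in a temporary directory. This is only a sketch under assumptions: `read_shapefile_from_blob`, `fetch_blob` and the `fetch`/`reader` parameters are hypothetical names, the blobs are assumed to be publicly readable, and the temporary directory is removed once the reader returns (which is fine for `geopandas.read_file`, since it loads the data eagerly).

```python
import tempfile
from pathlib import Path


def fetch_blob(url, fileobj):
    """Default fetcher: stream a public blob into an open binary file."""
    # Imported lazily so the module loads even without the Azure SDK.
    from azure.storage.blob import BlobClient
    BlobClient.from_blob_url(url).download_blob().readinto(fileobj)


def read_shapefile_from_blob(base, name, fetch=fetch_blob, reader=None,
                             endings=("shx", "dbf", "shp")):
    """Download the shapefile parts next to each other, then open the .shp.

    `fetch` and `reader` are hypothetical hooks; by default they use
    BlobClient and geopandas.read_file, as in the snippet above.
    """
    if reader is None:
        import geopandas
        reader = geopandas.read_file
    with tempfile.TemporaryDirectory() as tmpdir:
        for ending in endings:
            filename = f"{name}.{ending}"
            with open(Path(tmpdir) / filename, "wb") as f:
                fetch(f"{base}/{filename}", f)
        # The reader must finish loading before the directory is cleaned up.
        return reader(str(Path(tmpdir) / f"{name}.shp"))
```

With the `base` and `name` values above, `read_shapefile_from_blob(base, name)` would return a GeoDataFrame. If the container also holds a `.prj` file (not confirmed in this thread), adding `"prj"` to `endings` would let the reader pick up the coordinate reference system.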

aaronspring commented 4 years ago

Now working on a way to get more files from the blob store into xarray.