opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0
504 stars 176 forks

Datacube breaks geopandas s3 reading #1186

Open MatthewJA opened 3 years ago

MatthewJA commented 3 years ago

Expected behaviour

>>> import datacube, geopandas as gpd
>>> shp = "s3://dea-public-data/projects/WaterBodies/moree-test/AusWaterBodies_Moree.shp"
>>> gpd.read_file(shp)
[...a shapefile...]

Actual behaviour

>>> import datacube, geopandas as gpd
>>> shp = "s3://dea-public-data/projects/WaterBodies/moree-test/AusWaterBodies_Moree.shp"
>>> gpd.read_file(shp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.6/site-packages/geopandas/io/file.py", line 129, in _read_file
    req = _urlopen(filename)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib/python3.6/urllib/request.py", line 549, in _open
    'unknown_open', req)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 1395, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: s3>

Note that without the datacube import, this code works fine.

Environment information

1.8.4.dev81+g80d466a2 on Sandbox

Kirill888 commented 3 years ago

My guess is it's because of this:

https://github.com/opendatacube/datacube-core/blob/3a49f78ead159da505cd78803d6710e7762b3a7e/datacube/utils/uris.py#L219-L238

@MatthewJA note that importing geopandas BEFORE importing datacube lets you keep using .read_file, so that's a workaround. But honestly it seems like a bug in geopandas: it somehow decides to use plain http machinery for s3 access.

Kirill888 commented 3 years ago

A dodgy fix, when the import order cannot be controlled, is to run this before attempting to read s3 resources:

gpd.io.file._VALID_URLS.discard("s3")

Kirill888 commented 3 years ago

https://github.com/geopandas/geopandas/issues/2068

MatthewJA commented 3 years ago

Thanks for following this up with Geopandas!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

robbibt commented 2 years ago

In a strange coincidence, I encountered this problem just today, and can confirm it is still an issue.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

SpacemanPaul commented 2 years ago

This issue is blocked on the geopandas side.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.