Closed scottyhq closed 4 years ago
Thats most likely because the standard libraries http.server.SimpleHTTPRequestHandler
doesn't support HTTP Range Requests which of course are extremely useful if you like to access only a subset of a remote file. There are two possible solutions: teach the server to handle range-requests or teach rasterio not to require them. I think both options are reasonable.
There are a couple of implementations for range requests on pythons SimpleHttpServer
e.g in this gist or as a papckage. I did not test any of the two but had a quick look over the source, the implementation seems to be reasonable for a simple test-server. One thing which is missing seems to be the support for multipart ranges which would be a reasonable choice for requesting twodimensional subsets but I don't know if rasterio makes use of this feature.
On the other hand, maybe rasterio should fall back to use non-range-requests if the server doesn't support them. While this would potentially result in larger transfer sizes, that may still be better than failing for such servers.
Thanks for the info @d70-t . It looks like the test code in fsspec that @martindurant linked to also addesses the Range Request support (https://github.com/intake/filesystem_spec/blob/master/fsspec/implementations/tests/test_http.py#L39).
Ultimately, rasterio is using GDAL behind the scenes for reading and many environment variables exist to control behavior. https://gdal.org/user/virtual_file_systems.html, https://trac.osgeo.org/gdal/wiki/ConfigOptions, more info here https://github.com/OSGeo/gdal/issues/1295.
As an aside, it looks like CI is not running on this PR. If it appeals I could add the GitHub Actions setup we're using in Intake-stac to run pytest on linux? https://github.com/intake/intake-stac/blob/5d3af3abffcfbe409361da7e105aa69838cef2ff/.github/workflows/main.yaml
Happy to have Actions run the tests, I have no idea why Travis has ceased. I tried to reactivate it now, and it says "There was an error while trying to activate the repository." (very useful).
Note that can_be_local
is in fsspec/master - it has not been released yet. Do you need a release to continue here?
That's ok, just going off the 37-upstream environment: https://github.com/intake/intake-xarray/pull/82/checks?check_run_id=1233000515 .
Not sure why test_discover[netcdf][zarr] are failing with E KeyError: 'datashape'
... those pass for me locally
I'm about to add a fix for allowing Range requests on intake_xarray/tests/test_remote.py::test_read_remote_tif
datashape
has been discontinued from Intake, because it was never used for anything - just remove it from the test.
OK! @martindurant - I couldn't figure out the context-manager approach for this data server setup (tests kept hanging and/or failing, seems like the server wasn't staying open or wasn't handling requests)
but I combined the fixture already in place for testing intake-server with @d70-t's suggestion of the RangeHTTPServer package on pypi. So the upstream test is now working!
Any additional changes and/or tests needed? One thing I can think of is to ensure the memory footprint is less than the local file size to verify lazy loading / just reading metadata.
I'd prefer to do any necessary changes to netcdf.py in a separate PR.
After merging this it would be great to release intake-xarray 0.3.3 and fsspec 0.8.4 since the raster.py now uses can_be_local
and setup.py has fsspec>0.8.3
verify lazy loading / just reading metadata
If you can think of a way, sure. I'm not sure how... You could read a file which has no real data and would error?
fsspec 0.8.4
Be sure to ping me early next week
Be sure to ping me early next week
Ping! A release would be great :)
I'll think about adding one additional test and making sure CI passes for this PR. I'm wondering if it's worth modifying the existing CI environments too? Specifically, instead of '36-build, 36-defaults-build, 37-nodefaults-build' how about:
- py36
- py37
- py38
- upstream
with all dependencies unpinned?
^ yes totally agree with that test matrix
fsspec release is on pypi, v0.8.4
thanks @martindurant ! since i'm not a repo maintainer, i can't re-trigger the CI here https://github.com/intake/intake-xarray/actions/runs/303467233. Currently the CI environments (except for upstream) install fsspec from conda-forge, so we'll have to wait for it to become available there.
conda-forge PR is merged, package should be available soon.
@scottyhq , would you like to be a maintainer?
@scottyhq , would you like to be a maintainer?
If you need more hands on deck, feel free to add me, but I can't promise a lot of time commitment. My hands are pretty full as it is ;)
The same for us all
The package is ready and you have rights. You can merge this once things are passing.
OK, I think this is ready to go! Is is simple merge preferred here? or squash and merge?
It would be nice to do a minor release after merging. @martindurant - if you want I can add the pypi publish workflow as well, but that would require you to add pypi username info under the repository secrets: https://github.com/intake/intake-stac/blob/master/.github/workflows/pypipublish.yaml
I can do the release. I think simple merge is fine for this repo.
I can do the release. I think simple merge is fine for this repo.
thanks again @martindurant and @d70-t for the guidance. Once intake-xarray 0.3.3 is released I'll be able to merge https://github.com/intake/intake-stac/pull/61 and get a new version of intake-stac out that takes advantage of these changes.
intake-xarray release merged and building on conda-forge now
I seem to have called it 0.4.0
Working towards a fix for #81.
@martindurant and @d70-t , I'm pretty new to setting up test servers, so feel free to take over this PR. But I felt like I made a bit of progress today and wanted to open this up.
Currently my
test_read_remote_tif
function fails withE rasterio.errors.RasterioIOError: Range downloading not supported by this server!
, which I'm not sure how to deal with. This seems to be a local setup issue because reading from github works fine: