perrygeo / python-rasterstats

Summary statistics of geospatial raster datasets based on vector geometries.
BSD 3-Clause "New" or "Revised" License
528 stars 117 forks source link

handling remote files (or setting rasterio/gdal context) #208

Open knaaptime opened 4 years ago

knaaptime commented 4 years ago

Hi, thanks for this great package. I'm wondering if it's possible to pass an open raster (as opposed to simply the path) to zonalstats (without converting to ndarray first).

My use case is that I'm reading a raster file from a public s3 bucket, which requires the following rasterio context:

with rasterio.env.Env(aws_unsigned=True):
    r = rasterio.open("s3://path-to-raster.tif)

which works great by itself. But now I need to calculate zonal stats and I cannot make rs.zonal_stats use the rasterio environment. Would it be possible to modify the rs.zonal_stats so that it accepts an open rasterio.DatasetReader? (or allow passing the context)

e.g.

with rasterio.env.Env(aws_unsigned=True):
    open_raster = rasterio.open("s3://path-to-raster)
    zonal_gjson = rs.zonal_stats(
                    geodataframe, open_raster, prefix="Type_", geojson_out=True, categorical=True
                )
sdtaylor commented 3 years ago

I've been doing s3 stuff with rasterio and it handles an s3 path without the need for a context mananger, as long as it's a free bucket. rasterstats doesn't change the path at all so it seems to work fine without any adjustment

import rasterstats
from shapely.geometry import Polygon

s3_path = 's3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF'
p=Polygon([(465764.5, 2428342.0), (465104.1, 2404237.9), (492840.3, 2402586.9)])
rasterstats.zonal_stats(p, s3_path)
Out[4]: [{'min': 10150.0, 'max': 13163.0, 'mean': 10490.125291650986, 'count': 372020}]

It can be wrapped in a special rasterio session for a requester pays bucket too. Like with this naip imagery.

from rasterio.session import AWSSession
import rasterio
import rasterstats
from shapely.geometry import Polygon

p=Polygon([(525794.7, 4876393.3), (525711.3, 4875371.7),(526910.1, 4875413.4)])
s3_path = 's3://naip-visualization/wy/2019/60cm/rgb/44104/m_4410459_se_13_060_20190828.tif'

session = AWSSession(requester_pays=True)
with rasterio.Env(session):
        stats = rasterstats.zonal_stats(p, s3_path)

stats
[{'min': 59.0, 'max': 220.0, 'mean': 118.68374314241665, 'count': 1696115}]

rasterstats only reads with the rasterio window method, so as long as the s3 files are cloud optimized geotiffs then it will not download the full raster.

dennisgsmith commented 2 years ago

For anyone looking to do this in a mock S3 tool like minio, I'll save you a google search:

https://github.com/rasterio/rasterio/issues/1293#issuecomment-502651617

Just add the following variables to your rasterio.Env:

import rasterstats
import rasterio
from shapely.geometry import Polygon

s3_path = 's3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF'
p = Polygon([(465764.5, 2428342.0), (465104.1, 2404237.9), (492840.3, 2402586.9)])

with rasterio.Env(AWS_HTTPS='NO', GDAL_DISABLE_READDIR_ON_OPEN='YES', AWS_VIRTUAL_HOSTING=False, AWS_S3_ENDPOINT='localhost:9000'):
    stats = rasterstats.zonal_stats(p, s3_path)