isciences / exactextract

Fast and accurate raster zonal statistics
Apache License 2.0
258 stars 33 forks source link

`int32` raster causes exception #103

Closed BielStela closed 5 months ago

BielStela commented 5 months ago

Hello!

I have two identical rasters except for dtype: one is int32 and the other float32. Computing a sum on the int32 causes the following RuntimeError:

    prepare_operations([stat], values, weights)
RuntimeError: Unable to cast Python instance of type <class 'float'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

I casted the raster to float32 and it fixes the crash. This is the gdal metadata of the tiffs:

===SICK===
{'driver': 'GTiff', 'dtype': 'int32', 'nodata': -200.0, 'width': 15657, 'height': 13426, 'count': 1, 'crs': CRS.from_epsg(4326), 'transform': Affine(0.00227456, 0.0, -79.61249969999994,
       0.0, -0.00227456, 10.059076038000043), 'blockysize': 1, 'tiled': False, 'compress': 'deflate', 'interleave': 'band'}
===OK===
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -200.0, 'width': 15657, 'height': 13426, 'count': 1, 'crs': CRS.from_epsg(4326), 'transform': Affine(0.00227456, 0.0, -79.61249969999994,
       0.0, -0.00227456, 10.059076038000043), 'blockysize': 1, 'tiled': False, 'compress': 'deflate', 'interleave': 'band'}

As you can see the only difference is the dtype.

I also casted other uint8 to int32, and the same exception occurred so I suspect that it is only an issue for int32 dtype

Thanks a lot!

Update: casting to uint32 also fixes it but int64 fails with the same error message Update 2: I'm using the python bindings version 0.2.0.dev0 :sweat_smile:

BielStela commented 5 months ago

Update 3: I tried the same raster with the cli and it works well so the issue must come from the pybind11 layer

dbaston commented 5 months ago

Can you share the full call to exact_extract ? I'm having trouble replicating this.

BielStela commented 5 months ago

Hello, sure! I managed to reproduce it with a minimal example:

import json

import rasterio as rio
import numpy as np
from exactextract import exact_extract, __version__
print("exact_exctract version: ", __version__)

GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [3.0, 7.0],
                        [3.0, 10.0],
                        [0.0, 10.0],
                        [0.0, 7.0],
                        [3.0, 7.0],
                    ],
                ],
            },
        }
    ],
}
with open("test_zone.json", "w") as out:
    json.dump(GEOJSON, out)

data = np.array([[0, 1, 0], [1, 9, 1], [0, 1, 0]], dtype=np.int32)
transform = rio.transform.from_origin(0, 10, 1, 1)
with rio.open(
        "sick_raster.tif",
        "w",
        driver="GTiff",
        width=data.shape[1],
        height=data.shape[0],
        count=1,
        dtype="int32",
        crs="+proj=latlong",
        transform=transform,
        nodata=-1
) as dst:
    dst.write(data, 1)

exact_extract("sick_raster.tif", "test_zone.json", ops=['sum'])

And found out also that is the presence of nodata that triggers the error. If nodata=None it works well, for any other value it crashes.

BielStela commented 5 months ago

Also it only happens for all the signed integers. the unsigned variants uint8,16,32,64 work well with a nodata value set

dbaston commented 5 months ago

Thanks for the nice report!

BielStela commented 5 months ago

Thank you for the quick fix!