isciences / exactextract

Fast and accurate raster zonal statistics
Apache License 2.0
260 stars 33 forks source link

CRS comparison fails due to WKT format differences when using rioxarray raster and geopandas vector (WKT1 vs. WKT2) #159

Open AlexDo1 opened 3 hours ago

AlexDo1 commented 3 hours ago

Hello and thank you for the amazing package that computes (exact) zonal statistics in a fast and reliable way. I am using Python with geopandas and xarray/rioxarray together with exactextract and have found an issue in your code.

The CRS comparison: https://github.com/isciences/exactextract/blob/cd047bb12c073b640332dc8582c0f0c72e173e0b/python/src/exactextract/exact_extract.py#L298-L323 relies on comparing the WKT representation of the CRS. However, geopandas and rioxarray handle WKT strings differently:

This behavior can be reproduced with the following example:

import geopandas as gpd
import rioxarray
import xarray as xr
import numpy as np
from pyproj import CRS
from exactextract.exact_extract import crs_matches
from exactextract.feature import GeoPandasFeatureSource
from exactextract.raster import XArrayRasterSource

# Define the CRS as EPSG:4326
crs = CRS.from_epsg(4326)

# Create a simple GeoDataFrame
gdf = gpd.GeoDataFrame(geometry=[], crs=crs)

# Create a simple xarray Dataset
data = np.random.rand(5, 5)
lat = np.linspace(-90, 90, 5)  # y-dimension (latitude)
lon = np.linspace(-180, 180, 5)  # x-dimension (longitude)

# Create the Dataset
ds = xr.Dataset(
    {
        "band1": (["y", "x"], data)
    },
    coords={
        "y": lat,
        "x": lon,
    },
)
ds = ds.rio.write_crs(crs)  # Write the CRS to the dataset

# Print WKT representations
print("GeoDataFrame CRS (WKT):")
print(gdf.crs.to_wkt())  # WKT2
# GEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],CS[ellipsoidal,2],AXIS["geodetic latitude (Lat)",north,ORDER[1],ANGLEUNIT["degree",0.0174532925199433]],AXIS["geodetic longitude (Lon)",east,ORDER[2],ANGLEUNIT["degree",0.0174532925199433]],USAGE[SCOPE["Horizontal component of 3D system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]

print("\nDataset CRS (WKT):")
print(ds.rio.crs.to_wkt())  # WKT1
# GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]

# Attempt to compare WKT strings
if gdf.crs.to_wkt() == ds.rio.crs.to_wkt():
    print("\nCRS match.")
else:
    print("\nCRS mismatch.")

# Attempt to compare CRS objects as exactextract does
if not crs_matches(GeoPandasFeatureSource(gdf), XArrayRasterSource(ds)):
    print("\nCRS mismatch (exactextract).")
else:
    print("\nCRS match (exactextract).")

So even though the crs of the GeoDataFrame and the Dataset were created from the same pyproj.CRS object, exactextracts gives a warning that the is a CRS mismatch.

As a possible solution I would propose to just compare EPSG codes instead of WKT representations.

dbaston commented 3 hours ago

If you have the GDAL Python bindings available, it will use those to check CRS equality. I guess another fallback could be added to use pyproj to do the same.