pysal / libpysal

Core components of Python Spatial Analysis Library
http://pysal.org/libpysal
Other
259 stars 78 forks source link

Add support for `DataArray` objects read through `rioxarray` #445

Open darribas opened 2 years ago

darribas commented 2 years ago

The raster functionality in for weights building currently supports DataArray objects created through xarray.open_rasterio:

import xarray
from libpysal.weights import Queen
pop = xarray.open_rasterio(
               'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'
)
w = Queen.from_xarray(pop)

The code above correctly picks up that missing data are specified with -200 and removes them from the calculation of the weights:

w.n
> 97232

This is not the case when we use DataArray objects build through the (generally recommended) rioxarray.open_rasterio:

import rioxarray
pop2  = rioxarray.open_rasterio(
               'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'
)
w2 = Queen.from_xarray(pop2)

which does not pick up the missing data:

w.n
> 194688

My hunch is that this is because the builder picks up missing values only through the attrs object:

https://github.com/pysal/libpysal/blob/0f03a313896de6af096ff92105e54da86bb78368/libpysal/weights/raster.py#L229-L233

But rioxarray does not populate that general object, instead logs missing data under pop2.rio.nodata:

pop.attrs
> {'transform': (250.0, 0.0, -4482000.0, 0.0, -250.0, -2822000.0),
 'crs': '+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs=True',
 'res': (250.0, 250.0),
 'is_tiled': 0,
 'nodatavals': (-200.0,),
 'scales': (1.0,),
 'offsets': (0.0,),
 'AREA_OR_POINT': 'Area',
 'grid_mapping': 'spatial_ref'}

pop2.attrs
> {'_FillValue': -200.0, 'scale_factor': 1.0, 'add_offset': 0.0}

pop2.rio.nodata
> -200.0

I think we should support both formats, read through vanilla xarray and with rioxarray loaded in the session too. The former is a more generic approach, the latter is a more robust one for rasters.

Not sure what'd be the ideal way of supporting this. I'm cc'ing here @MgeeeeK since he worked on the original implementation and might have some ideas, and @snowman2 as the main driver behind rioxarray in case they have some suggestions in terms of best practices.

snowman2 commented 2 years ago

This is a helpful reference: https://corteva.github.io/rioxarray/stable/getting_started/nodata_management.html

darribas commented 2 years ago

I've issued a PR (#455 ) that I think should have a more comprehensive approach following @snowman2 suggestion from rioxarray. If tests pass, I think it should be good to merge.