isciences / exactextract

Fast and accurate raster zonal statistics
Apache License 2.0

Lots of NANs #24

Closed chapmanjacobd closed 3 years ago

chapmanjacobd commented 3 years ago

How exactly are nulls handled? Most of my pop_weighted_mean values are nan, even though there is population and data in those countries.

NAME,pop_sum,variable_min,variable_max,pop_weighted_mean
Afghanistan,9753520,1,40,4.44056987762451
Angola,9689003,1,72,nan
Albania,1102135.125,1,689,nan
United Arab Emirates,2645905.5,1,488,nan
Argentina,45506.94921875,1,21,nan
Armenia,1076041.25,1,67,nan
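To see how many rows are affected, a small awk filter can list every row whose last column is nan. This is only a sketch: the sample rows are copied from the output above, and the `sample_csv` variable and `list_nan_rows` function are hypothetical names introduced for illustration.

```shell
# Rows copied from the output above.
sample_csv='NAME,pop_sum,variable_min,variable_max,pop_weighted_mean
Afghanistan,9753520,1,40,4.44056987762451
Angola,9689003,1,72,nan
Albania,1102135.125,1,689,nan'

# Print the NAME of every data row whose last column is nan or -nan.
# NR > 1 skips the header line; $NF is the last comma-separated field.
list_nan_rows() {
  awk -F, 'NR > 1 && $NF ~ /^-?nan$/ { print $1 }'
}

printf '%s\n' "$sample_csv" | list_nan_rows
```

On the real output the same function would be fed the file directly, for example `list_nan_rows < countries_walkable.csv`.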

Reproducible example:

wget https://raw.githubusercontent.com/chapmanjacobd/rasters/main/ne_110m_countries.gpkg
wget https://raw.githubusercontent.com/chapmanjacobd/rasters/main/pop.tif
wget https://raw.githubusercontent.com/chapmanjacobd/rasters/main/osm/walkable.tif.gz
gzip -d ./walkable.tif.gz

exactextract -r pop:pop.tif \
  -r variable:walkable.tif \
  -p ne_110m_countries.gpkg \
  -f NAME \
  -s "sum(pop)" \
  -s "sum(variable)" \
  -s "max(variable)" \
  -s "mean(variable)" \
  -s "pop_weighted_mean=weighted_mean(variable,pop)" \
  -o countries_walkable.csv

cat countries_walkable.csv

https://github.com/chapmanjacobd/rasters/blob/main/osm/README.md

chapmanjacobd commented 3 years ago

Unsetting the nodata value on both of my rasters

gdal_edit.py -unsetnodata pop.tif
gdal_edit.py -unsetnodata walkable.tif

seemed to fix it:

~/g/x/temp # cat countries_walkable.csv
NAME,pop_sum,variable_sum,variable_max,variable_mean,pop_weighted_mean
Afghanistan,9753520,943.059143066406,40,0.00362812471576035,0.269284546375275
Angola,9689003,1674.08544921875,72,0.00467964634299278,1.54831600189209
Albania,1102135.125,10026.787109375,689,0.6992227435112,37.0944442749023
United Arab Emirates,2645905.5,9642.537109375,488,0.36660099029541,5.38662815093994
Argentina,45506.94921875,237.93376159668,21,0.0123930675908923,2.90612316131592
Armenia,1076041.25,1425.55187988281,67,0.105954967439175,3.7758355140686
Antarctica,,,,,
Fr. S. Antarctic Lands,0,280.502258300781,32,0.0376532599329948,-nan
Australia,218187.109375,8278.0595703125,435,0.252639383077621,15.1573438644409
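One plausible reading of these results, not confirmed anywhere in this thread, is that a weighted mean turns into nan as soon as any contributing cell has a nodata weight, and that a zero total weight (as in the Fr. S. Antarctic Lands row, where pop_sum is 0) gives 0/0. The sketch below illustrates that arithmetic only. It is not exactextract's implementation; the cell values and both function names are made up for the example.

```shell
# Hypothetical per-cell samples as value,weight pairs.
# "nodata" marks a masked cell in the weighting raster.
cells='3,10
5,nodata
7,30'

# Weighted mean that propagates nodata: one masked cell poisons the result.
weighted_mean_propagate() {
  awk -F, '
    $1 == "nodata" || $2 == "nodata" { bad = 1 }
    { if (!bad) { num += $1 * $2; den += $2 } }
    END { if (bad || den == 0) print "nan"; else printf "%.2f\n", num / den }
  '
}

# Weighted mean that skips masked cells instead of propagating them.
weighted_mean_skip() {
  awk -F, '
    $1 == "nodata" || $2 == "nodata" { next }
    { num += $1 * $2; den += $2 }
    END { if (den == 0) print "nan"; else printf "%.2f\n", num / den }
  '
}

printf '%s\n' "$cells" | weighted_mean_propagate   # prints nan
printf '%s\n' "$cells" | weighted_mean_skip        # prints 6.00
```

Under this reading, unsetting nodata removes the masked cells entirely, so the former nodata value is treated as ordinary data. That avoids nan, but it also means those cells now count toward every sum and mean.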

It would be nice if a setting for this were exposed in the CLI (similar to https://github.com/isciences/exactextractr/issues/5), but it's not a big deal.

I'm going to update my data so that the example in the original post works correctly for other people.