isciences / exactextract

Fast and accurate raster zonal statistics
Apache License 2.0

Minority, majority, and variety don't work with a thematic raster #28

Closed: alpha-beta-soup closed this issue 2 years ago

alpha-beta-soup commented 2 years ago

I have a thematic (discrete, integer) raster and want to use the minority and majority statistics. However, the output is always nan. If I run the same command with mean, min, or max, I get sensible numerical output (although only majority is actually meaningful for my use case).

# Quote arguments containing ( ) or [ ] so the shell passes them through unchanged.
exactextract \
-r luc:./nzfarm.raster.tif \
-p "./parcels.gpkg[parcels]" \
-f id \
-s "majority(luc)" \
-o ./parcels.majority.nzfarm.raster.csv

head ./parcels.majority.nzfarm.raster.csv
id,luc_majority
"2734",nan
"2743",nan
"3258",nan
"3268",nan
"3422",nan
"3606",nan
"3695",nan
"4381",nan
"4382",nan

Output of the same command with max instead of majority:

id,luc_max
"2734",14
"2743",3
"3258",14
"3268",3
"3422",26
"3606",14
"3695",14
"4381",14
"4382",14

min:

id,luc_min
"2734",3
"2743",3
"3258",14
"3268",3
"3422",26
"3606",3
"3695",3
"4381",14
"4382",14
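Comparing the min and max outputs already shows what majority should return for some parcels: where min equals max, every covered cell has the same class, so majority (and minority) must be that class. The Python sketch below is not part of exactextract; it parses the two CSV outputs above and lists those parcels. The helper names `read_stat` and `single_class_parcels` are hypothetical.

```python
import csv
import io
from typing import Dict

# Output copied verbatim from the min and max runs shown above.
MIN_CSV = """id,luc_min
"2734",3
"2743",3
"3258",14
"3268",3
"3422",26
"3606",3
"3695",3
"4381",14
"4382",14
"""

MAX_CSV = """id,luc_max
"2734",14
"2743",3
"3258",14
"3268",3
"3422",26
"3606",14
"3695",14
"4381",14
"4382",14
"""


def read_stat(text: str, column: str) -> Dict[str, float]:
    """Parse exactextract-style CSV output into {feature id: statistic value}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["id"]: float(row[column]) for row in reader}


def single_class_parcels(mins: Dict[str, float], maxs: Dict[str, float]) -> Dict[str, float]:
    """Parcels whose min equals max; their majority and minority must equal that class."""
    return {pid: vmin for pid, vmin in mins.items() if vmin == maxs[pid]}


expected = single_class_parcels(read_stat(MIN_CSV, "luc_min"), read_stat(MAX_CSV, "luc_max"))
for pid, cls in expected.items():
    print(f"{pid}: majority should be {cls:g}")
```

For these six parcels a nan majority cannot be explained by mixed classes.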

...and mean (meaningless for my use case, but it shows that the statistic works):

id,luc_mean
"2734",13.0842542648315
"2743",3
"3258",14
"3268",3
"3422",26.0000019073486
"3606",13.4197244644165
"3695",3.33548903465271
"4381",14
"4382",14
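Why mean is meaningless here: the exactextract README describes mean as the cell value weighted by the fraction of each cell covered by the polygon, so on a thematic raster it blends class codes. A minimal Python sketch assuming that weighting, with hypothetical cells; `weighted_mean` is an illustrative name, not an exactextract function. It matches the pattern for parcel 2734 above (min 3, max 14, mean about 13.08).

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], fractions: Sequence[float]) -> float:
    """Mean of cell values weighted by the fraction of each cell the polygon covers."""
    total_weight = sum(fractions)
    if total_weight == 0:
        return float("nan")  # polygon covers no cells
    return sum(v * f for v, f in zip(values, fractions)) / total_weight


# Hypothetical parcel: 25% of a class-3 cell and 75% of a class-14 cell.
print(weighted_mean([3, 14], [0.25, 0.75]))  # 11.25, which is not a class code
```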

variety doesn't seem to work either, but it reports 0 rather than nan:

id,luc_variety
"2734",0
"2743",0
"3258",0
"3268",0
"3422",0
"3606",0
"3695",0
"4381",0
"4382",0
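variety is the number of distinct values among the cells a polygon touches. Every parcel above has a defined min and max, so each covers at least one valid cell and should report a variety of at least 1; parcels such as 2734 (min 3, max 14) should report at least 2. A minimal Python sketch of the statistic, assuming nan marks nodata cells that are skipped; the cell values are hypothetical.

```python
import math
from typing import Sequence


def variety(values: Sequence[float], fractions: Sequence[float]) -> int:
    """Count distinct cell values among cells with nonzero coverage, skipping nan (nodata)."""
    return len({v for v, f in zip(values, fractions) if f > 0 and not math.isnan(v)})


# Hypothetical parcel touching classes 3 and 14, plus one nodata cell.
print(variety([3, 14, 3, float("nan")], [0.2, 0.5, 0.1, 0.3]))  # 2
```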

count:

id,luc_count
"2734",0.424698650836945
"2743",0.0814584046602249
"3258",0.487965881824493
"3268",0.00538108451291919
"3422",0.00977452844381332
"3606",0.311928182840347
"3695",0.559225618839264
"4381",0.288064271211624
"4382",0.640185415744781

The raster has no nan (nodata) cells where it overlaps the input polygons, although it does contain nan cells elsewhere. In the full dataset most of my polygons are smaller than a raster cell, but many are much larger. The behaviour is the same regardless of polygon size.
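The fractional counts above are expected when features are smaller than a cell: the README describes count as the sum of cell coverage fractions, so a parcel lying inside a single cell gets a count below 1. exactextract computes these fractions for arbitrary polygons; this Python sketch simplifies to axis-aligned rectangles, and the grid origin, cell size, and the `coverage_fractions` helper are hypothetical.

```python
import math
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def coverage_fractions(rect: Rect, cell_size: float,
                       x0: float = 0.0, y0: float = 0.0) -> Dict[Tuple[int, int], float]:
    """Fraction of each grid cell covered by an axis-aligned rectangle.

    Simplification: real parcels are arbitrary polygons. Cells are indexed
    (col, row) from a grid origin at (x0, y0), with rows increasing upward.
    """
    xmin, ymin, xmax, ymax = rect
    fractions = {}
    for col in range(math.floor((xmin - x0) / cell_size), math.ceil((xmax - x0) / cell_size)):
        for row in range(math.floor((ymin - y0) / cell_size), math.ceil((ymax - y0) / cell_size)):
            cell_xmin = x0 + col * cell_size
            cell_ymin = y0 + row * cell_size
            # Width and height of the overlap between the rectangle and this cell.
            w = min(xmax, cell_xmin + cell_size) - max(xmin, cell_xmin)
            h = min(ymax, cell_ymin + cell_size) - max(ymin, cell_ymin)
            if w > 0 and h > 0:
                fractions[(col, row)] = (w * h) / (cell_size * cell_size)
    return fractions


# Hypothetical 30 m x 40 m parcel inside a single 100 m cell.
small = coverage_fractions((10, 10, 40, 50), cell_size=100)
print(sum(small.values()))  # count = 0.12
```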

I don't know whether this is relevant, but my raster and vector data are both in EPSG:3851, which happens to cross the antimeridian.

I can't see a test for the majority statistic in https://github.com/isciences/exactextract/blob/master/test/test_stats.cpp, only for mode and minority. I assume mode is an alias for majority?

(Also, I think the README has majority and minority reversed in the example use case column: majority should give the most common land cover type, but the table says "least common".)
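For reference, majority should be the value with the largest total coverage and minority the value with the smallest. A minimal Python sketch, assuming (as the README describes for mode) that cells are weighted by coverage fraction and that nan cells are skipped; tie-breaking here is simply first-seen order, which may not match exactextract, and the example cells are hypothetical. Weighting matters: two slivers of class 3 lose to one mostly covered class-14 cell even though class 3 appears in more cells.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def coverage_by_value(values: Sequence[float], fractions: Sequence[float]) -> Dict[float, float]:
    """Total coverage fraction per distinct cell value, skipping nan (nodata) cells."""
    totals: Dict[float, float] = defaultdict(float)
    for v, f in zip(values, fractions):
        if f > 0 and not math.isnan(v):
            totals[v] += f
    return dict(totals)


def majority(values: Sequence[float], fractions: Sequence[float]) -> float:
    """Value with the greatest total coverage (most common class); ties go to the first seen."""
    totals = coverage_by_value(values, fractions)
    return max(totals, key=totals.get) if totals else float("nan")


def minority(values: Sequence[float], fractions: Sequence[float]) -> float:
    """Value with the smallest total coverage (least common class); ties go to the first seen."""
    totals = coverage_by_value(values, fractions)
    return min(totals, key=totals.get) if totals else float("nan")


# Hypothetical parcel: two slivers of class 3 and one mostly covered class-14 cell.
values = [3, 3, 14]
fractions = [0.1, 0.1, 0.9]
print(majority(values, fractions), minority(values, fractions))  # 14 3
```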

Here's a screenshot of the sample area. The vector features are polygons drawn without a fill, and they may overlap.

[Screenshot from 2022-04-06 15-32-54]

dbaston commented 2 years ago

Thanks for the report.