qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0

Add "median" to Raster Layer Zonal Statistics outputs #40655

Open fgianoli opened 3 years ago

fgianoli commented 3 years ago

Add "median" to Raster Layer Zonal Statistics process

At the moment the algorithm "Raster Layer Zonal Statistics" computes a lot of different statistics, but not the median or the mode.

It would be very useful to add these two statistics to the output.

root676 commented 3 years ago

Thanks for raising this issue - I had similar thoughts. Some statistics are currently disabled due to possible high memory consumption. I had a look at the code multiple times and saw this comment:

// only calculate cheap stats-- we cannot calculate stats which require holding values in memory -- because otherwise we'll end
// up trying to store EVERY pixel value from the input in memory

@nyalldawson I was thinking about adding a parameter to select statistic methods. If 'expensive' statistics are selected by the user we could display a warning on high memory consumption in the processing log.

A few months ago I also tried to rework QgsStatisticalSummary so that it calculates the statistics from a data stream rather than storing the values. Unfortunately we can't do that for all statistics, because the variety statistic needs all values to be stored.
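To make the streaming idea concrete, here is a minimal sketch of how count, min, max, mean and standard deviation can be updated one pixel at a time using Welford's online algorithm. This is only an illustration in Python, not the actual QgsStatisticalSummary code; the class name `StreamingStats` and its attributes are made up for this example.

```python
import math


class StreamingStats:
    """Hypothetical helper: running statistics without storing the values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, value):
        # Welford's update: O(1) memory regardless of how many values arrive.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def sample_variance(self):
        # Undefined for fewer than two values; return 0.0 in that case.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    @property
    def sample_std_dev(self):
        return math.sqrt(self.sample_variance)


stats = StreamingStats()
for pixel in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(pixel)
print(stats.count, stats.mean, stats.sample_std_dev)
```

Statistics such as the median, the mode and the variety do not fit this pattern, because none of them can be updated exactly from a fixed number of running totals.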

fgianoli commented 3 years ago

Maybe we can add the possibility to compute some other stats.

I have checked the code and added median, majority and standard deviation (I don't know if the code works).

qgsalgorithmrasterzonalstats.txt

roya0045 commented 3 years ago

I would guess that https://en.wikipedia.org/wiki/Median_of_medians could be used to provide an approximate solution (I would guess that it's rare to need an exact median for rasters...)

root676 commented 3 years ago

@roya0045 I already implemented all the statistics as exact streaming stats methods except the variety stat (never opened a PR though) 😅 . There the count-distinct problem keeps us from making QgsStatisticalSummary memory independent. You have to keep a list of unique values to compute the exact variety. In addition, QgsStatisticalSummary is used in other QGIS features too, where users expect exact result values. IMHO, finding a solution to this problem would have a great impact on memory efficiency of this class.
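A small sketch of why the exact variety is the odd one out: the only state that gives an exact answer is the set of values seen so far, so memory grows with the number of distinct pixel values. The class name `StreamingVariety` is hypothetical and this is not how QgsStatisticalSummary is written.

```python
class StreamingVariety:
    """Hypothetical helper: exact count of distinct values in a stream."""

    def __init__(self):
        self._seen = set()  # grows with the number of distinct values

    def add(self, value):
        self._seen.add(value)

    @property
    def variety(self):
        return len(self._seen)


counter = StreamingVariety()
for pixel in [1, 1, 2, 3, 3, 3]:
    counter.add(pixel)
print(counter.variety)
```

For a classified raster with a handful of classes the set stays tiny, but for a floating-point raster it can approach one entry per pixel, which is the memory problem described above.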

root676 commented 3 years ago

@fgianoli your code looks good (I only had a quick look) - does it compile and run?

roya0045 commented 3 years ago

@root676 what about using a reservoir sampling approach for estimating those indicators, and keeping that method distinct from the exact ones?
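For reference, a minimal sketch of what a reservoir-sampling estimate of the median could look like (Algorithm R, in Python). The function names `reservoir_sample` and `approximate_median` and the default sample size are assumptions for illustration only; nothing like this exists in QGIS as far as this thread shows.

```python
import random
import statistics


def reservoir_sample(values, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, value in enumerate(values):
        if i < k:
            # Fill the reservoir with the first k values.
            reservoir.append(value)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = value
    return reservoir


def approximate_median(values, k=1000, rng=None):
    """Median of the sample; exact whenever the stream has at most k values.

    Raises statistics.StatisticsError if the stream is empty.
    """
    return statistics.median(reservoir_sample(values, k, rng))


# Memory use is bounded by k, no matter how many pixels are streamed.
estimate = approximate_median(range(100001), k=2001, rng=random.Random(42))
print(estimate)
```

The result is an estimate whose accuracy depends on `k`, so it would have to be exposed as a separate, clearly labelled statistic rather than replacing the exact median.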

fgianoli commented 3 years ago

@root676 No, I didn't, because I am on Windows right now and I have no idea how to compile QGIS here.

root676 commented 3 years ago

@roya0045 thanks for sending this link - I think these enhancements to QgsStatisticalSummary should get a broader discussion. Maybe I'll open a QEP for that when I have time.

roya0045 commented 3 years ago

@root676 I'm glad I could help!