Open fgianoli opened 3 years ago
Thanks for raising this issue - I had similar thoughts. Some statistics are currently disabled due to possible high memory consumption. I had a look at the code multiple times and saw this comment:
// only calculate cheap stats-- we cannot calculate stats which require holding values in memory -- because otherwise we'll end
// up trying to store EVERY pixel value from the input in memory
@nyalldawson I was thinking about adding a parameter to select statistic methods. If 'expensive' statistics are selected by the user we could display a warning on high memory consumption in the processing log.
Months ago I also tried to rework QgsStatisticalSummary to calculate the statistics based on data streams rather than storing the values - unfortunately we can't do that for all statistics, because the variety statistic needs all values to be stored.
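To illustrate what "based on data streams" could mean here, below is a minimal sketch of Welford's online algorithm, which updates count, mean, min, max and standard deviation one value at a time without keeping the values. The class name `StreamingStats` and its methods are hypothetical and are not part of QgsStatisticalSummary or any QGIS API; this is not the rework mentioned above, only an assumption about how such a single-pass approach might look.

```python
import math


class StreamingStats:
    """Hypothetical single-pass accumulator (not a QGIS class)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, value):
        # Welford's update: constant memory, one pass over the data.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def std_dev(self):
        # Population standard deviation; NaN when no values were added.
        if self.count == 0:
            return math.nan
        return math.sqrt(self._m2 / self.count)


stats = StreamingStats()
for pixel in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(pixel)
```

Memory use stays constant regardless of how many pixels are fed in, which is why these statistics count as "cheap".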
Maybe we can add the possibility to compute some other stats.
I have checked the code and added median, majority and standard deviation (I don't know if the code works).
I would guess that https://en.wikipedia.org/wiki/Median_of_medians could be used to provide an approximate solution (I would guess that it's rare to need an exact median for rasters...)
@roya0045 I already implemented all the statistics as exact streaming stats methods except the variety stat (never opened a PR though) 😅. There the count-distinct problem keeps us from making QgsStatisticalSummary memory independent. You have to keep a list of unique values to compute the exact variety. In addition, QgsStatisticalSummary is used in other QGIS features too, where users expect exact result values. IMHO, finding a solution to this problem would have a great impact on the memory efficiency of this class.
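As a rough sketch of why variety is the odd one out: an exact count of distinct values has to remember every unique value it has seen, so memory grows with the number of distinct pixel values rather than staying constant. The function name `exact_variety` is hypothetical and only for illustration; it is not taken from the QGIS code.

```python
def exact_variety(values):
    """Hypothetical helper: exact number of distinct values in a stream."""
    seen = set()  # grows by one entry for every new unique value
    for value in values:
        seen.add(value)
    return len(seen)


variety = exact_variety([1, 1, 2, 3, 3, 3])
```

For a classified raster with a handful of classes the set stays tiny, but for a floating point raster almost every pixel can be unique, and the set approaches the size of the input.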
@fgianoli your code looks good (I only had a quick look) - does it compile and run?
@root676 what about using a reservoir sampling approach to estimate those indicators, and keeping that method distinct from the exact ones?
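A minimal sketch of that idea, assuming the classic "Algorithm R" form of reservoir sampling: keep a fixed-size uniform random sample of the pixel stream and compute the median of the sample as an estimate. The function name `reservoir_sample`, the sample size of 1000 and the fixed seed are assumptions for illustration only; nothing here is an existing QGIS function.

```python
import random
import statistics


def reservoir_sample(values, k, rng):
    """Hypothetical helper: uniform random sample of up to k items from a stream."""
    reservoir = []
    for i, value in enumerate(values):
        if i < k:
            # Fill the reservoir with the first k values.
            reservoir.append(value)
        else:
            # Keep the new value with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = value
    return reservoir


rng = random.Random(42)        # fixed seed so the run is repeatable
pixels = range(1, 100_001)     # stand-in for a stream of pixel values
sample = reservoir_sample(pixels, 1000, rng)
approx_median = statistics.median(sample)
```

Memory is bounded by the sample size `k`, but the result is only an estimate, so it would need to be exposed separately from the exact statistics.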
@root676 No, I didn't, because I am on Windows now and I have no idea how to compile QGIS here.
@roya0045 thanks for sending this link - I think these enhancements to QgsStatisticalSummary should get a broader discussion. Maybe I'll open a QEP for that when I have time.
@root676 I'm glad I could help!
Add "median" to Raster Layer Zonal Statistics process
At the moment the algorithm "Raster Layer Zonal Statistics" computes a lot of different statistics, but not the median or the mode.
It would be very useful to add these two other methods to the output.