qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0
10.32k stars 2.97k forks source link

Raster calculator does not create/store statistics for output rasters #42835

Closed JohnMaps closed 3 years ago

JohnMaps commented 3 years ago

Apparent bug in QGIS raster calculator: The raster calculator produces incorrect results.

Steps to reproduce: Context - I'm trying to normalise a population density map - e.g. to get the range of values to go between 0 and 255. Step 1 - Import https://data.worldpop.org/GIS/Population_Density/Global_2000_2020_1km/2020/GBR/gbr_pd_2020_1km.tif Step 2 - Note that it has a maximum value of 27500.1 Step 3 - Do literally any calculation. E.g. just do "gbr_pd_2020_1km@1" 1 and it changes the maximum to 21367. Step 4 - As another example, try "gbr_pd_2020_1km@1" / 27500.1. It should give a range up to 1 - but instead it only goes up to 0.8 Step 5 - If you want to try and actually normalise the map (as I want to) try ("gbr_pd_2020_1km@1" 255) / 27500.1 - instead of values up to 255, it only goes to 198.

I can't seem to attach a screenshot, but you can see it here https://gis.stackexchange.com/questions/394114/bug-in-qgis-raster-calculator (Note I linked an incorrect file there - the screenshot refers to the file linked above. But it makes no difference to the point I'm making).

I've also tried doing the same via 'Processing Toolbox' then GDAL Raster Calculator, but it gives the same incorrect results.

Many thanks.

QGIS and OS versions

I've tried this in QGIS 3.16.5 and 3.18 (the latest LTR and beta releases available), and am running it on the latest version of Mac OS (Big Sur 11.2.3).

Fuller details pasted as requested:

QGIS version 3.16.5-Hannover QGIS code revision 58ba7c1ed6
Compiled against Qt 5.14.2 Running against Qt 5.14.2
Compiled against GDAL/OGR 3.2.1 Running against GDAL/OGR 3.2.1
Compiled against GEOS 3.9.1-CAPI-1.14.2 Running against GEOS 3.9.1-CAPI-1.14.2
Compiled against SQLite 3.31.1 Running against SQLite 3.31.1
PostgreSQL Client Version 12.3 SpatiaLite Version 4.3.0a
QWT Version 6.1.4 QScintilla2 Version 2.11.4
Compiled against PROJ 6.3.2 Running against PROJ Rel. 6.3.2, May 1st, 2020
OS Version macOS 10.16
Active python plugins oneclickrasterstacking; HeightmapExport; processing; db_manager; MetaSearch
gioman commented 3 years ago

Step 3 - Do literally any calculation. E.g. just do "gbr_pd_2020_1km@1" * 1 and it changes the maximum to 21367.

@JohnMaps you have to enter raster properties > symbology > min/max values settings and change "accuracy" to "actual" and then click "apply" and you'll see that the values are the same...

JohnMaps commented 3 years ago

Hi - thanks very much. I have changed that, but it still returns the same results. Any further advice appreciated!

agiudiceandrea commented 3 years ago

@JohnMaps could you please try again?

You need to

enter raster properties > symbology > min/max values settings and change "accuracy" to "actual" and then click "apply"

for the raster layer which is the result of the Raster calculator tool.

JohnMaps commented 3 years ago

@agiudiceandrea ah perfect - that works. I was making the change on the original layer. Thanks very much. If only they flagged in the raster calculator the fact that it by default does it only roughly (or at least said so in the manual on the raster calculator). Anyway, thanks very much.

gioman commented 3 years ago

. If only they flagged in the raster calculator the fact that it by default does it only roughly (or at least said so in the manual on the raster calculator)

@JohnMaps fun fact, "they" includes also you. Feel free to submit a patch against QGIS documentation.

elpaso commented 3 years ago

Just a note: this has nothing to do with raster calculator, it is caused by the way the raster renderer calculates min and max values based on a sample.

agiudiceandrea commented 3 years ago

Hi @elpaso I think the issue is also in how the Raster Calculator creates the output raster: it seems it "forgets" to calculate and save new statistics for the output raster layer.

The provided gbr_pd_2020_1km.tif raster GeoTIFF contains the tags with the statistics for that raster, so when opened in QGIS it displays correctly also with the "Estimate (faster)" accuracy statistics option set.

Conversely, the output of the Raster calculator doesn't contain any statistics values, neither as tags stored internally into the GeoTIFF file, nor as PAM .aux.xml sidecar file. When opened in QGIS, the output raster doesn't display correctly with the "Estimate (faster)" accuracy statistics option set and you need to set that option to "Actual (slower)" in order to correctly display the output raster. After that, a PAM .aux.xml sidecar file with the actual statistics is created and the next time you open that raster layer it will display correctly also with the "Estimate (faster)" accuracy statistics option set.

So I think the Raster Calculator (and any other raster creation tool) should have an option to force the creation of the statistics for the output raster.

Moreover, it seems to me a specific tool to create/update the statistics for a raster layer is missing in the processing toolbox (such as a "Calculate statistics" tool), and while in the layer properties there is the possibility to "Compute Histogram" and "Build Pyramids" of the raster layer, an option to also "Calculate/Update Statistics" for that layer is missing (apart from setting the "Actual (slower)" option in the Symbology->Min/Max Value Setting property, but I think it's not very intuitive, as it appears like a display-only option).

Calculate/update the statistics for a raster band in Python is simple:

from osgeo import gdal
ds = gdal.Open(raster_layer_file_path, 1)
ds.GetRasterBand(1).ComputeStatistics(0)
ds = None
elpaso commented 3 years ago

@agiudiceandrea fair point, let's reopen then.

simogeo commented 3 years ago

So I think the Raster Calculator (and any other raster creation tool) should have an option to force the creation of the statistics for the output raster.

I Would also say that it should be enabled by default.

elpaso commented 3 years ago

@simogeo I would say no option and just create the statistics every time: I can't see a use when the statistics are unwanted.

simogeo commented 3 years ago

@elpa :+1:

agiudiceandrea commented 3 years ago

Probably creating the raster statistics is time consuming and the user could choose to only create them at the end of an algorithm/model which applies multiple tools to the raster layer.

simogeo commented 3 years ago

@agiudiceandrea : I suppose that's why the estimation approach exists, indeed.

I have no idea of the amount of time it takes to process stats. Most of the time, it is probably shorter than opening Properties tab and change max stats rendering in dialog box .... By the way, end user is always willing to wait when processing data.

Reliable data / display of data first !

gioman commented 3 years ago

I can't see a use when the statistics are unwanted.

@elpaso @agiudiceandrea agree. But please note that this do not only happen (in QGIS) with rasters created within QGIS, it happens also with layers created in other programs/packages. This leads very often to confusion and it can also lead to serious mistakes if a user overlooks this aspect of QGIS handling min/max values by default. So I think that if we don't want an new option for it we should take the risk and always show the real min/max values.

elpaso commented 3 years ago

@gioman this may actually be doable with the raster calculator because it generates new values and we are able to gather max/min while generating the (whole) output raster without additional I/O.

For many other uses, the default approximated statistics method is of course much faster that retrieving all the values from the source.

gioman commented 3 years ago

For many other uses, the default approximated statistics method is of course much faster that retrieving all the values from the source.

@elpaso yes I understand it. I was just saying that we miss a clear warning (when loading rasters) that the mix/max values that are shown may be not the real ones (see https://github.com/qgis/QGIS/issues/42847) and this leads to real confusion (for which we get often reports/complains) and ultimately can lead to very wrong, real life, decisions in case a user overlooks this detail.