qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0
10.52k stars 2.99k forks source link

Raster Singleband Pseudocolor with Quantile mode inconsistencies #35465

Open PedroVenancio opened 4 years ago

PedroVenancio commented 4 years ago

I'm seeing some inconsistencies in classes breaks, when using Quantile mode on a raster with singleband pseudocolor symbology.

To reproduce:

  1. Download this sample raster and load it on QGIS: https://cld.pt/dl/download/7161caab-5ebe-4e94-8c4a-574a80821a97/perigosidade.tif

  2. Properties -> Symbology -> Render type: Singleband pseudocolor -> Interpolation: Discrete -> Mode: Quantile -> Classes: 5. You will get: imagem

  3. Run GRASS r.quantile from Processing on this raster, with Number of Quantiles: 5. Leave all other options as default. You will get:

    0:20.000000:144.000000
    1:40.000000:720.000000
    2:60.000000:2400.000000
    3:80.000000:4500.000000 

    These are the correct classes breaks, and so the symbology should be: imagem

I have done some tests to be sure about the correct classification.

[A] I've run Raster Pixels to Points to get the center point of all pixels. If you want to check, here is the output geopackage: https://cld.pt/dl/download/3764537c-2504-4326-be77-74e26c0d2a49/perigosidade.gpkg

Load it in QGIS and go to Properties -> Symbology -> Graduated -> Mode: Equal Count (Quantile) -> Classes: 5 -> Classify. You will get: imagem

[B] I've run GRASS r.stats to get all pixel values. Here is the CSV with all values: https://cld.pt/dl/download/b730c614-40b3-40b8-887f-2b985e753d96/perigosidade_r_stats.csv

I've tested this csv in R:

> table_perigosidade <- read.csv("D:\\Testes\\quantile\\perigosidade_r_stats.csv")
> quantile(table_perigosidade$raster_value, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100% 
   40    96   144   288   720  1632  2400  3000  4500  6000 46721

[C] Then I extracted the absolute frequencies of raster values:

write.csv(count(table_perigosidade, 'raster_value'), file = "D:\\Testes\\quantile\\perigosidade_r_stats_abs_freq.csv")

https://cld.pt/dl/download/af265a0a-024f-40c4-a04c-77eaa9f09173/perigosidade_r_stats_abs_freq.csv

and did the manual analysis of it, getting the cumulative relative frequencies and the result is the same: https://cld.pt/dl/download/9b014dbd-f991-42e9-880e-baec555a6d18/perigosidade_r_stats_abs_freq_cum.ods

So, it seems that Singleband Pseudocolor Quantile is not returning the correct breaks.

PedroVenancio commented 4 years ago

This issue is not new, I see that it was already present in QGIS 2.18.28.

PedroVenancio commented 4 years ago

Hi @elpaso

Thanks for the intermediate solution, however, I've tested it in master, and it still does not solve the sample I've put here.

Do you agree it is better to keep this issue open until it can be completely solved?

Best regards

elpaso commented 4 years ago

I tested your same dataset and the problem was solved.

elpaso commented 4 years ago

But let me check it again ...

elpaso commented 4 years ago

@PedroVenancio you are right, there was another change required to get exact results, which is to not restrict the sample size to 25000 like we are doing now, I removed that before the commit because I was (wrongly) convinced that the other fix was sufficient to fix this issue. I will investigate if there is a possibility to increase the sample size without affecting performance.

This is what I get if the sample size is augmented ten times (set to 250000*10 )

immagine

elpaso commented 4 years ago

Ok I tested it with my biggest rasters and it obviously freezes QGIS unless a maximum is set. A better approach for exact results would involve a more complex solution with a separate thread.

PedroVenancio commented 4 years ago

This is what I get if the sample size is augmented ten times (set to 250000*10 )

immagine

Hi @elpaso

Yes, this is the right answer for my sample! But if that increase freezes QGIS with huge rasters, it can't be applied.

Quantile for vectors code is newer, and it seems it follows another aproach:

https://github.com/qgis/QGIS/blob/d147a8dffa9e4a143a7ab0016099c3eaabde346b/src/core/classification/qgsclassificationquantile.cpp

Can't be somehow used in rasters? Or the problem is upstream, when collecting the raster statistics?

Thank you very much Alessandro!

elpaso commented 4 years ago

@PedroVenancio we (me and @gioman) decided to go with the current solution (which is to increase both bin count and sample size). It does not impact performances because there is still a limit, even if that is way bigger than it was before (the original code was written 10 years ago, when RAM and CPUs were much more limited than today).

PedroVenancio commented 4 years ago

Perfect @elpaso ! I've tested here and it works!

Do you think this changes can be backported to 3.12 and 3.10?

gioman commented 4 years ago

Do you think this changes can be backported to 3.12 and 3.10?

@PedroVenancio backports done minutes ago.

PedroVenancio commented 4 years ago

@elpaso @gioman

Can you please test quantile in this raster with 3.14?

PERIGOSIDADE.zip

It does not show any class to me, both in Linux or Win.

With QGIS 3.10.7 it works as expected.

PedroVenancio commented 4 years ago

Another issue I'm seeing now, with a Float64 raster:

RISCO.zip

With this raster, quantiles are not correctly calculated (in this case, in QGIS 3.10.7, as it does not work in 3.14).

Rounding that raster and converting to a Int32, it already calculate quantiles correctly (image on the right):

quantile_float64

Using Raster Pixels to Points algorithm to export Float64 pixel values to points, and calculating quantiles in the resulting vector layer, the output is the correct (as seen on the left image).

Running r.quantile in the Float64 raster, the output is correct:

0:20.000000:120.000000
1:40.000000:1320.000000
2:60.000000:2400.000000
3:80.000000:4500.000000 

Also correct exporting to CSV and calculating in R:

> tabela_risco <- read.csv("D:\\RISCO_RASTER_PIXELS_TO_POINTS.csv")
> quantile(tabela_risco$VALUE, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
      0%      10%      20%      30%      40%      50%      60%      70%      80%      90%     100% 
    36.0     72.0    120.0    792.0   1320.0   1920.0   2400.0   3000.0   4500.0   6000.0 218030.4

So, it seems that quantile classification is still failing with some kind of data.

gioman commented 4 years ago

@elpaso @gioman

Can you please test quantile in this raster with 3.14?

@PedroVenancio @elpaso confirmed, new ticket filed here: https://github.com/qgis/QGIS/issues/37448

gioman commented 4 years ago

Another issue I'm seeing now, with a Float64 raster:

@PedroVenancio and https://github.com/qgis/QGIS/issues/37449

PedroVenancio commented 4 years ago

Hi @elpaso and @gioman

It keeps failing with some rasters. See this one:

Raster_Quantiles

Here is the project with the sample raster:

Raster_Quantiles.zip

Quantiles calculated from Vector Symbology are always correct, just like r.quantile output:

0:20.000000:1.500000
1:40.000000:2.625000
2:60.000000:5.250000
3:80.000000:13.500000
elpaso commented 4 years ago

Maybe related to https://github.com/qgis/QGIS/issues/39058

elpaso commented 4 years ago

@PedroVenancio what QGIS version are you testing?

PedroVenancio commented 4 years ago

@PedroVenancio what QGIS version are you testing?

@elpaso I'm testing with

QGIS version 3.15.0-Master QGIS code revision 8d2a0d1ebb
Compiled against Qt 5.11.2 Running against Qt 5.11.2
Compiled against GDAL/OGR 3.2.0dev Running against GDAL/OGR 3.2.0dev
Compiled against GEOS 3.8.1-CAPI-1.13.3 Running against GEOS 3.8.1-CAPI-1.13.3
Compiled against SQLite 3.29.0 Running against SQLite 3.29.0
PostgreSQL Client Version 11.5 SpatiaLite Version 4.3.0
QWT Version 6.1.3 QScintilla2 Version 2.10.8
Compiled against PROJ 7.2.0 Running against PROJ Rel. 7.2.0, November 1st, 2020
OS Version Windows 10 (10.0) This copy of QGIS writes debugging output.

and

QGIS version 3.14.16-Pi QGIS code revision 2b2f966720
Compiled against Qt 5.11.2 Running against Qt 5.11.2
Compiled against GDAL/OGR 3.2.0dev Running against GDAL/OGR 3.2.0dev
Compiled against GEOS 3.8.1-CAPI-1.13.3 Running against GEOS 3.8.1-CAPI-1.13.3
Compiled against SQLite 3.29.0 Running against SQLite 3.29.0
PostgreSQL Client Version 11.5 SpatiaLite Version 4.3.0
QWT Version 6.1.3 QScintilla2 Version 2.10.8
Compiled against PROJ 7.2.0 Running against PROJ Rel. 7.2.0, November 1st, 2020
OS Version Windows 10 (10.0) This copy of QGIS writes debugging output.

and I'm getting the same results.

And I also believe that https://github.com/qgis/QGIS/issues/39058 is somehow related.

elpaso commented 4 years ago

This always the same issue: quantile for rasters is calculated on an histogram, which uses a sample of values and not the whole raster and a limited number of bins.

Unless we re-implement it completely using an exact approach similar to the one used by vector layers and GRASS (and like I've done with unique/paletted) you will always get approximated results, that might lead to a wrong classification in some cases.

Differently from paletted/quantile, where the calculations were already implemented in a separate thread, for singleband/pseudocolr that is not the case, quantile requires a non-trivial implementation of a new class to gather the values and build the histogram.

I'll see what I can do but I cannot make any promise.

PedroVenancio commented 4 years ago

I'll see what I can do but I cannot make any promise.

Great @elpaso , thank you very much for your effort!

elpaso commented 3 years ago

@PedroVenancio can we close this?

PedroVenancio commented 3 years ago

Hi @elpaso

Do you think this needs a complete overhaul of Singleband pseudocolor implementation, right? Yes, I think we can close it, at least until we can think on a way to support that re-implementation.

Do you agree @gioman ?

gioman commented 3 years ago

@PedroVenancio can we close this?

@elpaso @PedroVenancio I re-read this ticket and other related. To me there is still an issue (the one described at the end of this ticket), so I don't see why close it. I agree that raster symbology should be overhauled, but until then this do not seems fully solved. Just my opinion @elpaso feel free to close.

elpaso commented 3 years ago

@PedroVenancio @gioman the way I see it is that in order to get exact results from a raster classification we cannot use samples but read the whole raster.

Due to the size that a typical raster can get we cannot do this in the GUI thread without freezing QGIS, this means to take a different approach and probably split the classification methods to add a list of "exact" (and potentially very slow) methods.

I actually have got an idea and I've started an implementation back in october that exposes a raster layer as a vector layer and allows to use all vector classification methods already implemented for vectors in QGIS for free.

But...

  1. this has to executed in a separate thread
  2. requires more tests and development
  3. the GUI needs to be updated to make place for these new methods and to allow for cancellation

so, the bottom line is that this is quite a big refactoring that would probably classify as a feature request.

I would be more than happy to work on this topic because I've already spent quite a lot of time researching and experimenting on this but I cannot do it without funds and it does not seem fair to me to use all the bugfixing funds for this development.

There is another potentially promising approach to speed up these calculations which is to use the GPU with OpenCL or OpenGL compute shaders but that is also quite a refactoring.

IMHO the route to go is a QEP followed by a grant, a crowdfunding, a generous donor or a combination of these.

gioman commented 3 years ago

IMHO the route to go is a QEP followed by a grant, a crowdfunding, a generous donor or a combination of these.

@elpaso I will be more than happy to chip in. And probably there are others interested. Ping me privately so we can get an idea of the amount that is necessary.