Open PedroVenancio opened 4 years ago
This issue is not new, I see that it was already present in QGIS 2.18.28.
Hi @elpaso
Thanks for the intermediate solution, however, I've tested it in master, and it still does not solve the sample I've put here.
Do you agree it is better to keep this issue open until it can be completely solved?
Best regards
I tested your same dataset and the problem was solved.
But let me check it again ...
@PedroVenancio you are right, there was another change required to get exact results, which is to not restrict the sample size to 25000 like we are doing now, I removed that before the commit because I was (wrongly) convinced that the other fix was sufficient to fix this issue. I will investigate if there is a possibility to increase the sample size without affecting performance.
This is what I get if the sample size is augmented ten times (set to 250000*10 )
Ok I tested it with my biggest rasters and it obviously freezes QGIS unless a maximum is set. A better approach for exact results would involve a more complex solution with a separate thread.
This is what I get if the sample size is augmented ten times (set to 250000*10 )
Hi @elpaso
Yes, this is the right answer for my sample! But if that increase freezes QGIS with huge rasters, it can't be applied.
Quantile for vectors code is newer, and it seems it follows another aproach:
Can't be somehow used in rasters? Or the problem is upstream, when collecting the raster statistics?
Thank you very much Alessandro!
@PedroVenancio we (me and @gioman) decided to go with the current solution (which is to increase both bin count and sample size). It does not impact performances because there is still a limit, even if that is way bigger than it was before (the original code was written 10 years ago, when RAM and CPUs were much more limited than today).
Perfect @elpaso ! I've tested here and it works!
Do you think this changes can be backported to 3.12 and 3.10?
Do you think this changes can be backported to 3.12 and 3.10?
@PedroVenancio backports done minutes ago.
@elpaso @gioman
Can you please test quantile in this raster with 3.14?
It does not show any class to me, both in Linux or Win.
With QGIS 3.10.7 it works as expected.
Another issue I'm seeing now, with a Float64 raster:
With this raster, quantiles are not correctly calculated (in this case, in QGIS 3.10.7, as it does not work in 3.14).
Rounding that raster and converting to a Int32, it already calculate quantiles correctly (image on the right):
Using Raster Pixels to Points algorithm to export Float64 pixel values to points, and calculating quantiles in the resulting vector layer, the output is the correct (as seen on the left image).
Running r.quantile in the Float64 raster, the output is correct:
0:20.000000:120.000000
1:40.000000:1320.000000
2:60.000000:2400.000000
3:80.000000:4500.000000
Also correct exporting to CSV and calculating in R:
> tabela_risco <- read.csv("D:\\RISCO_RASTER_PIXELS_TO_POINTS.csv")
> quantile(tabela_risco$VALUE, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
36.0 72.0 120.0 792.0 1320.0 1920.0 2400.0 3000.0 4500.0 6000.0 218030.4
So, it seems that quantile classification is still failing with some kind of data.
@elpaso @gioman
Can you please test quantile in this raster with 3.14?
@PedroVenancio @elpaso confirmed, new ticket filed here: https://github.com/qgis/QGIS/issues/37448
Another issue I'm seeing now, with a Float64 raster:
@PedroVenancio and https://github.com/qgis/QGIS/issues/37449
Hi @elpaso and @gioman
It keeps failing with some rasters. See this one:
Here is the project with the sample raster:
Quantiles calculated from Vector Symbology are always correct, just like r.quantile output:
0:20.000000:1.500000
1:40.000000:2.625000
2:60.000000:5.250000
3:80.000000:13.500000
Maybe related to https://github.com/qgis/QGIS/issues/39058
@PedroVenancio what QGIS version are you testing?
@PedroVenancio what QGIS version are you testing?
@elpaso I'm testing with
QGIS version | 3.15.0-Master | QGIS code revision | 8d2a0d1ebb |
---|---|---|---|
Compiled against Qt | 5.11.2 | Running against Qt | 5.11.2 |
Compiled against GDAL/OGR | 3.2.0dev | Running against GDAL/OGR | 3.2.0dev |
Compiled against GEOS | 3.8.1-CAPI-1.13.3 | Running against GEOS | 3.8.1-CAPI-1.13.3 |
Compiled against SQLite | 3.29.0 | Running against SQLite | 3.29.0 |
PostgreSQL Client Version | 11.5 | SpatiaLite Version | 4.3.0 |
QWT Version | 6.1.3 | QScintilla2 Version | 2.10.8 |
Compiled against PROJ | 7.2.0 | Running against PROJ | Rel. 7.2.0, November 1st, 2020 |
OS Version | Windows 10 (10.0) | This copy of QGIS writes debugging output. |
and
QGIS version | 3.14.16-Pi | QGIS code revision | 2b2f966720 |
---|---|---|---|
Compiled against Qt | 5.11.2 | Running against Qt | 5.11.2 |
Compiled against GDAL/OGR | 3.2.0dev | Running against GDAL/OGR | 3.2.0dev |
Compiled against GEOS | 3.8.1-CAPI-1.13.3 | Running against GEOS | 3.8.1-CAPI-1.13.3 |
Compiled against SQLite | 3.29.0 | Running against SQLite | 3.29.0 |
PostgreSQL Client Version | 11.5 | SpatiaLite Version | 4.3.0 |
QWT Version | 6.1.3 | QScintilla2 Version | 2.10.8 |
Compiled against PROJ | 7.2.0 | Running against PROJ | Rel. 7.2.0, November 1st, 2020 |
OS Version | Windows 10 (10.0) | This copy of QGIS writes debugging output. |
and I'm getting the same results.
And I also believe that https://github.com/qgis/QGIS/issues/39058 is somehow related.
This always the same issue: quantile for rasters is calculated on an histogram, which uses a sample of values and not the whole raster and a limited number of bins.
Unless we re-implement it completely using an exact approach similar to the one used by vector layers and GRASS (and like I've done with unique/paletted) you will always get approximated results, that might lead to a wrong classification in some cases.
Differently from paletted/quantile, where the calculations were already implemented in a separate thread, for singleband/pseudocolr that is not the case, quantile requires a non-trivial implementation of a new class to gather the values and build the histogram.
I'll see what I can do but I cannot make any promise.
I'll see what I can do but I cannot make any promise.
Great @elpaso , thank you very much for your effort!
@PedroVenancio can we close this?
Hi @elpaso
Do you think this needs a complete overhaul of Singleband pseudocolor implementation, right? Yes, I think we can close it, at least until we can think on a way to support that re-implementation.
Do you agree @gioman ?
@PedroVenancio can we close this?
@elpaso @PedroVenancio I re-read this ticket and other related. To me there is still an issue (the one described at the end of this ticket), so I don't see why close it. I agree that raster symbology should be overhauled, but until then this do not seems fully solved. Just my opinion @elpaso feel free to close.
@PedroVenancio @gioman the way I see it is that in order to get exact results from a raster classification we cannot use samples but read the whole raster.
Due to the size that a typical raster can get we cannot do this in the GUI thread without freezing QGIS, this means to take a different approach and probably split the classification methods to add a list of "exact" (and potentially very slow) methods.
I actually have got an idea and I've started an implementation back in october that exposes a raster layer as a vector layer and allows to use all vector classification methods already implemented for vectors in QGIS for free.
But...
so, the bottom line is that this is quite a big refactoring that would probably classify as a feature request.
I would be more than happy to work on this topic because I've already spent quite a lot of time researching and experimenting on this but I cannot do it without funds and it does not seem fair to me to use all the bugfixing funds for this development.
There is another potentially promising approach to speed up these calculations which is to use the GPU with OpenCL or OpenGL compute shaders but that is also quite a refactoring.
IMHO the route to go is a QEP followed by a grant, a crowdfunding, a generous donor or a combination of these.
IMHO the route to go is a QEP followed by a grant, a crowdfunding, a generous donor or a combination of these.
@elpaso I will be more than happy to chip in. And probably there are others interested. Ping me privately so we can get an idea of the amount that is necessary.
I'm seeing some inconsistencies in classes breaks, when using Quantile mode on a raster with singleband pseudocolor symbology.
To reproduce:
Download this sample raster and load it on QGIS: https://cld.pt/dl/download/7161caab-5ebe-4e94-8c4a-574a80821a97/perigosidade.tif
Properties -> Symbology -> Render type: Singleband pseudocolor -> Interpolation: Discrete -> Mode: Quantile -> Classes: 5. You will get:
Run GRASS r.quantile from Processing on this raster, with Number of Quantiles: 5. Leave all other options as default. You will get:
These are the correct classes breaks, and so the symbology should be:
I have done some tests to be sure about the correct classification.
[A] I've run Raster Pixels to Points to get the center point of all pixels. If you want to check, here is the output geopackage: https://cld.pt/dl/download/3764537c-2504-4326-be77-74e26c0d2a49/perigosidade.gpkg
Load it in QGIS and go to Properties -> Symbology -> Graduated -> Mode: Equal Count (Quantile) -> Classes: 5 -> Classify. You will get:
[B] I've run GRASS r.stats to get all pixel values. Here is the CSV with all values: https://cld.pt/dl/download/b730c614-40b3-40b8-887f-2b985e753d96/perigosidade_r_stats.csv
I've tested this csv in R:
[C] Then I extracted the absolute frequencies of raster values:
https://cld.pt/dl/download/af265a0a-024f-40c4-a04c-77eaa9f09173/perigosidade_r_stats_abs_freq.csv
and did the manual analysis of it, getting the cumulative relative frequencies and the result is the same: https://cld.pt/dl/download/9b014dbd-f991-42e9-880e-baec555a6d18/perigosidade_r_stats_abs_freq_cum.ods
So, it seems that Singleband Pseudocolor Quantile is not returning the correct breaks.