mthh / contour-rs

Contour polygon creation in Rust (using marching squares algorithm)
https://crates.io/crates/contour
Apache License 2.0
52 stars 10 forks source link

Overflow/Artifacts with large rasters (>40k*40k) #12

Closed netthier closed 5 months ago

netthier commented 5 months ago

If I pass in a raster larger than about 40000*40000 (not sure about the exact value, but 50k*50k fails) to ContourBuilder::contours, I get an error similar to this:

thread 'main' panicked at /home/nett/.cargo/registry/src/index.crates.io-6f17d22bba15001f/contour-0.12.1/src/isoringbuilder.rs:137:23:
index out of bounds: the len is 4294836225 but the index is 18446744071562067968
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Increasing the width of the involved local variables to i64 seems to have fixed this issue, and I'll create a PR containing those changes once I test it against real data.

netthier commented 5 months ago

Hm, when generating contours for large rasters affected by this issue using my fork I get artifacts that do not appear when only generating contours for a small region. I ran the code in debug mode as well, and no overflows were detected. The 'streaks' seem to both appear over areas which are supposed to be empty as well as solid areas. My changes so far are available here: https://github.com/SenseLabsDE/contour-rs/tree/fix-overflow, no idea what could be happening here :thinking: E.g. image vs. image

More images of artifacts
mthh commented 5 months ago

Thank you for your feedback and for starting to try to solve the problem.

It looks particularly bad, it would be great to find a solution.

Do you have the possibility of sharing the dataset with which the problem occurs?

I also have a library that generate isobands : https://github.com/mthh/contour-isobands-rs (it also uses marching-squares, but a slightly different implementation). I'd be surprised if it's any faster... but perhaps it could produce a more satisfactory result for your use case?

My changes so far are available here: https://github.com/SenseLabsDE/contour-rs/tree/fix-overflow

Thanks, I've had a look at your changes, in any case I think you're right about the u32 -> usize change. I will soon apply this change to the lib and investigate the cause of the bug.

netthier commented 5 months ago

I sadly cannot share the original dataset, but I can try creating a reproducer manually. Thanks for the recommendation of contour-isobands. I didn't look too closely at it before, but it seems like isobands are what I actually need :smile: However, I cannot try it out directly, as my dataset does not fit into RAM when using f64. I'll fork the crate and see what it generates with f32s.

UPDATE: Even when search+replacing all f64s in contour-isobands with f32s, I get memory usage at least twice that of my dataset, meaning my program runs out of memory :( I either didn't update all places I needed to, or the difference in algorithm also means higher memory usage.

mthh commented 5 months ago

I sadly cannot share the original dataset, but I can try creating a reproducer manually.

No problem. I'll try to generate one manually too if you can't.

UPDATE: Even when search+replacing all f64s in contour-isobands with f32s, I get memory usage at least twice that of my dataset, meaning my program runs out of memory :( I either didn't update all places I needed to, or the difference in algorithm also means higher memory usage.

Yes sadly the algorithm implemented in https://github.com/mthh/contour-isobands-rs is more memory-intensive (it has to manage more cases than the one in https://github.com/mthh/contour-rs, cf. https://en.wikipedia.org/wiki/Marching_squares#Isobands - maybe there's a way of making it lighter in terms of memory use, but I'm afraid that's beyond my skills and/or the time I have to devote to it)

netthier commented 5 months ago

I'm actually able to send an old version of the dataset after all, l'll send it to you via E-Mail once I prepare it. UPDATE: E-Mail sent.

netthier commented 5 months ago

Running without the f32 feature enabled fixes the artifacts. I wonder if this is a precision issue?

mthh commented 5 months ago

I strongly suspect that this is a numeric type conversion issue (because when I'm using only an extract of your dataset, but with the f32 feature, the result is correct) but I can't find out where in the code (in fact I've identified several places that could be problematic but haven't really found the culprit or how to fix it).

In rust, conversions with "as" are safe (in the sense that they don't panic) but they may not be correct.

For example here https://github.com/SenseLabsDE/contour-rs/blob/17053d29cad089500eadc2e266dec3988348eba9/src/isoringbuilder.rs#L106, usize is not guaranteed to fit in i64.

But there is also other places, for example here https://github.com/mthh/contour-rs/blob/master/src/isoringbuilder.rs#L169 we have a u32 (or usize in your version) that is cast to Float (so either f32 / f64 depending on the selected feature). Also here https://github.com/mthh/contour-rs/blob/master/src/isoringbuilder.rs#L181.

I'll keep investigating, if I can't fix it it will probably be to return an error message (clearer than the one you reported in your initial issue) when using grids that are too large, depending on the numeric type used (f32 / f64) and thus prevent the artifacts obtained with your modified version from appearing...

netthier commented 5 months ago

https://github.com/mthh/contour-rs/blob/08c1d83a1461aa19543ed402d23908eb55234fd1/src/isoringbuilder.rs#L180-L187 looks like a good candidate tbh, though I'd be surprised if the values passed there are large enough to cause issues. I'll see if I can insert some checks there.

UPDATE: I added assert_eq!((x, y), ((x as Float) as i64, (y as Float) as i64)); to the top of that, which I'd assume should catch any lossy conversions, but it didn't get hit.

UPDATE 2: Adding a assert!(idx < 16777217); to https://github.com/mthh/contour-rs/blob/master/src/isoringbuilder.rs#L168-L170 doesn't help either. I'd assume that it should catch any conversions that may have been off by one integer.

mthh commented 5 months ago

though I'd be surprised if the values passed there are large enough to cause issues

Indeed, that's what I think of it too, reading the code (and knowing the max size of f32) and that's why I still can't figure out what's going on :sweat:

mthh commented 5 months ago

I think I found how to fix it, it now works without artifact on the dataset you sent me, using the f32 feature.

Can you try by replacing https://github.com/mthh/contour-rs/blob/master/src/isoringbuilder.rs#L169 by (point.x as usize) * 2 + (point.y as usize) * (self.dx + 1) * 4 on your side ?

netthier commented 5 months ago

I can confirm that this fixes the artifacting in my larger, original dataset as well, thank you! I've added it to my branch and will open a PR

mthh commented 5 months ago

Great! :+1:

netthier commented 5 months ago

While the artifacts I showed earlier don't come up anymore, using f32 instead of f64 on the same dataset generates substantially more polygons in the resulting MultiPolygon, at least some seemingly filling holes in existing polygons. However, I couldn't reproduce this in the test dataset. I'll experiment further and send you another reproducer once I find one. Might be next week if I don't find one today.

mthh commented 5 months ago

Alright. Thank you for reporting these issues and helping to fix them!

netthier commented 5 months ago

I was able to reproduce this using the previously shared dataset after all. I ran these two executables:

use std::fs;
use gdal::Dataset;
use contour::ContourBuilder;
use geojson::{GeoJson, Geometry};

fn main() {
    let dataset = Dataset::open("dc.tiff").unwrap();
    let raster = dataset.rasterband(1).unwrap();
    let size = raster.size();
    let mut values: Vec<f32> = raster.read_as::<f32>((0, 0), size, size, None).unwrap().data;
    let values: Vec<f64> = values.drain(..).map(|v| v as f64).collect();
    let contours = ContourBuilder::new(size.0, size.1, false).contours(&values, &[1000.0]).unwrap();
    fs::write("./out-f64.json", GeoJson::Geometry(Geometry::new(geojson::Value::from(&contours[0].geometry().clone()))).to_string()).unwrap();
    println!("Subpolygons: {}", contours[0].geometry().0.len());
}
use std::fs;
use gdal::Dataset;
use contour::ContourBuilder;
use geojson::{GeoJson, Geometry};

fn main() {
    let dataset = Dataset::open("dc.tiff").unwrap();
    let raster = dataset.rasterband(1).unwrap();
    let size = raster.size();
    let values: Vec<f32> = raster.read_as::<f32>((0, 0), size, size, None).unwrap().data;
    let contours = ContourBuilder::new(size.0, size.1, false).contours(&values, &[1000.0]).unwrap();
    fs::write("./out-f32.json", GeoJson::Geometry(Geometry::new(geojson::Value::from(&contours[0].geometry().clone()))).to_string()).unwrap();
    println!("Subpolygons: {}", contours[0].geometry().0.len());
}

using these deps:

[dependencies]
contour = { git = "https://github.com/SenseLabsDE/contour-rs.git", branch = "fix-overflow", features = [] }
gdal = "0.16"
geojson = "0.24"

(and with f32 enabled for the second executable).

The println!("Subpolygons: {}") returns different results. The f32 case returns 209, the f64 one 498 (seems I got it the wrong way around in my previous report). I did a quick comparison, and it seemed as if some holes in one of them were a separate polygon in the other.

mthh commented 5 months ago

Interesting. In addition, I was unable to find these differences visually despite a lengthy inspection in QGIS :sweat:

I tested (still on the dataset you sent me) your two snippets using the lines method of the ContourBuilder (e.g. let lines = ContourBuilder::new(size.0, size.1, false).contours(&values, &[1000.0]).unwrap();) to generate only isolines.

The good news is that I'm getting exactly the same results in f32 and f64. This allows us to narrow down the search of the problem to the code specific to the contours method in contourbuilder.rs. I'm going to keep investigating over the weekend.

mthh commented 5 months ago

I think I've figured out how to fix it (and get exactly the same result for contours / isobands with the "f32" feature and the default feature with your example dataset), by always calculating the area (in area.rs) in f64.

I tried to push on the branch used for your PR #13 but it seems I'm not allowed to.

If you want to apply the following diff to your branch and test:

diff --git a/src/area.rs b/src/area.rs
index a3cb8f4..774bffd 100644
--- a/src/area.rs
+++ b/src/area.rs
@@ -1,10 +1,15 @@
 use crate::{Float, Pt};

-pub fn area(ring: &[Pt]) -> Float {
+#[allow(clippy::unnecessary_cast)]
+// Note that we need to disable the clippy warning about unnecessary casts
+// because of the "f32" optional feature (and because we want to ensure we always
+// use "f64" in this function, both in the default feature and in the "f32" feature).
+pub fn area(ring: &[Pt]) -> f64 {
     let n = ring.len();
-    let mut area = ring[n - 1].y * ring[0].x - ring[n - 1].x * ring[0].y;
+    let mut area =
+        ring[n - 1].y as f64 * ring[0].x as f64 - ring[n - 1].x as f64 * ring[0].y as f64;
     for i in 1..n {
-        area += ring[i - 1].y * ring[i].x - ring[i - 1].x * ring[i].y;
+        area += ring[i - 1].y as f64 * ring[i].x as f64 - ring[i - 1].x as f64 * ring[i].y as f64;
     }
     // Note that in the shoelace formula you need to divide this result by 2 to get the actual area.
     // Here we skip this division because we only use this area formula to calculate the winding
netthier commented 5 months ago

I can confirm that I now get the same amount of subpolygons with either type on both of my datasets, but I hope that there aren't similar issues hidden anywhere else. I'm currently working on making the type inside the data container generic and independent from the coordinate type here, which allows me to use f64s without running out of memory.