kylebarron / geo-index

A Rust crate and Python library for packed, static, zero-copy spatial indexes.
Apache License 2.0
76 stars 6 forks source link

`intersection_candidates_with_other_tree()` does not find all candidates #42

Open JosiahParry opened 6 months ago

JosiahParry commented 6 months ago

I was trying to give geo-index a spin with the use case of finding self-intersecting geometries (contiguity). At minimum, we expect each geometry to be an intersection candidate with itself. Using geo-index only 1 intersection candidate is found. When compared to rstar, all candidates are found.

Repro below. Please excuse any horrendous use of geoarrow 🙃

use geo::BoundingRect;
use geo::Polygon;
use geo_index::rtree::RTreeIndex;
use geoarrow::algorithm::{geo_index::RTree, native::Concatenate};
use geoarrow::array::{AsChunkedGeometryArray, PolygonArray};
use geoarrow::io::geojson::read_geojson;
use geoarrow::{trait_::GeometryArrayAccessor, GeometryArrayTrait};
use rstar::primitives::GeomWithData;
use rstar::{primitives::Rectangle, AABB};

// Find tree self-intersection canddiates using rstar
fn geo_contiguity(geom: Vec<Polygon>) {
    let to_insert = geom
        .iter()
        .enumerate()
        .map(|(i, gi)| {
            let rect = gi.bounding_rect().unwrap();
            let aabb =
                AABB::from_corners([rect.min().x, rect.min().y], [rect.max().x, rect.max().y]);

            GeomWithData::new(Rectangle::from_aabb(aabb), i)
        })
        .collect::<Vec<_>>();

    let tree = rstar::RTree::bulk_load(to_insert);
    let candidates = tree.intersection_candidates_with_other_tree(&tree);

    println!("rstar candidates:");
    for (left_idx, right_idx) in candidates {
        println!("{:?} {:?}", left_idx.data, right_idx.data);
    }
}

// Find tree self-intersection canddiates using geo-index
fn contiguity(geoms: &PolygonArray<i32>, node_size: usize) -> Vec<(usize, usize, f64)> {
    let tree = geoms.create_rtree_with_node_size(node_size);
    let mut _neighbors = Vec::new();

    let candidates = tree.intersection_candidates_with_other_tree(&tree);

    println!("geo-index candidates:");
    for (left_idx, right_idx) in candidates {
        println!("{:?} {:?}", left_idx, right_idx);
    }

    _neighbors
}

fn main() {
    let file = std::fs::File::open("src/guerry.geojson").unwrap();
    let reader = std::io::BufReader::new(file);

    // read from file
    let geoms = read_geojson(reader, None).unwrap();
    let polys = geoms
        .geometry()
        .unwrap()
        .as_ref()
        .as_polygon()
        .concatenate()
        .unwrap();

    println!("There are {} polygons", polys.len());
    contiguity(&polys, 10);

    let geo_polys = polys.iter_geo_values().collect::<Vec<_>>();
    geo_contiguity(geo_polys);
}
Output:
``` There are 116 polygons geo-index candidates: 520 520 rstar candidates: 39 39 39 85 39 50 39 84 85 39 85 85 85 50 85 84 50 39 50 85 50 50 50 84 84 39 84 85 84 50 84 84 39 38 39 86 39 87 85 38 85 86 85 87 84 86 84 87 50 41 39 102 39 57 50 27 50 57 38 39 38 85 86 39 86 85 86 84 87 39 87 85 87 84 7 7 7 38 38 7 38 38 86 86 86 87 87 86 87 87 7 9 38 9 38 101 38 102 9 7 9 38 101 38 9 9 9 101 101 9 101 101 101 102 41 50 14 14 14 16 14 41 40 40 40 41 16 14 16 16 16 41 41 14 41 40 41 16 41 41 14 113 14 27 16 27 41 27 41 57 16 18 16 17 16 111 14 112 14 99 16 112 16 99 16 15 102 39 27 50 57 39 57 50 102 38 102 101 113 14 27 14 27 16 27 41 57 41 102 102 102 57 113 113 113 27 27 113 27 27 27 57 57 102 57 27 57 57 102 56 113 20 27 20 27 56 57 56 113 112 113 26 113 45 27 26 20 113 20 27 56 102 56 27 56 57 20 20 20 56 56 20 56 56 20 26 18 16 17 16 111 16 18 18 17 17 111 111 111 99 112 14 112 16 99 14 99 16 15 16 112 113 99 111 112 112 112 99 99 112 99 99 15 15 112 45 112 46 99 46 26 113 26 27 45 113 26 20 45 112 46 112 46 99 26 26 26 45 45 26 45 45 45 46 46 45 46 46 7 88 38 10 38 88 9 42 9 88 101 42 101 10 102 10 20 10 56 10 20 13 56 13 20 83 26 83 26 2 26 19 45 19 111 110 111 54 111 59 99 54 112 59 99 59 46 94 46 59 112 51 45 51 46 51 10 38 88 7 88 38 42 9 42 101 10 101 88 9 10 102 10 20 10 56 13 20 13 56 83 20 83 26 2 26 19 26 19 45 37 37 37 42 37 10 42 37 42 42 42 10 10 37 10 42 10 10 88 88 37 108 37 11 37 58 10 53 10 58 10 13 37 29 37 5 108 37 11 37 105 105 105 106 106 105 106 106 106 108 106 11 108 106 108 108 108 11 11 106 11 108 11 11 106 103 106 104 108 107 108 29 108 5 106 3 108 3 108 4 11 3 103 106 104 106 103 103 104 104 53 10 58 37 58 10 13 10 52 52 52 53 53 52 53 53 53 58 53 13 58 53 58 58 58 13 13 53 13 58 13 13 52 47 52 29 52 5 53 5 58 5 52 83 52 2 53 83 13 83 52 91 52 93 52 0 29 37 5 37 107 108 29 108 5 108 47 52 29 52 5 52 5 53 5 58 107 107 107 29 47 47 47 29 47 5 29 107 29 47 29 29 29 5 5 47 5 29 5 5 47 3 47 4 29 3 29 4 5 83 47 91 47 0 3 106 3 108 3 11 4 108 3 47 3 29 4 47 4 29 3 3 3 4 4 3 4 4 83 52 83 53 83 13 2 52 83 5 83 83 83 2 2 83 2 2 2 19 19 2 19 19 83 93 2 93 2 76 19 76 91 52 93 52 91 47 93 83 93 2 76 2 76 19 91 91 91 93 93 91 93 93 93 76 76 93 76 76 91 0 93 49 93 0 0 52 0 47 49 93 0 91 0 93 28 28 28 49 49 28 49 49 49 0 0 49 0 0 19 115 19 55 19 51 93 22 76 22 76 115 76 55 93 21 76 21 93 48 49 22 28 92 28 63 49 92 28 90 49 48 110 111 54 111 54 99 59 111 59 112 59 99 94 46 59 46 69 69 70 70 74 74 73 73 73 54 73 71 73 25 73 36 73 43 73 72 73 68 54 73 71 73 110 110 110 54 109 109 54 110 54 54 71 71 54 59 54 43 54 64 59 54 94 94 94 59 59 94 59 59 59 43 94 81 94 64 59 64 25 73 36 73 25 25 25 36 36 25 36 36 33 33 34 34 25 43 25 24 25 44 25 23 43 73 72 73 68 73 43 54 43 59 43 25 61 61 61 43 43 61 43 43 72 72 68 68 61 81 61 64 43 64 61 60 43 60 43 44 61 12 64 54 81 94 64 94 64 59 81 61 64 61 64 43 81 81 81 64 64 81 64 64 81 30 81 12 24 25 24 24 32 32 35 35 44 25 23 25 60 61 60 43 44 43 60 60 44 44 23 23 12 61 30 81 12 81 30 30 30 96 30 12 96 30 96 96 96 12 12 30 12 96 12 12 94 51 94 31 81 31 30 98 30 31 30 80 96 80 96 82 96 100 51 112 51 45 51 46 115 19 55 19 51 19 22 93 22 76 115 76 55 76 22 49 21 93 21 76 92 28 92 49 63 28 48 93 90 28 48 49 51 94 31 94 31 81 98 30 31 30 80 30 80 96 82 96 100 96 22 22 22 115 115 22 115 115 115 55 55 115 55 55 55 51 51 55 51 51 22 92 22 114 22 63 22 21 115 97 55 97 55 98 55 31 51 31 22 8 115 8 92 22 114 22 63 22 21 22 92 92 92 114 92 63 114 92 114 114 114 63 63 92 63 114 63 63 21 21 114 90 114 66 114 67 63 66 63 67 63 62 63 8 114 89 63 75 90 114 90 90 48 48 90 66 90 89 97 115 97 55 98 55 31 55 31 51 97 97 97 95 97 98 95 97 95 95 95 98 98 97 98 95 98 98 98 31 31 98 31 31 97 62 97 8 97 80 98 80 97 1 8 22 8 115 66 114 66 63 67 114 67 63 62 63 8 63 66 90 62 97 8 97 66 66 66 67 67 66 67 67 67 62 67 8 62 67 62 62 62 8 8 67 8 62 8 8 66 89 66 75 62 1 66 65 67 65 67 6 62 6 89 114 75 63 89 90 89 66 75 66 89 89 89 75 75 89 75 75 80 97 80 98 80 80 80 100 82 82 82 100 100 80 100 82 100 100 80 79 80 1 82 1 82 77 100 1 100 77 82 78 100 78 1 97 1 62 79 80 1 80 1 82 1 100 77 82 77 100 79 79 79 1 1 79 1 1 1 77 77 1 77 77 1 6 77 6 77 78 65 66 65 67 6 67 6 62 78 82 78 100 6 1 6 77 78 77 65 65 6 6 78 78 ```
JosiahParry commented 6 months ago

I forgot to add the geojson! guerry.geojson.zip

kylebarron commented 6 months ago

Thanks! I'll take a look when I have time. The search method is directly ported from upstream, and so that should be reliable (it has matched shapely/GEOS). The intersection candidates is my own addition.

Contributions welcome 😉 . I can describe/document the tree structure better if you're interested! (I actually have a half-written blog post on flatbush that I never finished)

JosiahParry commented 6 months ago

I can confirm that the search() method works well! I'm able to identify polygon queen contiguity with a function like this.

pub fn search_tree(left: OwnedRTree<f64>, targets: PolygonArray<i32>) -> Vec<(usize, usize, f64)> {
    let mut neighbors = Vec::new();
    let _ = targets.iter().enumerate().for_each(|(i, target)| {
        let geom = target.unwrap();
        let (min, max) = bounding_rect_polygon(&geom);
        let mut candidates = left.search(min[0], min[1], max[0], max[1]);
        candidates.sort();
        for c in candidates {
            if geom.to_geo().intersects(&targets.get(c).unwrap().to_geo()) {
                neighbors.push((i, c, 1.0));
            }
        }
    });
    neighbors
}

I tried toying around with the tree traversal but im a bit out of my depth when it comes to recursion and tree searching. I'll try and find some more time to maybe pluck around in the source code at a later point

pub fn two_tree_candidates(left: OwnedRTree<f64>, right: OwnedRTree<f64>) -> Vec<Vec<usize>> {
    let res = right
        .as_ref()
        .root()
        .children()
        .flat_map(|node| recursive_search(&node, &left))
        .collect::<Vec<_>>();
    res
}

fn recursive_search<'a>(
    node: &'a Node<'a, f64, RTreeRef<'a, f64>>,
    left: &OwnedRTree<f64>,
) -> Vec<Vec<usize>> {
    if node.is_leaf() {
        let candidate = left.search(node.min_x(), node.min_y(), node.max_x(), node.max_y());
        vec![candidate]
    } else if node.is_parent() {
        node.children()
            .flat_map(|child| recursive_search(&child, left))
            .collect()
    } else {
        Vec::new()
    }
}
kylebarron commented 6 months ago

I can confirm that the search() method works well!

Indeed, there are never any bugs in @ mourner's code 😉

I think the gist of my tree traversal is sound. It's probably just missing a check somewhere.

kylebarron commented 6 months ago

I put together a small derived repro in https://github.com/kylebarron/geo-index/pull/51. Run with cargo run -- --debug. It's simplest to remove geoarrow from the picture here.

One of the issues is that https://github.com/kylebarron/geo-index/blob/159f2dbbe19538fb7bc72d74b610074d5a6b3c47/src/rtree/traversal.rs#L56 is backwards and should be < instead. So that meant that the iterator was never recursing: https://github.com/kylebarron/geo-index/blob/159f2dbbe19538fb7bc72d74b610074d5a6b3c47/src/rtree/traversal.rs#L190

Fixing that gives endless recursion, so I still have another issue somewhere. It's probably that I'm mixing parent and child intersections somewhere