orbisgis / geoclimate

Geospatial processing toolbox for environmental and climate studies
GNU Lesser General Public License v3.0
59 stars 15 forks source link

Endless SQL query in OSM for bbox [50, 8.6, 50.2, 8.8] #876

Closed j3r3m1 closed 9 months ago

j3r3m1 commented 9 months ago

GeoClimate does not seem to be able to run the following SQL query: https://github.com/orbisgis/geoclimate/blob/185f6b419a8a39e5a655d85b6dd4dc067539d917/osmtools/src/main/groovy/org/orbisgis/geoclimate/osmtools/Transform.groovy#L352

It never stops working on it. The SQL query is:

CREATE TABLE WAYS_POLYGONS_1f5af1f0_31a3_46e0_bb14_8c53a06f3bcc AS 
                    SELECT 'w'||a.id_way AS id, case when st_isvalid(a.the_geom) then a.the_geom else  st_makevalid(a.the_geom) end as the_geom , MAX(CASE WHEN b.tag_key = 'aeroway' THEN b.tag_value END) AS "aeroway",MAX(CASE WHEN b.tag_key = 'agricultural' THEN b.tag_value END) AS "agricultural",MAX(CASE WHEN b.tag_key = 'amenity' THEN b.tag_value END) AS "amenity",MAX(CASE WHEN b.tag_key = 'barrier' THEN b.tag_value END) AS "barrier",MAX(CASE WHEN b.tag_key = 'building' THEN b.tag_value END) AS "building",MAX(CASE WHEN b.tag_key = 'building:levels' THEN b.tag_value END) AS "building:levels",MAX(CASE WHEN b.tag_key = 'government' THEN b.tag_value END) AS "government",MAX(CASE WHEN b.tag_key = 'healthcare' THEN b.tag_value END) AS "healthcare",MAX(CASE WHEN b.tag_key = 'height' THEN b.tag_value END) AS "height",MAX(CASE WHEN b.tag_key = 'historic' THEN b.tag_value END) AS "historic",MAX(CASE WHEN b.tag_key = 'house' THEN b.tag_value END) AS "house",MAX(CASE WHEN b.tag_key = 'industrial' THEN b.tag_value END) AS "industrial",MAX(CASE WHEN b.tag_key = 'layer' THEN b.tag_value END) AS "layer",MAX(CASE WHEN b.tag_key = 'leisure' THEN b.tag_value END) AS "leisure",MAX(CASE WHEN b.tag_key = 'man_made' THEN b.tag_value END) AS "man_made",MAX(CASE WHEN b.tag_key = 'military' THEN b.tag_value END) AS "military",MAX(CASE WHEN b.tag_key = 'office' THEN b.tag_value END) AS "office",MAX(CASE WHEN b.tag_key = 'public_transport' THEN b.tag_value END) AS "public_transport",MAX(CASE WHEN b.tag_key = 'railway' THEN b.tag_value END) AS "railway",MAX(CASE WHEN b.tag_key = 'residential' THEN b.tag_value END) AS "residential",MAX(CASE WHEN b.tag_key = 'restaurant' THEN b.tag_value END) AS "restaurant",MAX(CASE WHEN b.tag_key = 'roof:height' THEN b.tag_value END) AS "roof:height",MAX(CASE WHEN b.tag_key = 'roof:levels' THEN b.tag_value END) AS "roof:levels",MAX(CASE WHEN b.tag_key = 'roof:shape' THEN b.tag_value END) AS "roof:shape",MAX(CASE WHEN b.tag_key = 'ruins' THEN b.tag_value END) AS "ruins",MAX(CASE WHEN b.tag_key = 'shop' THEN b.tag_value END) AS "shop",MAX(CASE WHEN b.tag_key = 'tourism' THEN b.tag_value END) AS "tourism",CAST(NULL AS VARCHAR) AS "monument",CAST(NULL AS VARCHAR) AS "place_of_worship",CAST(NULL AS VARCHAR) AS "historic:building",CAST(NULL AS VARCHAR) AS "grandstand",CAST(NULL AS VARCHAR) AS "apartments",CAST(NULL AS VARCHAR) AS "barn",CAST(NULL AS VARCHAR) AS "education",CAST(NULL AS VARCHAR) AS "sustenance" 
FROM WAYS_POLYGONS_TMP_e6259cff_169a_477c_ab18_aaef73181586 AS a, OSM_DATA_c49ee86d_f9ea_4f50_be00_d3d96745e000_way_tag b 
WHERE a.id_way=b.id_way and st_isempty(a.the_geom)=false  AND b.TAG_KEY IN ('height','roof:height','building:levels','roof:levels','building','amenity','layer','aeroway','historic','leisure','monument','place_of_worship','military','railway','public_transport','barrier','government','historic:building','grandstand','house','shop','industrial','man_made','residential','apartments','ruins','agricultural','barn','healthcare','education','restaurant','sustenance','office','tourism','roof:shape')  and a.the_geom && st_setsrid('POLYGON ((471333.21004455804 5538649.866669843, 471333.21004455804 5560944.562704275, 485726.139629124 5560944.562704275, 485726.139629124 5538649.866669843, 471333.21004455804 5538649.866669843))'::geometry, 32632) and st_intersects(a.the_geom,st_setsrid('POLYGON ((471333.21004455804 5538649.866669843, 471333.21004455804 5560944.562704275, 485726.139629124 5560944.562704275, 485726.139629124 5538649.866669843, 471333.21004455804 5538649.866669843))'::geometry, 32632)) 
GROUP BY  a.id_way

I have investigated the data, it looks OK, I have not found obvious reason for this anormally long calculation. @ebocher any idea what to investigate ? If not I can try to reduce the query size (keeping only some of the tags) to identify where does the problem come from.

ebocher commented 9 months ago

Tested on my laptop (CORE I7), I'm not able to reproduce this issue. The data is formated in less than 4 minutes. Perhaps a ghost process that has remained active on the H2GIS db. This happens when an error has been encountered in a previous process. Can you run a test with a fresh instance of GeoClimate and a new H2GIS db?

geoms

j3r3m1 commented 9 months ago

The H2GIS db is not new as long as I start a new command line with default db informations ? Anyway I have rerun the following config file and get to the same problem: the algo is still on the same SQL query 20 minutes after:

15:04:44.816 [main] INFO class org.orbisgis.geoclimate.osmtools.OSMTools -- The input OSM file has been loaded in the database. 15:04:44.816 [main] DEBUG class org.orbisgis.geoclimate.osm.OSM -- Create the building layer 15:04:44.827 [main] DEBUG class org.orbisgis.geoclimate.osmtools.OSMTools -- Start polygons transformation 15:04:44.827 [main] DEBUG class org.orbisgis.geoclimate.osmtools.OSMTools -- Indexing osm tables... 15:05:03.989 [main] DEBUG class org.orbisgis.geoclimate.osmtools.OSMTools -- Build way polygons

Here is the config file


{
    "description": "Processing OSM data",
    "input": {
    "locations": ["Nice"]
    },
    "output": {
        "folder": "/home/bernardj/Software/GeoClimate/"
    },
    "parameters": {
        "rsu_indicators": {
            "indicatorUse": [
                "LCZ",
                "TEB",
                "UTRF"
            ],
            "svfSimplified": true,
            "estimateHeight": true
        },
        "grid_indicators": {
            "x_size": 100,
        "y_size": 100,
        "rowCol": false,
        "output" : "geojson",
        "indicators" :[
                 "BUILDING_FRACTION", 
                 "BUILDING_HEIGHT", 
                 "WATER_FRACTION",
                 "VEGETATION_FRACTION", 
                 "ROAD_FRACTION", 
                 "IMPERVIOUS_FRACTION", 
                 "LCZ_FRACTION"
             ]
         }
    }
}
ebocher commented 9 months ago

It seeems that there is an issue with the execution plan.

The endless SQL query is :

SELECT 'w'||a.id_way AS id, a.the_geom , MAX(CASE WHEN b.tag_key = 'aeroway' THEN b.tag_value END) AS \"aeroway\",MAX(CASE WHEN b.tag_key = 'agricultural' THEN b.tag_value END) AS \"agricultural\",
                " FROM WAYS_POLYGONS AS a, OSM b WHERE a.id_way=b.id_way and st_isempty(a.the_geom)=false  AND b.TAG_KEY IN ('height','roof:height','building:levels','roof:levels','building','amenity','layer','aeroway','historic','leisure','monument','wall')  
                "and a.the_geom && st_setsrid('POLYGON ((198.5 186.5, 269.5 186.5, 269.5 115, 198.5 115, 198.5 186.5))'::geometry, 32632)  
                and st_intersects(a.the_geom,st_setsrid('POLYGON ((198.5 186.5, 269.5 186.5, 269.5 115, 198.5 115, 198.5 186.5))'::geometry, 32632))  GROUP BY  a.id_way;

the explain plan returns

SELECT
    'w' || "A"."ID_WAY" AS "ID",
    CASE WHEN "PUBLIC"."ST_ISVALID"("A"."THE_GEOM") THEN "A"."THE_GEOM" ELSE "PUBLIC"."ST_MAKEVALID"("A"."THE_GEOM") END AS "THE_GEOM",
    MAX(CASE WHEN "B"."TAG_KEY" = 'aeroway' THEN "B"."TAG_VALUE" END) AS "aeroway",
    MAX(CASE WHEN "B"."TAG_KEY" = 'agricultural' THEN "B"."TAG_VALUE" END) AS "agricultural"
FROM "PUBLIC"."OSM" "B"
    /* PUBLIC.INDEX_132: TAG_KEY IN('height', 'roof:height', 'building:levels', 'roof:levels', 'building', 'amenity', 'layer', 'aeroway', 'historic', 'leisure', 'monument', 'wall') */
    /* WHERE B.TAG_KEY IN('height', 'roof:height', 'building:levels', 'roof:levels', 'building', 'amenity', 'layer', 'aeroway', 'historic', 'leisure', 'monument', 'wall')
    */
INNER JOIN "PUBLIC"."WAYS_POLYGONS" "A"
    /* PUBLIC.INDEX_4F: THE_GEOM && PUBLIC.ST_SETSRID(GEOMETRY 'POLYGON ((471333.21004455804 5538649.866669843, 471333.21004455804 5560944.562704275, 485726.139629124 5560944.562704275, 485726.139629124 5538649.866669843, 471333.21004455804 5538649.866669843))', 32632) */
    ON 1=1
WHERE ("B"."TAG_KEY" IN('height', 'roof:height', 'building:levels', 'roof:levels', 'building', 'amenity', 'layer', 'aeroway', 'historic', 'leisure', 'monument', 'wall'))
    AND ("A"."ID_WAY" = "B"."ID_WAY")
    AND ("A"."THE_GEOM" && "PUBLIC"."ST_SETSRID"(GEOMETRY 'POLYGON ((471333.21004455804 5538649.866669843, 471333.21004455804 5560944.562704275, 485726.139629124 5560944.562704275, 485726.139629124 5538649.866669843, 471333.21004455804 5538649.866669843))', 32632))
    AND ("PUBLIC"."ST_ISEMPTY"("A"."THE_GEOM") = FALSE)
    AND "PUBLIC"."ST_INTERSECTS"("A"."THE_GEOM", "PUBLIC"."ST_SETSRID"(GEOMETRY 'POLYGON ((471333.21004455804 5538649.866669843, 471333.21004455804 5560944.562704275, 485726.139629124 5560944.562704275, 485726.139629124 5538649.866669843, 471333.21004455804 5538649.866669843))', 32632))
GROUP BY "A"."ID_WAY"

If the geometry filter is removed the explain plan returns

SELECT
    'w' || "A"."ID_WAY" AS "ID",
    CASE WHEN "PUBLIC"."ST_ISVALID"("A"."THE_GEOM") THEN "A"."THE_GEOM" ELSE "PUBLIC"."ST_MAKEVALID"("A"."THE_GEOM") END AS "THE_GEOM",
    MAX(CASE WHEN "B"."TAG_KEY" = 'aeroway' THEN "B"."TAG_VALUE" END) AS "aeroway",
    MAX(CASE WHEN "B"."TAG_KEY" = 'agricultural' THEN "B"."TAG_VALUE" END) AS "agricultural"
FROM "PUBLIC"."OSM" "B"
    /* PUBLIC.INDEX_132: TAG_KEY IN('height', 'roof:height', 'building:levels', 'roof:levels', 'building', 'amenity', 'layer', 'aeroway', 'historic', 'leisure', 'monument', 'wall') */
    /* WHERE B.TAG_KEY IN('height', 'roof:height', 'building:levels', 'roof:levels', 'building', 'amenity', 'layer', 'aeroway', 'historic', 'leisure', 'monument', 'wall')
    */
INNER JOIN "PUBLIC"."WAYS_POLYGONS" "A"
    /* PUBLIC.INDEX_4: ID_WAY = B.ID_WAY */
    ON 1=1
WHERE ("B"."TAG_KEY" IN('height', 'roof:height', 'building:levels', 'roof:levels', 'building', 'amenity', 'layer', 'aeroway', 'historic', 'leisure', 'monument', 'wall'))
    AND ("A"."ID_WAY" = "B"."ID_WAY")
    AND ("PUBLIC"."ST_ISEMPTY"("A"."THE_GEOM") = FALSE)
GROUP BY "A"."ID_WAY"

This second query runs in few seconds.

@katzyn Could you please take a look and give us your point of view.

ebocher commented 9 months ago

The complete test

 Statement stat = connection.createStatement();
        //Prepare data

        stat.execute("DROP TABLE IF EXISTS WAYS_POLYGONS, OSM;" +
                "CREATE TABLE WAYS_POLYGONS AS SELECT 1 as ID_WAY, 'POLYGON ((160 280, 240 280, 240 140, 160 140, 160 280))'::GEOMETRY as the_geom;" +
                "CREATE TABLE OSM AS SELECT 1 as ID_WAY, 'aeroway' AS TAG_KEY, 'aeroway' AS TAG_VALUE; " +
                "CREATE INDEX ON WAYS_POLYGONS(ID_WAY);" +
                "CREATE SPATIAL INDEX ON WAYS_POLYGONS(THE_GEOM);" +
                "CREATE INDEX ON OSM(ID_WAY);" +
                "CREATE INDEX ON OSM(TAG_VALUE);" +
                "CREATE INDEX ON OSM(TAG_KEY)");

        ResultSet res = stat.executeQuery("EXPLAIN SELECT 'w'||a.id_way AS id, a.the_geom , MAX(CASE WHEN b.tag_key = 'aeroway' THEN b.tag_value END) AS \"aeroway\",MAX(CASE WHEN b.tag_key = 'agricultural' THEN b.tag_value END) AS \"agricultural\"," +
                " FROM WAYS_POLYGONS AS a, OSM b WHERE a.id_way=b.id_way and st_isempty(a.the_geom)=false  AND b.TAG_KEY IN ('height','roof:height','building:levels','roof:levels','building','amenity','layer','aeroway','historic','leisure','monument','wall')  " +
                "and a.the_geom && st_setsrid('POLYGON ((198.5 186.5, 269.5 186.5, 269.5 115, 198.5 115, 198.5 186.5))'::geometry, 32632) " +
                "and st_intersects(a.the_geom,st_setsrid('POLYGON ((198.5 186.5, 269.5 186.5, 269.5 115, 198.5 115, 198.5 186.5))'::geometry, 32632))  GROUP BY  a.id_way;" +
                "  ");
        res.next();
        System.out.println("\n With spatial query \n"+ res.getString(1));

        res = stat.executeQuery("EXPLAIN SELECT 'w'||a.id_way AS id, a.the_geom , MAX(CASE WHEN b.tag_key = 'aeroway' THEN b.tag_value END) AS \"aeroway\",MAX(CASE WHEN b.tag_key = 'agricultural' THEN b.tag_value END) AS \"agricultural\"," +

                " FROM WAYS_POLYGONS AS a, OSM b WHERE a.id_way=b.id_way and st_isempty(a.the_geom)=false  AND b.TAG_KEY IN ('height','roof:height','building:levels','roof:levels','building','amenity','layer','aeroway','historic','leisure','monument','wall')  " +
                  " GROUP BY  a.id_way;");
        res.next();
        System.out.println("\n Without spatial query \n"+ res.getString(1));
j3r3m1 commented 9 months ago

@katzyn any idea about ? If not, @ebocher should we try to segment the query into severals ?

katzyn commented 9 months ago

It looks like a wrong estimation of a query execution cost.

Index hints can help, but please post a complete standalone (Java / JDBC / SQL only) test case in a new issue here:

https://github.com/h2database/h2database/issues H2GIS may not be used in a test case too.

st_setsrid('POLYGON ((198.5 186.5, 269.5 186.5, 269.5 115, 198.5 115, 198.5 186.5))'::geometry, 32632) can be replaced with GEOMETRY 'SRID=32632;POLYGON ((198.5 186.5, 269.5 186.5, 269.5 115, 198.5 115, 198.5 186.5))'.

st_intersects can probably be replaced with && operator.

ebocher commented 9 months ago

Done @katzyn Many thanks in advance

ebocher commented 9 months ago

I will propose a temporary workaround.