osm-fr / osmose-backend

Part of osmose that runs the analysis, and send the results to the frontend.
GNU General Public License v3.0
92 stars 115 forks source link

Create common table for multipolygons #1818

Open Famlam opened 1 year ago

Famlam commented 1 year ago

As mentioned in https://github.com/osm-fr/osmose-backend/pull/1788#issuecomment-1510281766 it would be useful to have a table containing all multipolygons, similar to i.e. the tables 'highways' and 'buildings'. Parsing multipolygons isn't trivial:

Example of a complicated multipolygon for demonstration purpose only (not a real-life example)

The primary use case would be that every current analyser that checks things (most notably intersections) on polygons could also support multipolygons using a common table.

frodrigo commented 1 year ago

Note 1. We already have a validator. This should not be for detecting issue, but only for valid MP. Note 2. I think ST_MakePolygon done most of the work

Famlam commented 1 year ago

Regarding note 1: agree, I should've made this more clear Regarding note 2: correct for the simple cases, but unfortunately it doesn't seem to support multiple outers or composite (non-closed) linestrings forming a closed loop. (And I personally have no clue how to handle the latter)

frodrigo commented 1 year ago

Regarding note 2: correct for the simple cases, but unfortunately it doesn't seem to support multiple outers or composite (non-closed) linestrings forming a closed loop. (And I personally have no clue how to handle the latter)

Right. That a difference between OGC Polygon definition and OSM MP, it is an OGC MultiPolygon.

I guess we can group outer linestring by clustering with ST_ClusterIntersecting, or with recursive query.

But it is typical OSM issue. Maybe the problem is already solve else where. eg

frodrigo commented 1 year ago

I try an implementation.

The naive query is, but it work only with valid data in OSM

CREATE TEMP TABLE multipolygon AS
SELECT
    id,
    ST_BuildArea(linestrings) AS poly
FROM (
    SELECT
        relations.id,
        ST_LineMerge(ST_Collect(ways.linestring)) AS linestrings
    FROM
        relations
        JOIN relation_members ON
            relation_members.relation_id = relations.id AND
            relation_members.member_type = 'W' AND
            relation_members.member_role IN ('outer', 'inner')
        JOIN ways ON
            ways.id = relation_members.member_id
    WHERE
        relations.tags->'type' = 'multipolygon'
    GROUP BY
        relations.id
) AS t
WHERE
    ST_IsValid(linestrings)
;

I add validation before runnin the same idea.

CREATE TEMP TABLE multipolygon_parts AS
SELECT
    relations.id,
    relation_members.member_role AS role,
    (ST_Dump(ST_LineMerge(ST_Collect(ways.linestring)))).geom AS linestring
FROM
    relations
    JOIN relation_members ON
        relation_members.relation_id = relations.id AND
        relation_members.member_type = 'W' AND
        relation_members.member_role IN ('outer', 'inner')
    JOIN ways ON
        ways.id = relation_members.member_id
WHERE
    relations.tags->'type' = 'multipolygon'
GROUP BY
    relations.id,
    relation_members.member_role
;
CREATE INDEX multipolygon_parts_id ON multipolygon_parts(id);

DROP TABLE multipolygon_valid_parts;
CREATE TEMP TABLE multipolygon_valid_parts AS
SELECT
    mp1.id
FROM
    multipolygon_parts AS mp1
GROUP BY
    mp1.id
HAVING
    BOOL_AND(
        ST_NumPoints(mp1.linestring) > 3 AND
        ST_IsClosed(mp1.linestring) AND
        ST_IsValid(mp1.linestring) AND
        ST_IsSimple(mp1.linestring) AND
        ST_IsValid(ST_MakePolygon(mp1.linestring))
    )
;

DROP TABLE multipolygon_selfintersects;
CREATE TEMP TABLE multipolygon_selfintersects AS
SELECT
    mp1.id
FROM
    multipolygon_parts AS mp1
    JOIN multipolygon_parts AS mp2 ON
        mp1.id = mp2.id AND
        NOT mp1.linestring = mp2.linestring AND
        ST_Intersects(mp1.linestring, mp2.linestring)
;

CREATE TEMP TABLE multipolygons AS
SELECT
    multipolygon_parts.id,
    ST_BuildArea(ST_Collect(linestring)) AS poly
FROM
    multipolygon_parts
    JOIN multipolygon_valid_parts ON
        multipolygon_valid_parts.id = multipolygon_parts.id
    LEFT JOIN multipolygon_selfintersects ON
        multipolygon_selfintersects.id = multipolygon_parts.id
WHERE
    multipolygon_selfintersects.id IS NULL
GROUP BY
    multipolygon_parts.id
;

It is not so fast. But with that I am able to build 183 716 MP over 195 072 of France. Mote than 10 min.

I think as a side idea, analyse of MP could be improved with the cleaning process required here.

Famlam commented 1 year ago

It is not so fast. But with that I am able to build 183 716 MP over 195 072 of France. Mote than 10 min.

So there's over 11k invalid MPs in France (6% of all)? Or are some valid MPs still unsupported with this? 10 minutes for all of France doesn't seem so long to me though, given that France is normally split in many smaller regions in Osmose?

frodrigo commented 1 year ago

It is not so fast. But with that I am able to build 183 716 MP over 195 072 of France. Mote than 10 min.

So there's over 11k invalid MPs in France (6% of all)? Or are some valid MPs still unsupported with this? 10 minutes for all of France doesn't seem so long to me though, given that France is normally split in many smaller regions in Osmose?

No. It is not so simple. MP over boundaries will be missing too, as incomplete.

The MP analyser already report 3800 issues for France, without event checking at geometry. https://osmose.openstreetmap.fr/fr/issues/open?item=1170&country=france%2a

frodrigo commented 1 year ago

I made a more shorter and false safe version.

CREATE OR REPLACE FUNCTION build_multipolygons()
RETURNS VOID
AS $$
DECLARE
    mp RECORD;
BEGIN
    DROP TABLE IF EXISTS multipolygons;
    CREATE TEMP TABLE multipolygons (
        LIKE relations INCLUDING ALL,
        poly geometry(Geometry,4326) NOT NULL,
        is_valid boolean NOT NULL
    );

    FOR mp in (
        SELECT
            relations.*,
            ST_LineMerge(ST_Collect(ways.linestring)) AS linestrings
        FROM
            relations
            JOIN relation_members ON
                relation_members.relation_id = relations.id AND
                relation_members.member_type = 'W' AND
                relation_members.member_role IN ('outer', 'inner')
            JOIN ways ON
                ways.id = relation_members.member_id
        WHERE
            relations.tags->'type' = 'multipolygon'
        GROUP BY
            relations.id
    ) LOOP
        BEGIN
            IF ST_BuildArea(mp.linestrings) IS NOT NULL AND NOT ST_IsEmpty(ST_BuildArea(mp.linestrings)) THEN
                WITH
                unary AS (
                    SELECT
                        id, version, user_id, tstamp, changeset_id, tags,
                        (ST_Dump(poly)).geom AS poly
                    FROM (VALUES (
                        mp.id,
                        mp.version,
                        mp.user_id,
                        mp.tstamp,
                        mp.changeset_id,
                        mp.tags,
                        ST_BuildArea(mp.linestrings)
                    )) AS t(id, version, user_id, tstamp, changeset_id, tags, poly)
                ),
                simplified AS (
                    SELECT
                        id, version, user_id, tstamp, changeset_id, tags,
                        ST_BuildArea(ST_Collect(
                            ST_ExteriorRing(poly),
                            (SELECT ST_Union(ST_InteriorRingN(poly, n)) FROM generate_series(1, ST_NumInteriorRings(poly)) AS t(n))
                        )) AS poly
                    FROM
                        unary
                ),
                multi AS (
                    SELECT
                        id, version, user_id, tstamp, changeset_id, tags,
                        ST_CollectionHomogenize(ST_Collect(poly)) AS poly
                    FROM
                        simplified
                    GROUP BY
                        id, version, user_id, tstamp, changeset_id, tags
                )
                INSERT INTO
                    multipolygons
                SELECT
                    *,
                    ST_IsValid(poly) AS is_valid
                FROM multi;
            END IF;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'multipolygon fails: %', mp.id;
        END;
    END LOOP;
END;
$$ LANGUAGE 'plpgsql';
select build_multipolygons();
Famlam commented 1 year ago

After polygon_small, probably also the following other analysers can benefit

And maybe more. Just listing them now for later

frodrigo commented 1 year ago

After polygon_small, probably also the following other analysers can benefit And maybe more.

The common table buildings could also be improved.

Famlam commented 1 year ago

The common table buildings could also be improved.

Even common table highways, for the approximately 50k relations https://taginfo.openstreetmap.org/keys/highway?filter=relations#overview (Mostly pedestrian & footway)