osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0
1.48k stars 473 forks source link

Handling of "Multi" geometries in flex output #1316

Closed joto closed 3 years ago

joto commented 3 years ago

The flex output is still experimental. This gives us the chance to re-think the non-multi vs. multi-geometry issues. I see two distinct issues here:

  1. Should multi-geometries be split into single geometries?
  2. How does the generation of geometries relate to the type of the target geometry column?

Lets look at (multi)polygons first.

ad 1)

ad 2) Closed ways and some multipolygon relations will result in Polygon geometries, other multipolygon relations in MultiPolygon geometries. If the target column has type "Geometry" we can store both Polygons and Multipolygons in it. If the target column has type "MultiPolygon" and we want to store a Polygon in it, we can either error out or "upgrade" all Polygons to MultiPolygons. If the target column has type "Polygon" and we want to store a "Multipolygon" we can either error out split it into several rows.

My current thinking here is:

ad 1) The default should be not to split. It is the "safe" choice, the one with the least surprises. Splitting is a performance tweak that has its drawbacks, so people should have to conciously choose it. This is also easier to document/explain. ad 2) Setting a column type is a concious choice by the user. So if they choose, "Geometry", "Polygon", or "MultiPolygon" type, this is what they should get. But splitting of MultiPolygons in rows (see 1) should not be done automatically if the column has type row, instead this should be the one option, people have to set.

See also:

Now to LineStrings: Here the situation is somewhat different. We are not splitting MultiLinestrings, we are splitting long lines. The flex output already doesn't do this any more unless the split_at parameter is used. This allows the user to set the maximum length of the segments and has the obvious default of "do not split". We could introduce a special value for split_at that says: Only split MultiLineStrings into their members, if we want that. But otherwise this looks okay to me.

For the column type issue (2) I think this should have the same logic as for (Multi)Polygons.

And now that I am thinking about all this: Maybe the behaviour for LineStrings and Polygons should be more similar? What if the splits of Multipolygons can do the equivalent of the LineString split and split a (Multi)Polygon along a grid? We don't have to introduce this now, but it would be the logical extension of what we have now.

@pnorman ? @lonvia ?

mboeringa commented 3 years ago

@joto

A lot of sense in what you write, and I agree with most of it.

Just one remark about the potential splitting of Polygons: it would be most logical to use ST_Subdivide for this. Do note that this is a far more expensive operation than simply splitting LineStrings. Also note that splitting polygons comes with a couple of caveats:

Probably all a bit out of scope here, but it may be good to keep this in mind anyway.

joto commented 3 years ago

This has been implemented and documented now. Closing here.