osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0

Ideas for improved geometry processing #1663

Open joto opened 2 years ago

joto commented 2 years ago

At the core of osm2pgsql's mission is the processing of geometries from OSM data into some useful format in a PostgreSQL/PostGIS database. Osm2pgsql is one step in a larger toolchain transforming the geometries into pixels on your screen showing a map (or helping to find places in the world, or whatever the final use of the data is).

Conceptually, processing in osm2pgsql has these steps:

  1. The OSM objects (nodes, ways, and relations) are assembled into geometries.
  2. The geometries are optionally transformed in some way to make them more useful or easier or quicker to access.
  3. The geometries are loaded into the database.

You might think that osm2pgsql is not doing much in the second step, but there are several (optional) operations which fall into that category:

Geoprocessing in the database

All other geometry processing is currently delegated to the database. After all, one of the major reasons we are using the PostgreSQL/PostGIS database system is its powerful geometry processing capabilities. Users of osm2pgsql use the database to calculate labelling points, simplify linestrings and polygons, merge multiple smaller objects into larger ones, and many more things.

But there are some costs involved with doing all those things in the database:

So it makes sense to add some more geometry processing capabilities to osm2pgsql. The PR #1636 is where we are testing some of these.

But adding these kinds of capabilities only gets you so far. The way osm2pgsql operates you can usually only operate on a single feature at a time. So we can calculate the centroid of a polygon or simplify the geometry of a single way. But whenever we want to operate on multiple features, we really need to go to the database. Again, that's why we have the database, because it can easily find a bunch of objects and do some geometry processing on them. So that kind of processing is not going to go away.
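As a minimal sketch of what "operating on a single feature at a time" means, the following plain Python computes the centroid of one polygon ring and simplifies one linestring with the Douglas-Peucker algorithm. The function names and the coordinate-tuple representation are assumptions made for illustration; this is not how osm2pgsql implements these operations internally.

```python
from math import hypot


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon.

    `ring` is a closed list of (x, y) tuples (first point == last point).
    """
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring has no area")
    return cx / (3 * area2), cy / (3 * area2)


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Clamp the projection parameter so we stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(line, tolerance):
    """Douglas-Peucker simplification of a single linestring."""
    if len(line) < 3:
        return list(line)
    index, dmax = 0, 0.0
    for i in range(1, len(line) - 1):
        d = _point_segment_distance(line[i], line[0], line[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        # Every intermediate point is close enough to the chord: drop them.
        return [line[0], line[-1]]
    left = simplify(line[: index + 1], tolerance)
    right = simplify(line[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split point
```

Both functions need nothing but the coordinates of the one feature they are given, which is why this kind of processing fits the way osm2pgsql works. Anything that needs neighbouring features does not.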

Working with updates

If we only do a one-off import of OSM data, we can easily run a SQL script afterwards that does any kind of processing we can imagine. Many people do that already. But if we want to be able to update the database, this becomes more tricky. We need mechanisms to track what changes need to be done and to trigger those changes. There are several options for how such tracking and triggering could be done:

So where do we go from here

I am sure there are more ideas.

mboeringa commented 2 years ago
  • It is possible, but not that easy to, say, join all linestrings of a longer street this way and keep track of all the constituent geometries and keep the result up-to-date this way. It would be great if we can make this kind of thing easier to do.

Hi @joto ,

Nice comprehensive overview of the problems!

To keep things manageable though, I would first focus the efforts on "per-object" geometry processing for now (node, way, relation), and leave the onus for handling random collections of objects not part of OpenStreetMap relations to the end user of the produced database.

E.g. I think the one major enhancement that would be welcomed by many, is enhanced processing of relations, like the ability to filter relation members based on their relation "role" and construct some geometry from the filtered members, merge relations members into larger structures etc.
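A minimal sketch of role-based member filtering, in plain Python. The relation is modelled as a dict whose members each have a type, a ref, and a role; that layout, the function names, and the `way_geometries` lookup table are assumptions for illustration, not an existing osm2pgsql API.

```python
def members_with_role(relation, role, member_type="w"):
    """Return the ids of all members of the given type that carry `role`."""
    return [
        m["ref"]
        for m in relation["members"]
        if m["type"] == member_type and m["role"] == role
    ]


def collect_lines(relation, role, way_geometries):
    """Collect the linestrings of all way members with the given role.

    `way_geometries` is a hypothetical mapping from way id to a list of
    (x, y) tuples. Members without a known geometry are skipped.
    """
    return [
        way_geometries[ref]
        for ref in members_with_role(relation, role)
        if ref in way_geometries
    ]


# Example: a waterway relation where only some members are the main stream.
relation = {
    "tags": {"type": "waterway"},
    "members": [
        {"type": "w", "ref": 1, "role": "main_stream"},
        {"type": "w", "ref": 2, "role": "side_stream"},
        {"type": "w", "ref": 3, "role": "main_stream"},
        {"type": "n", "ref": 9, "role": "spring"},
    ],
}
way_geometries = {1: [(0, 0), (1, 0)], 3: [(1, 0), (2, 0)]}
main_lines = collect_lines(relation, "main_stream", way_geometries)
```

The filtered linestrings could then be merged into one larger geometry; that merge step is left out here.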

Currently, relations are a kind of "second-rate" citizen in the realm of OpenStreetMap software. Except for multipolygons and stuff like boundaries, there isn't a whole lot of support in the ecosystem to handle them.

If osm2pgsql had targeted functions to do clever things with relations, like the filtering on role, OpenStreetMap relations could in fact become much more useful and "first-rate" citizens, as they deserve to be!

It is especially the handling of relations that is difficult if not impossible to do at the database level (not even taking into account that you currently can't access the full relation structure after import using osm2pgsql or most other import tools, let alone in the context of a continuously updated database).

mmd-osm commented 2 years ago

Just for clarification: is the tool support for country defaults @lonvia mentioned in https://lists.openstreetmap.org/pipermail/talk/2022-April/087426.html somehow in scope for improved geometry processing?

lonvia commented 2 years ago

It's in scope but we are still discussing to what extent.

Tool support for country defaults basically means an easy mechanism to determine for each OSM object which country it is in. With the recently merged get_bbox() function for relations, it's already possible to implement something like that in the flex lua scripts. You'd have to do the lookup from bbox to country through external libraries and then use the country to compute your country defaults.

What we are discussing is if osm2pgsql should offer some native function to determine the 'region' directly to make all this easier. The lookup from region to attribute defaults will likely remain in the responsibility of the script writer because it is fairly easy and efficient to do in Lua.
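The two lookups described above (bounding box to region, then region to attribute defaults) can be sketched in plain Python as follows. Everything in it is an assumption for illustration: the region codes, their rectangular extents, the default tags, and the choice to use the centre of the bounding box are invented, and real country boundaries are polygons rather than rectangles.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    code: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


# Hypothetical regions and defaults, invented for this example only.
REGIONS = [
    Region("AA", 0.0, 0.0, 10.0, 10.0),
    Region("BB", 10.0, 0.0, 20.0, 10.0),
]
DEFAULTS = {
    "AA": {"maxspeed:urban": "50"},
    "BB": {"maxspeed:urban": "30"},
}


def region_for_bbox(min_lon, min_lat, max_lon, max_lat) -> Optional[str]:
    """Return the code of the region containing the centre of the bbox."""
    lon = (min_lon + max_lon) / 2
    lat = (min_lat + max_lat) / 2
    for region in REGIONS:
        if (region.min_lon <= lon < region.max_lon
                and region.min_lat <= lat < region.max_lat):
            return region.code
    return None  # outside every known region


def tags_with_defaults(tags, region_code):
    """Fill in regional defaults; explicitly mapped tags always win."""
    merged = dict(DEFAULTS.get(region_code, {}))
    merged.update(tags)
    return merged
```

The second function shows why the region-to-defaults step is easy to leave to the script writer: it is a table lookup plus a merge. The first step is the hard one, since a proper implementation needs real boundary geometries.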

zdila commented 2 years ago

I am missing simplify for (multi)polygons, which is mentioned in the issue description. Also for GeometryCollections (it would simplify all (multi)linestrings there). I think it deserves a separate issue. May I create it? Also, in my case simplify was failing for multilinestrings, but AFAIK it should be supported.

tordans commented 2 years ago

Topic: Remove road stubs – I am processing OSM road data to create a road class network for evaluating bicycle infrastructure. For this, only those roads that are part of a network – as in "they connect to another road" – are interesting. This cleanup becomes more important, the more driveways and footways that lead to houses are mapped.

E.g.: https://www.openstreetmap.org/#map=18/52.38108/13.59252 shows a lot of those stubs.

The naive approach I am taking is removing all ways <15m of specific highway types. However, that will remove small segments in the middle of the road and leave larger segments that do not lead anywhere, such as a general highway=service with no road network connection.

I started looking into improving the checks with PostGIS: Take the start point, end point > create a buffer > check if roads are part of the buffer (except self) > Only keep those where this is true for both start and end. However, I did not get this query right, yet (code not yet open source).

So ideally, there is a way to specify which stubs I want to remove based on length and road class. Maybe even the class of road that needs to be connected to (consider (or not) a service road a stub if it connects to a footway).

Another thing I noticed is that this process might need to run multiple times: Once I remove the first stub, there is a new one that just as well fits the definition of a stub.
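A minimal sketch of such a repeated stub removal, in plain Python rather than PostGIS. It assumes the ways have already been split at junctions, so that two ways connect only where they share a start or end node id; the tuple layout, the function name, and the parameters are hypothetical. A way counts as a stub here if it is of a listed class, shorter than the limit, and has at least one dead end.

```python
from collections import Counter


def remove_stubs(ways, max_length, stub_classes):
    """Repeatedly drop short dead-end ways until none are left.

    `ways` maps way id -> (start_node, end_node, length_m, highway_class).
    Returns a new dict containing only the ways that were kept.
    """
    remaining = dict(ways)
    while True:
        # Count how many remaining way ends meet at each node.
        degree = Counter()
        for start, end, _length, _highway in remaining.values():
            degree[start] += 1
            degree[end] += 1
        stubs = [
            way_id
            for way_id, (start, end, length, highway) in remaining.items()
            if highway in stub_classes
            and length < max_length
            and (degree[start] == 1 or degree[end] == 1)  # a dead end
        ]
        if not stubs:
            return remaining
        for way_id in stubs:
            del remaining[way_id]


ways = {
    1: ("A", "B", 200.0, "residential"),
    2: ("B", "C", 10.0, "residential"),  # short, but in the middle of a road
    3: ("C", "D", 200.0, "residential"),
    4: ("B", "E", 8.0, "service"),       # becomes a stub once way 5 is gone
    5: ("E", "F", 6.0, "service"),       # dead-end stub
}
kept = remove_stubs(ways, max_length=15.0, stub_classes={"service", "residential"})
```

In this example way 5 is removed in the first pass and way 4 only in the second, while the short way 2 survives because both of its ends connect to other ways. Restricting which road classes count as a valid connection would need an extra parameter and is not covered here.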

The OSM area at https://www.openstreetmap.org/#map=15/52.3723/13.6252 is a good test ground since it has a lot of stubs and smaller road segments.

mboeringa commented 2 years ago

Topic: Remove road stubs – I am processing OSM road data to create a road class network for evaluating bicycle infrastructure. For this, only those roads that are part of a network – as in "they connect to another road" – are interesting. This cleanup becomes more important, the more driveways and footways that lead to houses are mapped.

@tordans ,

To be honest, I think this request is way beyond what osm2pgsql is currently capable of, and what osm2pgsql should become.

I think it is unrealistic - and probably even undesirable in the light of code complexity and maintainability of osm2pgsql - to have osm2pgsql become some sort of "full network topology" suite. There are other excellent open source (PostGIS based) solutions for that based on OpenStreetMap, and PostGIS itself has the "Topology" data type and options.

osm2pgsql should IMO stick to what it is currently doing excellently (and the new flex option enhances): process single OSM objects (nodes, ways, relations) / geometries as fast and efficiently as possible without requiring "network" type knowledge of the data.

The only exception to this, is the already partially enhanced capabilities to deal with "OSM relations" (whether multipolygon or any other type of valid OSM relation is irrelevant here, as they are essentially the same from a technical point of view), as osm2pgsql already has knowledge about relations in order to be able to process them, and relations are just a "single" object from the point of view of OSM.

So any geometries (nodes, ways) contained / referenced in such a relation can be, and already must be, processed as a whole, and specific relation processing is already partially possible in osm2pgsql v1.7.0. E.g., see my attempts to use the new capabilities of v1.7.0 to extract "main_stream" role relation members from OSM "waterway" relations:

https://github.com/openstreetmap/osm2pgsql/discussions/1752

mboeringa commented 2 years ago

@tordans ,

The other problem with this request might be that, as far as I understand the whole processing flow of osm2pgsql up to now, the required information to perform such operations simply isn't there at the stage of import itself:

So the type of operations you are suggesting can essentially only be performed after the import.

joto commented 2 years ago

@mboeringa You are right, this issue is more about geometry processing in osm2pgsql and what @tordans wants to do is much more ambitious and currently not possible in the osm2pgsql framework. But I have been thinking about similar issues for other use cases I want to support somewhere down the line, for instance generalization of roads for small zoom levels. They have in common that they need some kind of network analysis of the road network that goes beyond simple geometry processing. That's definitely something we can't currently do in osm2pgsql, because osm2pgsql only looks at one object at a time, but it is something that we might be able to do with osm2pgsql and the database together, if we can find the right model of approaching this. It is definitely in scope for the generalization project, so I do want to hear about those use cases, although it would have fit better in a new discussion thread.

mboeringa commented 2 years ago

@joto

Interesting to read the German project description! I would caution you though, to carefully consider what is achievable in the project's funded limited time frame.

Adding "per-object" generalization capability, like the already implemented 'object.simplify()' function, is a kind of no-brainer, but performing high quality generalization (like the one I showed on my SOTM 2022 poster for woodland features of OpenStreetMap), involving potentially thousands of OSM objects being merged into one "representative" object, is far more complicated, especially in the light of the project's aim to support updates as well. E.g. how to keep track of such numbers of IDs of source objects, be able to automatically update the features in a reliable and performant manner, and deal with a potential need to subdivide generalized features for performance reasons after generalization, seems a potentially daunting task... It is definitely an ambitious target.

My own "post-osm2pgsql-processing" routines are definitely not suitable for such "continuous update" scenarios. In my case, I optimized them for high performance full re-imports, not continuously updatable databases.

By the way, I don't know if you already know it, but you may find Tomas Straupis' work and blog posts for the Lithuanian OSM community and public OSM based Lithuanian topographic maps an interesting read in the light of your project: https://www.openstreetmap.org/user/Tomas%20Straupis/diary Also see the Github repository of his project: https://github.com/openmaplt/vector-map/tree/master/db

kennykb commented 2 years ago

I have been thinking about similar issues for other use cases I want to support somewhere down the line, for instance generalization of roads for small zoom levels. They have in common that they need some kind of network analysis of the road network that goes beyond simple geometry processing. That's definitely something we can't currently do in osm2pgsql, because osm2pgsql only looks at one object at a time, but it is something that we might be able to do with osm2pgsql and the database together, if we can find the right model of approaching this. It is definitely in scope for the generalization project, so I do want to hear about those use cases, although it would have fit better in a new discussion thread.

(Agreed that we probably need a new thread for this... but... sorry, I'm here now.)

One use case for me has been the placement of highway shields on route concurrencies. This involves quite a bit of postprocessing at query time (surprisingly, it's not all that slow!) to construct the topology of strings of ways with the same cluster of routes.

You can see the code that I use at https://github.com/kennykb/osm-shields. shieldtables.lua populates the auxiliary tables, and then the heavy lifting happens in the analyze_merkers SQL function in queryprocs.sql.in. The key to how it works is that it extracts members of the same set of routes (not just the same routes), and then groups them as much as possible with the PostGIS ST_LineMerge function. I use this code regularly for my own work; it's almost certainly NOT ready for any kind of production, but it works well enough for me at the moment.
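The "same set of routes" idea can be sketched in a few lines of plain Python. This is only an illustration of the grouping key, not the code from the repository above; the function name, the input mapping, and the route names are hypothetical, and the subsequent merging of each group's geometries (done with ST_LineMerge in the database) is not shown.

```python
from collections import defaultdict


def group_by_route_set(way_routes):
    """Group ways by the exact set of routes that run over them.

    `way_routes` maps way id -> iterable of route names. Ways that carry
    no route at all are ignored. Returns frozenset of routes -> way ids.
    """
    groups = defaultdict(list)
    for way_id, routes in way_routes.items():
        key = frozenset(routes)
        if key:
            groups[key].append(way_id)
    return {key: sorted(ids) for key, ids in groups.items()}


way_routes = {
    1: ["R1"],
    2: ["R1", "R2"],  # concurrency of R1 and R2
    3: ["R2", "R1"],  # same set, different order
    4: ["R1"],
    5: [],            # not part of any route
}
groups = group_by_route_set(way_routes)
```

Using a frozenset as the key makes the order of the routes irrelevant and keeps a way that carries R1 and R2 apart from a way that carries only R1, which is what allows one shield cluster per concurrency.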

mboeringa commented 1 year ago

Well, probably not directly relevant to the project and the new generalization options of osm2pgsql, but via the JTS GitHub repository, I stumbled on this project:

https://github.com/micycle1/PGS#morphology

If not for just the beautiful animations of an astounding array of geometric transformations, it might at least inspire some philosophical thinking about generalization options... ;-) Takes some time to load the page, but well worth the wait.