joto closed this issue 2 weeks ago.
Since it is probably getting a bit confusing with the multiple issues/PRs we have on using the flex backend:
I am fine with taking the more conservative approach as outlined here, although many of the further-reaching changes envisioned in #4112 (in particular the route relations) are highly desirable and there is agreement about them (and these changes are, frankly, the main reason for us to move to the flex backend). The primary point of disagreement that remained in #4431 was the introduction of new transport tables.
The main consideration from my side on the whole matter is that we want to use a stable framework for our database import and avoid depending on features that are likely to change in future versions of osm2pgsql in ways that would require changes on our side.
Regarding indices: I don't have a firm position on this. But we should keep in mind that, while so far the indices are the only custom code we run on the database outside of osm2pgsql, this is not necessarily going to stay that way. In #4952 we are discussing introducing custom functions, and in that context we have also considered using generated columns. OSM-Carto derivatives use other database structures (functions, views, additional tables), and it is not realistic that all of this can be handled by osm2pgsql.
Strategically, I would like to see us stay close to the OSM data model in our database layout. This makes it much more straightforward for map designers to do styling work under the goals we have, and for derivative styles to use an identical/compatible database layout. That would mean decidedly not going the route @SomeoneElseOSM did of moving a lot of style-specific logic, like tag interpretation, into Lua code.
I was unable to build consensus on the change to flex, as there was opposition to using any part of a more modern layout than the historical planet_osm_point/line/polygon tables.
Reading the above, I don't see that any of that has changed.
@pnorman That's why I restarted this effort in the way I did. The move to flex is needed independently of any change in layout. What I am proposing here basically changes nothing for OSM Carto developers, it just allows Carto to keep up with the changing osm2pgsql. We have to remove the old code in osm2pgsql if we want to keep improving it, so all projects still using the legacy pgsql output have to move to flex.
As said, the only clearly unresolved issue with #4431 was the introduction of new 'transport' tables. IMO we had a good discussion on this back then, but no final conclusion.
Anyway, @joto suggests here separating the formal move to the flex backend from the database layout changes. While this is in a way a step back from #4431, it would have two advantages:
Personally, I would have preferred if we could have concluded the discussion in #4431 with a clear consensus on strategy and a new database layout reflecting that. But things are as they are, and going with the approach suggested by @joto does not prevent us from approaching this again afterwards.
This is addressed by #4978 - follow-up issues, like the matter of preprocessing the layer column, are organized in #5027.
I only noticed this now somehow. It doesn't really affect me that much, as I am running two osm2pgsql import instances in parallel: one using the "hstore-all, hstore-only" import with views on top from openstreetmap-carto-de, and one currently using openstreetmap-carto v5.8, so far only for OSM Carto and Oliver's "Baumkarte".
I'm in the process of upgrading this to v5.9 today, and will keep in mind that the next upgrade will require a fresh import.
Osm2pgsql has been moving away from the old "pgsql" output for years now. The new output can do everything the old code can do and much, much more. All new development happens there; the old code will not get any new features. The OSM Carto project is the last major user of the "pgsql" output.
We want to get rid of the "pgsql" output in osm2pgsql at some point, which allows us to simplify osm2pgsql internally. This will not happen tomorrow, we'll leave plenty of time for OSM Carto and other users to switch. But we have to get started on moving installations over to the flex output.
Advantages of the switch include:
Instead of the openstreetmap-carto.style and openstreetmap-carto.lua config files there is now a single config file, openstreetmap-carto-flex.lua. The command line for osm2pgsql will change to use the flex output and the new config file. Everything else should be pretty much the same. The database layout is 100% compatible. No changes to the styles or SQL queries are needed.

Updates are totally seamless: you can keep an existing database created with the pgsql output and keep updating it with the new flex-based configuration.
The two versions of the config files can be used side-by-side for a while if that's what OSM Carto maintainers want. The documentation can explain both options. Or we can switch over at some point.
Osm2pgsql version needed
You need at least version 1.8.0 of osm2pgsql, which is available in Debian Stable; Ubuntu 24.04 has version 1.11.0.
Command line
The command line used will change. Only the output type (-O flex) and the config file have to be set.

Old command line (from INSTALL.md):

New command line:
Changes in database layout
The database layout changes very little. The id columns (osm_id) and geometry columns (way) on all tables will get the NOT NULL flag when using the flex output. These have always been NOT NULL in practice anyway, so this isn't a problem.

Indexes
Currently several custom indexes have to be generated after import, see the indexes.yml and indexes.sql files.

The flex output can be configured to create those indexes. This means we can get rid of some more of the config files and the scripts/indexes.py script. If osm2pgsql is configured to create those indexes, it will do so after the import is finished, running several CREATE INDEX commands concurrently (how many depends on command line options).

Open issues:
- Index names differ, e.g. planet_osm_polygon_way_idx3 instead of planet_osm_polygon_way_area_z10. A change for osm2pgsql to allow setting the name is being worked on.
- The fillfactor on the "main" geometry index is not set any more. For some background see https://github.com/osm2pgsql-dev/osm2pgsql/issues/1780 .

Question: Do we want to keep the old way of generating indexes or let osm2pgsql handle them? We can also make this optional in some way, with a flag in the config file that triggers creation of the indexes.
Changes in database content
The content of the resulting tables looks the same as before. The only exception is that in some cases the rounding of the way_area column is different, so you'll get slightly different values. This should not affect its use in any major way.

Tags named z_order are handled slightly differently, but those tags are bogus anyway and this should not have any effect. (I removed all z_order tags from the planet a few days ago anyway.)

The old setup would allow objects with a layer tag and either no other tags or only ignored tags (such as fixme) to show up as database entries with all columns NULL or empty. This is no longer the case.

I have verified that the resulting database is the same by running both old and new configurations side by side on the full planet data, and have not seen any differences beyond those described above.
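To make the change for layer-only objects concrete, here is a minimal Python sketch of the check involved: ignored keys are dropped first, and an object whose only remaining tag is layer produces no row. The IGNORED_KEYS set is a hypothetical, incomplete subset and should_emit_row is a hypothetical helper; the actual logic lives in openstreetmap-carto-flex.lua and may be structured differently.

```python
# Hypothetical, incomplete subset of keys that the import discards.
IGNORED_KEYS = {"fixme", "FIXME", "note", "source"}


def should_emit_row(tags):
    """Return True if an object should produce a database row.

    Ignored keys are dropped first; if nothing but 'layer' is left,
    no row is written (the old setup wrote a row with all columns
    NULL or empty instead).
    """
    remaining = {k for k in tags if k not in IGNORED_KEYS}
    remaining.discard("layer")
    return bool(remaining)


if __name__ == "__main__":
    print(should_emit_row({"layer": "-1", "fixme": "check"}))   # False
    print(should_emit_row({"layer": "1", "highway": "footway"}))  # True
```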
Setting layer column
Most tags are used "as is" in their respective database columns. An exception is the layer tag, which goes into an integer column and gets some special treatment in the Lua code. The current code does the same as before, but it doesn't have to.

It would be a small change to use layer 0 instead of NULL when the layer is not set. This would allow the SQL queries to be simplified a little: we would no longer need COALESCE(layer,0), which is used in several places. We'd probably want to keep the SQL code as it is for now, so users are not forced to re-import.
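The reasoning about COALESCE can be sketched in Python: if unset layer values are stored as 0 instead of NULL, then applying COALESCE(layer,0) to today's stored value yields exactly the value that would be stored under the change, which is why the expression could be dropped from the queries. parse_layer and sql_coalesce are hypothetical helpers, not taken from openstreetmap-carto-flex.lua, and treating unparseable values like a missing tag is an assumption.

```python
import re

# Optional sign followed by ASCII digits, e.g. "1", "-2", "+3".
_INT_RE = re.compile(r"[+-]?[0-9]+")


def parse_layer(value, default=None):
    """Convert a raw layer tag value into an integer column value.

    default=None corresponds to the current behaviour (NULL when the tag
    is not set); default=0 corresponds to the proposed change.
    Assumption: values that are not plain integers are treated like a
    missing tag; the real Lua code may handle them differently.
    """
    if value is None:
        return default
    value = value.strip()
    if _INT_RE.fullmatch(value):
        return int(value)
    return default


def sql_coalesce(layer):
    """Python equivalent of the SQL expression COALESCE(layer, 0)."""
    return 0 if layer is None else layer


if __name__ == "__main__":
    for raw in [None, "0", "2", "-1", "ground"]:
        current = parse_layer(raw)              # what is stored today
        proposed = parse_layer(raw, default=0)  # what would be stored
        # COALESCE on today's value equals the proposed stored value,
        # so the COALESCE in the queries would become redundant.
        assert sql_coalesce(current) == proposed
        print(repr(raw), current, proposed)
```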
Themepark support
Themepark is a framework for writing osm2pgsql Lua configs. It allows mixing several configurations so that one database can support several different table layouts and use cases at the same time.
The OSM Carto configuration is written in a way that it can be used with or without the Lua framework. Using it without the framework is just as easy as with the pgsql output before, you just specify the Lua config file on the command line as described above.
If you want to use it with the framework the setup is slightly more involved, but you have the advantage that you can then have tables of different layout in the same database.
Performance
From my measurements, performance is about 20% to 25% better than before. I have measured this by importing various planet extracts without the --slim option and without creating all the extra indexes. Because index creation takes a lot of time, the numbers will not be as good with --slim and the indexes.

Open Question: Derived styles
Some styles are derived from OSM Carto, such as OSM Carto Germany. How are these affected? What can we do to make life easier for these kind of styles?
@giggls @hholzgra
History
The changes proposed here are based on the efforts started by @pnorman in #4112 (see also the PR #4431). Those efforts have stalled since. One reason, I believe, was that those efforts not only switched from the "pgsql" to the "flex" output, but also contained other changes. That's why this change goes to quite some lengths to keep everything as compatible as possible.
Thank you, Paul, for starting this effort so many years ago. I used your code as a starting point, but there are a lot of changes due to my more limited goal, changes in osm2pgsql since then, and some performance improvements.