osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0
1.5k stars 474 forks source link

[Flex Output] Adding customized column with create_only = true does not work if tables are created with flag cluster = 'auto' (not documented) #2258

Closed fefux closed 1 month ago

fefux commented 1 month ago

What version of osm2pgsql are you using?

osm2pgsql version 2.0.0 (2.0.0) Build: RelWithDebInfo Compiled using the following library versions: Libosmium 2.20.0 Proj 8.2.1 Lua 5.1.4 (LuaJIT 2.1.0-beta3)

What operating system and PostgreSQL/PostGIS version are you using?

Distributor ID: Ubuntu Description: Ubuntu 22.04.5 LTS Release: 22.04 Codename: jammy

Tell us something about your system

Bare metal hp server 250gb of ram 24 cpu

What did you do exactly?

I tried to use flex output to customize import table by adding column to enumerate rows. I tried with the file flex-config/compatible.lua I changed the function gen_columns and add this line :

local function gen_columns(text_columns, with_hstore, area, geometry_type)
    local columns = {}

    local add_column = function (name, type)
        columns[#columns + 1] = { column = name, type = type }
    end

   columns[#columns + 1] = { column = 'lineno', sql_type = 'serial', create_only = true }
...

I ran osm2pgsql with : osm2pgsql -c -d map -U gisuser -H localhost -S ../test/osm2pgsql/test.lua -O flex -v france.osm.pbf

output :

2024-09-30 07:45:38  osm2pgsql version 2.0.0 (2.0.0)
2024-09-30 07:45:38  [00] Database version: 14.13 (Ubuntu 14.13-0ubuntu0.22.04.1)
2024-09-30 07:45:38  [00] PostGIS version: 3.2
2024-09-30 07:45:38  [00] Found properties table 'osm2pgsql_properties': true.
2024-09-30 07:45:38  [00] Reading file: france.osm.pbf
2024-09-30 07:45:38  [00] Started pool with 4 threads.
2024-09-30 07:45:38  [00] ExpireOutputs:
2024-09-30 07:45:38  [00] Tables:
2024-09-30 07:45:38  [00] - Table "public"."pzd_osm_point"
2024-09-30 07:45:38  [00]   - columns:
2024-09-30 07:45:38  [00]     - "osm_id" id_num (int8) not_null=true create_only=false
2024-09-30 07:45:38  [00]     - "lineno" text (serial) not_null=false create_only=true <-----Here
2024-09-30 07:45:38  [00]     - "access" text (text) not_null=false create_only=false

What did you expect to happen?

As my customized column is listed in the log above :

What did happen instead?

The column is not created at all as the SQL create table command just drops it without any message

What did you do to try analyzing the problem?

I ran osm2pgsql with : osm2pgsql -c -d map -U gisuser -H localhost -S ../test/osm2pgsql/test.lua -O flex --log-sql france.osm.pbf output :

...
2024-09-30 07:48:00  SQL: (C1) DROP TABLE IF EXISTS "public"."pzd_osm_point" CASCADE
2024-09-30 07:48:00  SQL: (C1) DROP TABLE IF EXISTS "public"."pzd_osm_point_tmp" CASCADE
2024-09-30 07:48:00  SQL: (C1) CREATE UNLOGGED TABLE IF NOT EXISTS "public"."pzd_osm_point" ("osm_id" int8 NOT NULL,"access" text ,"addr:housename" text ,"addr:housenumber" text ,"addr:interpolation" text ,"admin_level" text ,"aerialway" text ,"aeroway" text ,"amenity" text ,"area" text ,"barrier" text ,"bicycle" text ,"brand" text ,"bridge" text ,"boundary" text ,"building" text ,"capital" text ,"construction" text ,"covered" text ,"culvert" text ,"cutting" text ,"denomination" text ,"disused" text ,"ele" text ,"embankment" text ,"foot" text ,"generator:source" text ,"harbour" text ,"highway" text ,"historic" text ,"horse" text ,"intermittent" text ,"junction" text ,"landuse" text ,"layer" text ,"leisure" text ,"lock" text ,"man_made" text ,"military" text ,"motorcar" text ,"name" text ,"natural" text ,"office" text ,"oneway" text ,"operator" text ,"place" text ,"population" text ,"power" text ,"power_source" text ,"public_transport" text ,"railway" text ,"ref" text ,"religion" text ,"route" text ,"service" text ,"shop" text ,"sport" text ,"surface" text ,"toll" text ,"tourism" text ,"tower:type" text ,"tunnel" text ,"water" text ,"waterway" text ,"wetland" text ,"width" text ,"wood" text ,"z_order" int4 ,"way" Geometry(POINT, 3857) NOT NULL) WITH (autovacuum_enabled = off)

I dig deeper and found that piece of code in file flex-table.cpp :

std::string
flex_table_t::build_sql_create_table(table_type ttype,
                                     std::string const &table_name) const
{
    assert(!m_columns.empty());

    std::string sql =
        fmt::format("CREATE {} TABLE IF NOT EXISTS {} (",
                    ttype == table_type::interim ? "UNLOGGED" : "", table_name);

    util::string_joiner_t joiner{','};
    for (auto const &column : m_columns) {
        // create_only columns are only created in permanent, not in the
        // interim tables
        if (ttype == table_type::permanent || !column.create_only()) {
            joiner.add(column.sql_create());
        }
    }
...

And I checked what defined a permanent table instead of an interim table. I found in the same file :

void table_connection_t::start(pg_conn_t const &db_connection,
                               bool append) const
{
    if (!append) {
        drop_table_if_exists(db_connection, table().schema(), table().name());
    }

    // These _tmp tables can be left behind if we run out of disk space.
    drop_table_if_exists(db_connection, table().schema(),
                         table().name() + "_tmp");

    if (!append) {
        db_connection.exec(table().build_sql_create_table(
            table().cluster_by_geom() ? flex_table_t::table_type::interim
                                      : flex_table_t::table_type::permanent,
            table().full_name()));

        enable_check_trigger(db_connection, table());
    }

    table().prepare(db_connection);
}

And then, in flex-table.hpp :

bool cluster_by_geom() const noexcept
    {
        return has_geom_column() && m_cluster_by_geom;
    }

And concluded by : If the table contains a geometry column and has the flag cluster to true, it's an interim table and create_only column will be dropped from the create table.

To fix the issue, in the lua file, you need to set :

tables.point = osm2pgsql.define_table{
    name = prefix .. '_point',
    ids = { type = 'node', id_column = 'osm_id' },
    columns = gen_columns(point_columns, hstore or hstore_all, false, 'point'),
    cluster = 'no' ----> This line
}

With the flag cluster to 'no' ('auto' by default)

I hope it will be clear enough ;)

joto commented 1 month ago

Works for me.

You might have looked at the temporary tables while the import is running. The temporary tables don't have the create_only columns in them. But after the import is done, the temporary tables will be sorted and copied into the final tables. And those have the create_only columns. This is the expected behaviour and needed for some situations (see #1522).

fefux commented 1 month ago

Oups, my bad... As I couldn't see my column in the table, I didn't wait the whole import (~1h)... I retry now