Closed rustprooflabs closed 1 year ago
To make this work for road
without calling it a breaking change it needs to support per-table customization. The road_line
table has an index on the major
column. The road_point
table does not have that column.
Per table customization (not per layer) is also needed for gist/spgist.
With the latest commit (https://github.com/rustprooflabs/pgosm-flex/commit/4b05a27407f156899a0c7ecada1a6b033abe3db2 not showing up in this PR yet?) this feature is essentially functional. The new docs page has the basic examples and details. A couple folders with custom index schemes are provided. The indexes/examples/noindexes/*
example creates tables with only the primary keys. The lotsofindexes
example indexes nearly every column.
Needs some cleanup, and removing of the original SP-GIST code. Plus improved documentation.
Rebased and force pushed after fixing #345.
This PR captures everything outlined by #286. The best explanation of the work here is outlined in the new custom indexes page in the docs. That link is the MD here in GH. It'll be live on https://pgosm-flex.com when merged into main
.
This PR does not preserve the exact indexes created in 0.9.0 and prior. As I started looking at recreating exactly what was there, I realized many of the indexes don't make much sense as defaults. This PR standardizes and simplifies the default indexes.
With this in place, I want to to look at timing differences and database size differences between default indexes (geom
and osm_type
), examples/noindexes
, and examples/lotsofindexes
.
I'm not planning on writing a migration script for dropping / changing indexes. The indexes are only defined during initial table creation. Current users of --replication
can evaluate their indexes and decide on a) leaving everything as is, b) manually drop unwanted indexes, or c) Start over with a fresh import. Now that it's possible (and hopefully easy!) to create custom indexes, it's a good time for users to evaluate their performance needs and potential indexes.
Rebased to main
D.C. indexes, 0.9.0. 41 seconds.
┌────────┬─────────────┬─────────────┬────────────┐
│ s_name │ table_count │ index_count │ index_size │
╞════════╪═════════════╪═════════════╪════════════╡
│ osm │ 42 │ 151 │ 35 MB │
└────────┴─────────────┴─────────────┴────────────┘
After, 40 seconds. 27 fewer indexes.
┌────────┬─────────────┬─────────────┬────────────┐
│ s_name │ table_count │ index_count │ index_size │
╞════════╪═════════════╪═════════════╪════════════╡
│ osm │ 42 │ 124 │ 34 MB │
└────────┴─────────────┴─────────────┴────────────┘
After convincing myself this wasn't a breaking change, it obviously is. This removes the --sp-gist
option.
Initial PoC to enable configuring which indexes users want. RE #286.
This illustrates how indexes can be defined via an
.ini
file. To test, checkout this branch and docker build a local image with the changes. The test is in the place tables, there is a--layerset=place
that exists now that will be deleted before merging.Notes on current path
Currently putting a lot of logic in
place.lua
. Need to work on generalizing this into a function in the helpers module. Ultimately this should result in less Lua code per layer with indexes defined in an INI and parsed by a common helper.spatial indexes
Always creates spatial index (gist or spgist). Currently the decision of gist/spgist is per layer, so setting for place sets for point/line/polygon. Not ideal, but a huge improvement over having that setting defined for all tables! As the logic becomes more reusable, setting gist/spgist per table should be possible. There's no reason why there should be a forced gist index other than that's the way it is right now.
primary keys
I don't think the primary keys can be created directly via osm2pgsql flex at this time. That means the PK index is still only controlled by post-processing SQL. Also,
osm.place_polygon_nested
is hard coded in post-processing SQL.Next steps
Put logic to build the
indexes
table into a function. This should include support for the most commonly used columns throughout PgOSM Flex layers. This is a rough starting point of what might be good candidates. Suggestions welcome!Many of the indexes indicated above will likely be partial indexes such as the
name
column index.Customization is expected to be enabled much like custom layersets. Map a volume to a path, and tell PgOSM Flex to look there.