osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0
1.5k stars 474 forks source link

Support for TWKB output #1085

Open giggls opened 4 years ago

giggls commented 4 years ago

TWKB is a multi-purpose format for serializing vector geometry data into a byte buffer, with an emphasis on minimizing size of the buffer. The format is already supported by PostGIS.

See https://github.com/TWKB/Specification/blob/master/twkb.md for details.

It is somewhat similar to PBF format.

While this will impact the database import and query speed in some way it will help to reduce the size of production tables in a significant way.

Importing a small extract of the planet I was able to reduce the size of the standard production tables by 36.3 % using seven digits after the decimal point, which should be the original resolution of the imported OSM data.

Here are the results of calling pg_relation_size on them:

 twkb_osm_point | planet_osm_point                                                                                                          
----------------+-----------------
 77 MB          | 86 MB
(1 Zeile)

 twkb_osm_line | planet_osm_line 
---------------+----------------
 98 MB         | 168 MB
(1 Zeile)

 twkb_osm_polygon | planet_osm_polygon 
------------------+-------------------
 226 MB           | 377 MB
(1 Zeile)

 twkb_osm_roads | planet_osm_roads 
----------------+-----------------
 16 MB          | 24 MB
(1 Zeile)

All tables:

----------------+-----------------
 417 MB         | 655 MB

A TWKB output option might well make sense using the generic backend as well as the new flex backend.

joto commented 1 year ago

I have done a lot of refactoring on the code that creates the geometries for PostgreSQL/PostGIS in the last months. It should now be relatively straightforward to add in code for generating TWKB instead of the default EWKB. But it's not something high on anybody's priority list and it's unclear whether it is worth the extra complexity.

If somebody wants to tackle this: Look for the geom_to_ewkb() function and a a geom_to_twkb() which does basically the same thing only for TWKB. Then add some kind of flag so the user can choose which one is called from the Lua config. Please talk to us here first, though, before you spend a lot of time on it.

giggls commented 1 year ago

I have been using ST_AsTWKB function from PostGIS to generate the tables for my example. Maybe these can be re-used.

TWKB will only make sense for an epsg:4326 version of the database.

If I correctly understand the geom_to_ewkb function correctly, the geometry is passed in geom format which I assume would be floating-point, right?

I don't know if converting fixed-point raw osm to floating point and back to fixed-point (for twkb) will work well.

joto commented 1 year ago

Unfortunately you can not use ST_AsTWKB() with osm2pgsql, because we are using COPY to import the data which doesn't allow putting a function in there. You could do something with a trigger though. If PostGIS would simply allow "magic" conversion of the normal geometry type to twkb it would simply work. I don't see any reason why that wouldn't be possible but PostGIS doesn't allow it.

Yes, the geometry classes in osm2pgsql use double precision floating point values for the coordinates. That's what you have to work with.

rustprooflabs commented 1 year ago

If PostGIS would simply allow "magic" conversion of the normal geometry type to twkb it would simply work.

Would it be possible to define a generated column via the Lua style? If we could create generated columns it could be defined with ST_AsTWKB(). Of course that duplicates the geometry in the database but I already do that in a variety of scenarios because the benefit is often worth that storage overhead.

giggls commented 1 year ago

The sole reason for using TWKB is to reduce table size thus adding an additional column would invalidate using TWKB at all.