rustprooflabs / pgosm-flex

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
MIT License

Improve `osm_date` to report date from PBF metadata #388

Open rustprooflabs opened 3 months ago

rustprooflabs commented 3 months ago

Details

Current Status

Version 1.0.2

Version TBD

Description

The osm.pgosm_flex table has an osm_date column, which is intended to indicate when the data itself is from. By default, osm_date is set to "today's date" according to the computer running the import. If the --pgosm-date option is used, the date provided (e.g. 2024-05-18) is used instead. Neither option is perfect, because of time zones and the actual difference between when "I downloaded the file" and "when the data was pulled from OSM."

I'd like to change osm_date to use the timestamp from the PBF file's metadata instead.

Example in DB

I ran an import on 5/18/2024 local time. The data saved in the osm.pgosm_flex table is shown by the following query.

SELECT imported, osm_date, region, pgosm_flex_version
    FROM osm.pgosm_flex
;
imported                     |osm_date  |region                   |pgosm_flex_version|
-----------------------------+----------+-------------------------+------------------+
2024-05-18 08:25:03.747 -0600|2024-05-18|north-america/us-colorado|1.0.0-c946501     |

PBF metadata

The timestamp in the PBF's metadata is 2024-05-17T20:20:59Z, which is 2024-05-17T14:20:59 MDT in my local time. The current method reports a date one day later than when the data was actually sourced.
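To make the off-by-one concrete, the Python sketch below converts the PBF timestamp to a date in UTC and in local time, then compares it with the import date recorded above. MDT is modeled as a fixed UTC-6 offset here, which is an assumption for illustration; real code would more likely use a named zone such as America/Denver via zoneinfo.

```python
from datetime import date, datetime, timedelta, timezone

# Timestamp from the PBF header, as reported by osmium fileinfo
pbf_timestamp = "2024-05-17T20:20:59Z"

# datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
# so swap it for an explicit UTC offset to support older versions too.
ts_utc = datetime.fromisoformat(pbf_timestamp.replace("Z", "+00:00"))

# Assumption: MDT modeled as a fixed UTC-6 offset for this example only.
mdt = timezone(timedelta(hours=-6), "MDT")
ts_local = ts_utc.astimezone(mdt)

# osm_date recorded by the current "today's date" default for this import
import_date = date(2024, 5, 18)
lag = import_date - ts_local.date()

print(ts_utc.date())        # 2024-05-17
print(ts_local.isoformat()) # 2024-05-17T14:20:59-06:00
print(lag.days)             # 1
```

The data date is 2024-05-17 in both UTC and MDT, so the one-day gap comes from using the import day rather than the data's own timestamp.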

We should be able to run the following command, load the JSON it returns into Python as a dict, and extract the timestamp and/or osmosis_replication_timestamp keys.

osmium fileinfo district-of-columbia-2024-05-18.osm.pbf --json
{
    "file": {
        "name": "district-of-columbia-2024-05-18.osm.pbf",
        "format": "PBF",
        "compression": "none",
        "size": 19026604
    },
    "header": {
        "boxes": [
            [
                -77.1201,
                38.79134,
                -76.90906,
                38.99603
            ]
        ],
        "with_history": false,
        "option": {
            "generator": "osmium/1.14.0",
            "osmosis_replication_base_url": "http://download.geofabrik.de/north-america/us/district-of-columbia-updates",
            "osmosis_replication_sequence_number": "4066",
            "osmosis_replication_timestamp": "2024-05-17T20:20:59Z",
            "pbf_dense_nodes": "true",
            "pbf_optional_feature_0": "Sort.Type_then_ID",
            "sorting": "Type_then_ID",
            "timestamp": "2024-05-17T20:20:59Z"
        }
    }
}
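A minimal sketch of that approach, assuming osmium is on the PATH and returns the JSON layout shown above. The function names read_pbf_fileinfo and pbf_osm_date are hypothetical and not part of PgOSM Flex. Preferring osmosis_replication_timestamp over timestamp, reporting the date in UTC, and returning None when neither key is present (so the caller can fall back to --pgosm-date or today's date) are design assumptions.

```python
import json
import subprocess
from datetime import date, datetime
from typing import Optional


def read_pbf_fileinfo(pbf_path: str) -> dict:
    """Run `osmium fileinfo <pbf> --json` and return the parsed JSON (hypothetical helper)."""
    result = subprocess.run(
        ["osmium", "fileinfo", pbf_path, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if osmium exits non-zero
    )
    return json.loads(result.stdout)


def pbf_osm_date(fileinfo: dict) -> Optional[date]:
    """Return the data date (UTC) from osmium fileinfo JSON, or None if absent."""
    options = fileinfo.get("header", {}).get("option", {})
    # Assumption: prefer the replication timestamp, fall back to the generic one.
    raw = options.get("osmosis_replication_timestamp") or options.get("timestamp")
    if not raw:
        return None
    # Normalize the trailing "Z" so fromisoformat() works on Python < 3.11.
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return ts.date()
```

With the JSON above, pbf_osm_date(read_pbf_fileinfo("district-of-columbia-2024-05-18.osm.pbf")) would return date(2024, 5, 17), the day the data was sourced rather than the day the file was downloaded.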
jacopofar commented 3 months ago

Sounds good!