zerebubuth / planet-dump-ng

Converts an OpenStreetMap database dump into planet files.
BSD 2-Clause "Simplified" License

Planet state file #6

Open zerebubuth opened 9 years ago

zerebubuth commented 9 years ago

It's a pain to figure out which replication state.txt file corresponds to a given planet. Now that the current and history dumps, in both XML and PBF, all correspond to the same state, it would be easier if the dump process or script figured out the state from which replication could continue.

The dump already tracks the last timestamp in the file, and this can be used to find a state file. But there might be in-progress transactions at that point, so it will be necessary to track backwards in the state files until before all those transactions start.
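A rough sketch of that lookup in POSIX sh, for illustration only. It assumes the directory layout used by the replication streams on planet.openstreetmap.org (sequence 4123456 lives at 004/123/456.state.txt) and the Osmosis-style state.txt format with escaped colons. The function names are made up for this example, and in-progress transactions are not detected: the caller is expected to pass a target timestamp that is already early enough to precede them.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from planet-dump-ng.

# Convert a sequence number into its state file path,
# e.g. 4123456 -> 004/123/456.state.txt
seq_to_path() {
  padded=$(printf '%09d' "$1")
  a=$(printf '%s' "$padded" | cut -c1-3)
  b=$(printf '%s' "$padded" | cut -c4-6)
  c=$(printf '%s' "$padded" | cut -c7-9)
  printf '%s/%s/%s.state.txt\n' "$a" "$b" "$c"
}

# Read a state.txt on stdin and print its timestamp with the
# backslash escapes removed (2021-03-01T12\:00\:02Z -> 2021-03-01T12:00:02Z).
state_timestamp() {
  sed -n 's/^timestamp=//p' | tr -d '\\'
}

# Reduce an ISO 8601 UTC timestamp to digits so it can be compared as a number.
ts_key() {
  printf '%s' "$1" | tr -cd '0-9'
}

# find_state BASE_URL START_SEQ TARGET_TIMESTAMP
# Walk backwards from START_SEQ until a state file is at or before the target,
# then print that sequence number.
find_state() {
  base=$1 n=$2 target=$(ts_key "$3")
  while [ "$n" -gt 0 ]; do
    ts=$(curl -fsSL "$base/$(seq_to_path "$n")" | state_timestamp)
    [ -n "$ts" ] || return 1          # fetch or parse failed
    if [ "$(ts_key "$ts")" -le "$target" ]; then
      echo "$n"
      return 0
    fi
    n=$((n - 1))
  done
  return 1
}
```

Usage would look something like `find_state https://planet.openstreetmap.org/replication/minute 4123456 2021-03-01T11:55:00Z`, where the starting sequence number is a placeholder. A linear walk is slow when the planet is old; a binary search over sequence numbers would be the obvious refinement.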

ssipos90 commented 3 years ago

Hi, sorry for hijacking your thread, but it's kind of related.

I can't for the life of me figure this out.

As background, to explain my approach: we're building a private OSM server and a tile server as part of a bigger app. Because we want a small DB and want new clients to be as up to date as possible with the main OSM server, we decided to import only each client's area, not the whole country. When a new client joins, we re-download the country.pbf, slice out their area and import it without impacting our other clients (I think). Clients will edit the map using iD, so I've set up a tile server that stays in sync. Replication using osmdbt is done, and I'm currently working on importing the changes using imposm or osm2pgsql. Here is the tricky bit.

I can't manage to generate the correct state.txt. Using osmium fileinfo on the generated PBF shows the latest change's timestamp, but no sequenceNumber.

To summarise, when a new client joins:

I'd appreciate some help, thanks :)

zerebubuth commented 3 years ago

The planet-dump-ng software only sets the current time in the PBF header, not the sequence number. This is because the planet dump is an independent process from the replication diffs and neither depends on the other. Also, there are minutely, hourly and daily replication streams and each has a different (independent) sequence number.
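To check which replication headers a given file actually carries, something like the following sketch may help. It assumes osmium-tool is installed and that the header fields use the usual Osmosis option names; the function name and the OSMIUM override variable are only for this example. What is printed for a missing option depends on the osmium version.

```shell
#!/bin/sh
# replication_info FILE
# Print the replication-related header options of an OSM file, one per line.
# OSMIUM can be set to point at a different osmium binary (hypothetical knob).
replication_info() {
  for opt in osmosis_replication_timestamp \
             osmosis_replication_sequence_number \
             osmosis_replication_base_url; do
    printf '%s=' "$opt"
    # If osmium fails for this option, still terminate the line.
    "${OSMIUM:-osmium}" fileinfo -g "header.option.$opt" "$1" || echo
  done
}
```

Running `replication_info latest.pbf` on a planet-dump-ng output should show a value for the timestamp and nothing useful for the sequence number, which matches the behaviour described above.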

There are tools to synchronise a planet dump with a chosen replication stream, for example pyosmium's up-to-date tool. This works by looking at the timestamp of the planet file, rewinding a bit and replaying the diffs covering that period.
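As a minimal sketch of that approach, assuming pyosmium is installed and the minutely stream on planet.openstreetmap.org is the one wanted. The wrapper function and the UPDATER and REPLICATION_URL variables are made up for this example; only the `-v`, `--server` and `-o` options belong to the tool itself.

```shell
#!/bin/sh
# update_planet INPUT OUTPUT
# Bring INPUT up to date from a replication stream and write the result to OUTPUT.
update_planet() {
  in=$1 out=$2
  server=${REPLICATION_URL:-https://planet.openstreetmap.org/replication/minute}
  # The tool derives the starting point from the timestamp stored in INPUT.
  "${UPDATER:-pyosmium-up-to-date}" -v --server "$server" -o "$out" "$in"
}
```

For example, `update_planet latest.pbf latest-updated.pbf`. Writing to a separate output file keeps the original dump intact if the update is interrupted.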

The general reason why these streams are all independent is that it previously wasn't easy to identify a linear point in time in Postgres, hence all the stuff in Osmosis' state file about txnActiveList and the xid column index in the database. More recently, Postgres made it easier to get access to the internals of the replication log, which made more robust tools like osmdbt possible and allows talking about a specific linear point in the log.

In summary: planet-dump-ng won't write the sequence number header in PBF files, so you'll have to use something else (e.g. pyosmium-up-to-date) to merge replication stream info into the planet file.

Hope that helps!

ssipos90 commented 3 years ago

It does, thanks for the explanation.

Edit: technically, I'd rather reset the sequence numbers every time

ssipos90 commented 3 years ago

Hi,

I wanted to contribute to your project, so I'm pasting our Docker files here in case they're useful to anyone. The Postgres version can probably be bumped to 12 without any hiccups; we're just not there yet and haven't tested it.

I removed line, might hiccup at permissions on the volume but I don't think so.

Dockerfile:

FROM debian:buster-slim

ARG PLANET_DUMP_URL=https://github.com/zerebubuth/planet-dump-ng/archive/v1.2.0.tar.gz

RUN set -eu; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
      build-essential \
      autoconf \
      automake \
      ca-certificates \
      curl \
      libboost-date-time-dev \
      libboost-dev \
      libboost-filesystem-dev \
      libboost-iostreams-dev \
      libboost-program-options-dev \
      libboost-thread-dev \
      libosmpbf-dev \
      libprotobuf-dev \
      libxml2-dev \
      osmpbf-bin \
      pkg-config \
      postgresql-client-11; \
    useradd -u 999 -r planetdump; \
    mkdir /opt/build; \
    curl -fsSL "$PLANET_DUMP_URL" | tar xz -C /opt/build --strip-components=1; \
    cd /opt/build; \
    ./autogen.sh; \
    ./configure; \
    make -j "$(nproc)"; \
    make install; \
    cd /; \
    rm -rf /opt/build; \
    mkdir /dumps; \
    chown planetdump:planetdump /dumps

COPY entrypoint /usr/local/bin/entrypoint
VOLUME /dumps
USER planetdump
WORKDIR /dumps
ENTRYPOINT ["/usr/local/bin/entrypoint"]
CMD ["bash"]

entrypoint (chmod +x)

#!/bin/sh
set -eu

PBF_FILE=${PBF_FILE:-latest.pbf}

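case "${1:-}" in
  dump)
    cd /dumps
    # Clear intermediate files left in /dumps by a previous run.
    rm -rf users changeset* node* way* relation*
    echo "dumping OSM db"
    DUMP_FILE=$(mktemp)
    # Remove the temporary dump on exit, including on failure.
    trap 'rm -f "$DUMP_FILE"' EXIT
    # Connection settings come from the standard PG* environment variables.
    pg_dump -F custom > "$DUMP_FILE"
    echo "creating PBF"
    planet-dump-ng -f "$DUMP_FILE" -p "$PBF_FILE"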
    rm -rf users changeset* node* way* relation*
  ;;
  *) exec "$@";;
esac

Edit: added missing file name, removed osmium.
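For completeness, a sketch of how the image might be run once built. The image tag, the volume name and the DOCKER, IMAGE and DUMPS_VOLUME variables are placeholders for this example. It relies on pg_dump picking up its connection settings from the PG* environment variables, which `docker run -e NAME` passes through from the calling shell.

```shell
#!/bin/sh
# Hypothetical image tag; adjust to whatever you passed to "docker build -t".
IMAGE=${IMAGE:-planet-dump-ng:local}

# run_dump [PBF_FILE]
# Run the entrypoint's "dump" command and write the PBF into the /dumps volume.
run_dump() {
  "${DOCKER:-docker}" run --rm \
    -e PGHOST -e PGPORT -e PGUSER -e PGPASSWORD -e PGDATABASE \
    -e PBF_FILE="${1:-latest.pbf}" \
    -v "${DUMPS_VOLUME:-osm-dumps}:/dumps" \
    "$IMAGE" dump
}
```

With the PG* variables exported, `run_dump client.pbf` leaves client.pbf in the volume.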