osmcode / osmium-tool

Command line tool for working with OpenStreetMap data based on the Osmium library.
https://osmcode.org/osmium-tool/
GNU General Public License v3.0

Duplicate IDs after merge-changes #262

Closed kochis closed 1 year ago

kochis commented 1 year ago

What version of osmium-tool are you using?

osmium version 1.15.0
libosmium version 2.19.0
Supported PBF compression types: none zlib lz4

What operating system version are you using?

macOS Ventura 13.1

Tell us something about your system

Apple M1 Max 64GB RAM

What did you do exactly?

Merged two Daylight OSM changesets into a single file.

osmium merge-changes -v -s ./v1.23/admin-v1.23.osc.bz2 ./v1.23/fb-ml-roads-v1.23.osc.bz2 -o ./build/daylight-sidecars-merged.osm.pbf

Then renumbered them, starting with the largest ID found in the planet dataset.

echo "Determine starting id for renumber..."
osmium fileinfo -e -t node -g data.maxid.nodes ./v1.23/planet-v1.23.osm.pbf > max-id-planet.txt
export MAX_ID=`cat max-id-planet.txt`
export START_ID=$((MAX_ID + 1))

echo "Renumber starting ID is $START_ID"
osmium renumber -v --start-id=$START_ID ./build/daylight-sidecars-merged.osm.pbf -o ./build/daylight-sidecars-renumbered.osm.pbf --overwrite

What did you expect to happen?

Expected the output file to contain the additional data included in the change files.

What did happen instead?

Got an error indicating duplicate Node IDs (and stopped processing the file).

Node ID twice in input. Maybe you are using a history or change file?
This command expects the input file to be ordered: First nodes in order of ID,
then ways in order of ID, then relations in order of ID.

What did you do to try analyzing the problem?

It's likely my misunderstanding of how merge-changes should work, but I assumed any items with duplicate IDs across the files would be merged into the latest occurrence. Is it possible for duplicate node IDs to still exist after merging?

I guess I'm also having trouble understanding the difference between sort, merge, and merge-changes, and how they should be used together for applying changesets.

I'm also sometimes noticing smaller filesizes after running the renumber command, which I assume is due to using smaller IDs, but just wanted to verify that's not a destructive command (it wouldn't drop nodes in any scenario?)

Apologies if this is more of a question than an issue, feel free to close and redirect to the correct place if needed. Thanks.

joto commented 1 year ago

Your input files are broken. Or maybe they are just not intended to be merged. Numerous objects with the same ID and version number but different data are in both files. You have to take this up with the Daylight people.

For instance these, first from the admin file, second from the roads file:

n11655434464 v1 dV c1000000000 t2023-02-04T00:00:00Z i9848585 urapidassist T x39.7447377 y64.5523915
n11655434464 v1 dV c1000000000 t2023-02-04T00:00:00Z i9848585 urapidassist T x-56.6986477 y-30.3889966

This is something that should never happen in normal use of OSM data, so Osmium doesn't detect that case but happily creates broken data from the broken input data.

I would expect renumbered files to be slightly smaller due to the more efficient packing of IDs. And no, this operation should never lose data (unless your data is broken to begin with, in which case all bets are off). You can check with osmium fileinfo -e whether you have the same number of objects in both files.
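A minimal sketch of that count comparison, assuming the `data.count.*` variables that `osmium fileinfo -e -g` exposes. The helper names `object_counts` and `same_counts` are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: print node, way and relation counts for one OSM file.
# Each call uses "osmium fileinfo -e", which reads the whole file.
object_counts() {
    for kind in nodes ways relations; do
        printf '%s=%s\n' "$kind" "$(osmium fileinfo -e -g "data.count.$kind" "$1")"
    done
}

# Hypothetical helper: succeed only if both files report identical counts.
same_counts() {
    [ "$(object_counts "$1")" = "$(object_counts "$2")" ]
}
```

With the file names from the original report, this could be run as `same_counts ./build/daylight-sidecars-merged.osm.pbf ./build/daylight-sidecars-renumbered.osm.pbf && echo "object counts match"`.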

kochis commented 1 year ago

Thanks, that's sort of what I was afraid of (I think fb-ml-roads being the main culprit here).

Out of curiosity, how did you find the overlapping nodes in the file?

joto commented 1 year ago

Out of curiosity, how did you find the overlapping nodes in the file?

osmium fileinfo -e shows minimum and maximum IDs. I could see there was an overlap, so I wrote a small pyosmium script that read out all objects in the overlapping range. I then used osmium getid to find those in the other file.
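The first step, spotting the overlap from the minimum and maximum IDs, could be sketched roughly as follows. This is not the script mentioned above: it assumes the `data.minid.nodes` and `data.maxid.nodes` variables of `osmium fileinfo -e -g`, and the helper names `node_id_range` and `overlap_range` are hypothetical.

```shell
# Hypothetical helper: print "min max" of the node IDs in one file.
node_id_range() {
    echo "$(osmium fileinfo -e -g data.minid.nodes "$1") $(osmium fileinfo -e -g data.maxid.nodes "$1")"
}

# Hypothetical helper: given two ranges (min1 max1 min2 max2), print the
# overlapping range as "lo hi", or return 1 if the ranges are disjoint.
overlap_range() {
    lo=$(( $1 > $3 ? $1 : $3 ))
    hi=$(( $2 < $4 ? $2 : $4 ))
    [ "$lo" -le "$hi" ] || return 1
    echo "$lo $hi"
}
```

For the two change files in this issue, a call might look like `overlap_range $(node_id_range ./v1.23/admin-v1.23.osc.bz2) $(node_id_range ./v1.23/fb-ml-roads-v1.23.osc.bz2)`. An overlapping range only shows that clashes are possible; the objects inside it still have to be compared individually.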

district10 commented 9 months ago

Out of curiosity, how did you find the overlapping nodes in the file?

I saw @joto's script: https://github.com/osmcode/osmium-tool/issues/197#issuecomment-694242042

osmium cat test_at1_sorted.pbf -f opl | cut -d' ' -f1 | uniq -c | grep -v ' 1 '
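The one-liner above counts how often each type-and-ID field occurs in a single sorted file and hides the ones that occur once. A variation for the case in this issue, where the same ID and version appear in two different files, might look like the sketch below. It is an assumption, not part of the linked comment, and the function name `dup_id_versions` is hypothetical.

```shell
# Hypothetical helper: read OPL lines on stdin and print every
# "type+id version" pair (OPL fields 1 and 2) that occurs more than once.
# Sorting first means the input does not need to be ordered by ID.
dup_id_versions() {
    cut -d' ' -f1,2 | sort | uniq -d
}
```

Assuming `osmium cat` is given both change files as input, usage could be `osmium cat ./v1.23/admin-v1.23.osc.bz2 ./v1.23/fb-ml-roads-v1.23.osc.bz2 -f opl | dup_id_versions`. For the two sample lines quoted earlier in this thread, the function would print `n11655434464 v1`.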