osmcode / osmium-tool

Command line tool for working with OpenStreetMap data based on the Osmium library.
https://osmcode.org/osmium-tool/
GNU General Public License v3.0
483 stars 104 forks source link

add-locations-to-ways should always keep all untagged nodes that are referenced by relations #239

Closed snuup closed 2 years ago

snuup commented 2 years ago

What version of osmium-tool are you using?

1.21.1

What operating system version are you using?

linux elementary-os

Tell us something about your system

64GB own machine

What did you do exactly?

osmium add-locations-toways planet.osm.pbf -o planet-low.pbf

What did you expect to happen?

Nodes without tags are kept if they are members of any relation, since otherwise those relations break. Actually I hoped that this would happen, the docs however clearly say that all tagless nodes are dropped.

What did happen instead?

Tagless nodes are dropped even when they are referenced by relations. This is in line with the documentation, so this is less a bug than a request to change the specification.

What did you do to try analyzing the problem?

Examined that there are many nodes referenced by relations, 14.414.937 nodes whereof 1.575.234 are tagless.

The relation with the lowest id having a tagless node is relation 2277 , the tagless node 17558792 . https://www.openstreetmap.org/node/17558792

The lowest node id is 152 appearing in relation 1737107 https://www.openstreetmap.org/node/152

I would wish that Nodes 152, 17558792 should, like all nodes referenced in relations, are always included regardless of the --keep-untagged-nodes switch.

Alternatively, tagless nodes could be added to relations similar to ways. That would equally work for me, but I prefer keeping the nodes in the nodes section. This is in line with another suggestion, to keep all tagless nodes that are neither referenced by ways nor by relations. While those nodes are presumably errors, for my use case I would like to keep them.

joto commented 2 years ago

Ah, somebody from the future already using version 1.21.1. :-)

I do see the problem, but I am not sure what the best solution is. (That's why I didn't implement any and why the command is called add-locations-to-ways and not add-locations-to-ways-and-relations :-).

Leaving untagged nodes referenced by relations in the output would mean I have to read the input file twice, first to figure out what nodes are referenced, and then do the actual work. That's always going to be a lot more expensive than what we are doing now. And most people will probably not need it, so if that's implemented it will only be an option, not something done by default.

Writing the node locations into the relation somehow would be better really and more in line with what this command is trying to do, but it is difficult for technical reasons: The underlaying libosmium doesn't support it and it would need more changes/extensions to all the OSM file formats. So that's something not done quickly and easily.

Concerning untagged nodes that are never referenced from anything in the file: Those nodes by definition don't represent anything and keeping them would be a very special case which only really makes sense for finding them to fix them. So that's not something I would consider adding to osmium-tool. There is a program in https://github.com/osmcode/osm-data-anomaly-detection which finds them, maybe that's useful to you.

As a workaround you can do something like this (untested):

  1. Use osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f1,9 | egrep '[M,]n' | cut -d' ' -f1 >relation.ids to get all relations from the input file, filter out those that have node members and write the ids into a list
  2. Use osmium getid -r INPUT.osm -i relation.ids -f pbf | osmium cat -t node -F pbf -o nodes.osm.pbf to find all nodes.
  3. Use osmium merge to combine the nodes.osm.pbf with the original result from add-locations-to-ways.

If you tell us some more about what you are actually trying to achieve in the end somebody might have a better idea how to solve this.

joto commented 2 years ago

You can do the workaround even a bit simpler:

osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f9 | egrep '[M,]n' | tr ',' '\n' | grep '^n' | cut -d@ -f1 >node.ids
osmium getid INPUT.osm -i node.ids -o nodes.osm.pbf
osmium add-locations-to-ways INPUT.osm -o with-locations.osm.pbf
osmium merge with-locations.osm.pbf nodes.osm.pbf -o output.osm.pbf
snuup commented 2 years ago

I use my own binary file format "flat-map" and finally adopted the locations on ways concept. This leaves me with those issues and I ponder how to keep flat-map a complete format, as it always was. Producing the file from a location-on-ways pbf proved easy and efficient so want to stick with osmium. Your workaround is nice, I like opl + unix and will use that.

I agree with you that node-locations attached to relations is the nicer design.

Thank you so much Danke

In the future (1.21.1) locations are on ways + rels ;)

snuup commented 2 years ago

unfortunately merge removes the lons and lats on the ways - i assume osmium does not read the lons / lats fields. for now i will leave those nodes just out.

joto commented 2 years ago

You have to set the output file format with -f pbf,locations_on_ways on the merge command, so that it will write out the locations on the ways again.

snuup commented 2 years ago

perfect, also had to set -f ... add_metadata=false since my input had that stripped.

real     4m48,502s
user    19m28,849s

nice ;)

joto commented 2 years ago

I have now implemented a --keep-member-nodes option to the add-locations-to-ways command. See 3f2a5815f6e5e836fa6fe6643c536ceab6f5ca82 . This adds some overhead, but it is not huge, something like 600 MB RAM on the planet and a few minutes processing time.

@snuup Do you want to test this?

joto commented 2 years ago

btw: the command mentioned above does not work correctly.

this will not find all nodes: osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f9 | egrep '[M,]n' | tr ',' '\n' | grep '^n' | cut -d@ -f1 >node.ids

You need this: osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f9 | egrep '[M,]n' | cut -c 2- | tr ',' '\n' | grep '^n' | cut -d@ -f1 >node.ids

snuup commented 2 years ago

I realized already that many relation-references are now in but some are still missing (eg node 152) but skipped over that for now. Thank you for continuing on this, I will be back soon and test this then.

snuup commented 2 years ago

big big thanks, this makes it much more easy and you've been very fast with your help.

test result using --keep-member-nodes:

all nodes referenced by relations are included (tagged or tagless) => perfect!

also examined if all remaining tagless nodes are relation-referencees. found 76 nodes that are tagless && not referenced by a relation. first such node is n247051. no issue but noticable.

i tend to switch back to keep node references in relations unresolved, because resolving them departs stronger from the osm datamodel than it does in the ways: the member type would become polymorph: if member.Type = Node then member would have a location as well. that is something to chat about like on an osm saturday however ;)

joto commented 2 years ago

Node 247051 ist in Relation 280300. Not sure on what data you are operating, @snuup, and what you did to check this. Maybe your check script didn't work correctly. Please open a new issue with details if you are sure there is a bug in osmium tool.

snuup commented 2 years ago

all fine with osmium, fixed a subtle bug in my code. thx.