Closed snuup closed 2 years ago
Ah, somebody from the future already using version 1.21.1
. :-)
I do see the problem, but I am not sure what the best solution is. (That's why I didn't implement any and why the command is called add-locations-to-ways
and not add-locations-to-ways-and-relations
:-).
Leaving untagged nodes referenced by relations in the output would mean I have to read the input file twice, first to figure out what nodes are referenced, and then do the actual work. That's always going to be a lot more expensive than what we are doing now. And most people will probably not need it, so if that's implemented it will only be an option, not something done by default.
Writing the node locations into the relation somehow would be better really and more in line with what this command is trying to do, but it is difficult for technical reasons: The underlaying libosmium doesn't support it and it would need more changes/extensions to all the OSM file formats. So that's something not done quickly and easily.
Concerning untagged nodes that are never referenced from anything in the file: Those nodes by definition don't represent anything and keeping them would be a very special case which only really makes sense for finding them to fix them. So that's not something I would consider adding to osmium-tool. There is a program in https://github.com/osmcode/osm-data-anomaly-detection which finds them, maybe that's useful to you.
As a workaround you can do something like this (untested):
osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f1,9 | egrep '[M,]n' | cut -d' ' -f1 >relation.ids
to get all relations from the input file, filter out those that have node members and write the ids into a listosmium getid -r INPUT.osm -i relation.ids -f pbf | osmium cat -t node -F pbf -o nodes.osm.pbf
to find all nodes.osmium merge
to combine the nodes.osm.pbf
with the original result from add-locations-to-ways
.If you tell us some more about what you are actually trying to achieve in the end somebody might have a better idea how to solve this.
You can do the workaround even a bit simpler:
osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f9 | egrep '[M,]n' | tr ',' '\n' | grep '^n' | cut -d@ -f1 >node.ids
osmium getid INPUT.osm -i node.ids -o nodes.osm.pbf
osmium add-locations-to-ways INPUT.osm -o with-locations.osm.pbf
osmium merge with-locations.osm.pbf nodes.osm.pbf -o output.osm.pbf
I use my own binary file format "flat-map" and finally adopted the locations on ways concept. This leaves me with those issues and I ponder how to keep flat-map a complete format, as it always was. Producing the file from a location-on-ways pbf proved easy and efficient so want to stick with osmium. Your workaround is nice, I like opl + unix and will use that.
I agree with you that node-locations attached to relations is the nicer design.
Thank you so much Danke
In the future (1.21.1) locations are on ways + rels ;)
unfortunately merge removes the lons and lats on the ways - i assume osmium does not read the lons / lats fields. for now i will leave those nodes just out.
You have to set the output file format with -f pbf,locations_on_ways
on the merge command, so that it will write out the locations on the ways again.
perfect, also had to set -f ... add_metadata=false since my input had that stripped.
real 4m48,502s
user 19m28,849s
nice ;)
I have now implemented a --keep-member-nodes
option to the add-locations-to-ways
command. See 3f2a5815f6e5e836fa6fe6643c536ceab6f5ca82 . This adds some overhead, but it is not huge, something like 600 MB RAM on the planet and a few minutes processing time.
@snuup Do you want to test this?
btw: the command mentioned above does not work correctly.
this will not find all nodes:
osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f9 | egrep '[M,]n' | tr ',' '\n' | grep '^n' | cut -d@ -f1 >node.ids
You need this:
osmium cat -t relation INPUT.osm -f opl | cut -d' ' -f9 | egrep '[M,]n' | cut -c 2- | tr ',' '\n' | grep '^n' | cut -d@ -f1 >node.ids
I realized already that many relation-references are now in but some are still missing (eg node 152) but skipped over that for now. Thank you for continuing on this, I will be back soon and test this then.
big big thanks, this makes it much more easy and you've been very fast with your help.
test result using --keep-member-nodes:
all nodes referenced by relations are included (tagged or tagless) => perfect!
also examined if all remaining tagless nodes are relation-referencees. found 76 nodes that are tagless && not referenced by a relation. first such node is n247051. no issue but noticable.
i tend to switch back to keep node references in relations unresolved, because resolving them departs stronger from the osm datamodel than it does in the ways: the member type would become polymorph: if member.Type = Node then member would have a location as well. that is something to chat about like on an osm saturday however ;)
Node 247051 ist in Relation 280300. Not sure on what data you are operating, @snuup, and what you did to check this. Maybe your check script didn't work correctly. Please open a new issue with details if you are sure there is a bug in osmium tool.
all fine with osmium, fixed a subtle bug in my code. thx.
What version of osmium-tool are you using?
1.21.1
What operating system version are you using?
linux elementary-os
Tell us something about your system
64GB own machine
What did you do exactly?
osmium add-locations-toways planet.osm.pbf -o planet-low.pbf
What did you expect to happen?
Nodes without tags are kept if they are members of any relation, since otherwise those relations break. Actually I hoped that this would happen, the docs however clearly say that all tagless nodes are dropped.
What did happen instead?
Tagless nodes are dropped even when they are referenced by relations. This is in line with the documentation, so this is less a bug than a request to change the specification.
What did you do to try analyzing the problem?
Examined that there are many nodes referenced by relations, 14.414.937 nodes whereof 1.575.234 are tagless.
The relation with the lowest id having a tagless node is relation 2277 , the tagless node 17558792 . https://www.openstreetmap.org/node/17558792
The lowest node id is 152 appearing in relation 1737107 https://www.openstreetmap.org/node/152
I would wish that Nodes 152, 17558792 should, like all nodes referenced in relations, are always included regardless of the --keep-untagged-nodes switch.
Alternatively, tagless nodes could be added to relations similar to ways. That would equally work for me, but I prefer keeping the nodes in the nodes section. This is in line with another suggestion, to keep all tagless nodes that are neither referenced by ways nor by relations. While those nodes are presumably errors, for my use case I would like to keep them.