osmcode / osmium-tool

Command line tool for working with OpenStreetMap data based on the Osmium library.
https://osmcode.org/osmium-tool/
GNU General Public License v3.0

Geojson export not exporting all features #186

Closed hyperknot closed 4 years ago

hyperknot commented 4 years ago

In the following reproducible scenario, osmium export is not including relation=2978650 in the GeoJSON export, even though it is present in the PBF/OSM file. (mainland Norway, admin_level=2)

osmtogeojson working on the same OSM file includes that relation.

wget https://download.geofabrik.de/europe/norway-latest.osm.pbf

osmium tags-filter norway-latest.osm.pbf r/boundary=administrative -o out/admin.pbf
osmium tags-filter out/admin.pbf r/admin_level=2 -o out/level2.osm

osmium cat out/level2.osm -f opl | grep 2978650 | wc -l   # output: 1
osmium export out/level2.osm -o out/osmium.geojson
cat out/osmium.geojson | grep 2978650 | wc -l   # output: 0

osmtogeojson out/level2.osm > out/osmtogeojson.geojson
cat out/osmtogeojson.geojson | grep 2978650 | wc -l   # output: 2

osmtogeojson was installed by npm -g install osmtogeojson

version is 1.11.1

joto commented 4 years ago

Ids are not included in the output of osmium export by default. So grepping for them doesn't work. You can use a config file to add ids.
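
A minimal sketch of such a config file, assuming the `attributes` section of the `osmium export` config format, where setting an attribute to `true` adds it to each feature under a default name such as `@id`. The file names `export-config.json` and `out/osmium-with-ids.geojson` are made up for this example.

```shell
# Write a small export config that turns on the type and id attributes.
# Keys that are left out are assumed to keep their defaults.
cat > export-config.json <<'EOF'
{
    "attributes": {
        "type": true,
        "id": true
    }
}
EOF

# Re-run the export with the config, but only if osmium and the input exist.
if command -v osmium >/dev/null 2>&1 && [ -f out/level2.osm ]; then
    osmium export out/level2.osm -c export-config.json -O -o out/osmium-with-ids.geojson
fi
```

With ids in the output, grepping the GeoJSON for 2978650 becomes a meaningful check of whether the relation was exported.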

I tested just the relation downloaded from OSM and it is exported with osmium export. It might not look like what you expect though, because subareas are not handled by Osmium.

I then checked the relation 2978650 in the Geofabrik extract of Norway and it is missing one of the member ways. Osmium will not build non-complete multipolygons, maybe osmtogeojson is more lenient there.

The missing way is far away from Norway, so I assume the Geofabrik extract process cuts this off.
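
One way to see this kind of gap directly is `osmium check-refs`, sketched below. The function name `check_complete` is hypothetical; `-r` extends the check from way nodes to relation members and `-i` prints the missing ids instead of only a summary.

```shell
# Report objects that are referenced by ways or relations but absent from the file.
# $1: OSM file to check (for example an extract or a tags-filter result)
check_complete() {
    osmium check-refs -r -i "$1"
}
```

Running `check_complete norway-latest.osm.pbf` should list the members that the extract cut off. On a tags-filter result it will likely also list member relations such as subareas, since those are not pulled in.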

hyperknot commented 4 years ago

I didn't know that ids are not included by default, so my reproducible case was too minimal. The real issue is that the GeoJSON file from osmium is missing almost half the features that osmtogeojson produces.

The missing polygon in my case is mainland Norway.

Osmium, 5 polygons: image

Osmtogeojson, 9 polygons: image

Same map view.

What's interesting is that in the osmtogeojson export, there are 4 relations (Norway, Sweden, Finland, Russia), which are not included in Osmium, and 5 ways, which are included (the 5 little islands).

joto commented 4 years ago

As I mentioned, once Osmium detects that the multipolygon relation isn't complete, it doesn't even try to build it. That explains why the mainland etc. isn't there. The same goes for the other countries, which are only partly in the data. The islands you are seeing are probably because they are in other relations that are complete, so they are built, but I haven't checked that.

Osmtogeojson has decided to do this differently, as is obvious from the image above: it builds what it can with the available data, which leads to the rendering artefacts you can see. Osmium is more conservative here, insisting on complete data. Both are valid approaches, and I can see advantages and drawbacks in both of them.

hyperknot commented 4 years ago

Thanks for the great explanation. For my use case, I'll have to use Osmtogeojson. Even though it's incomparably slower and uses far more memory, I have no other option.

BTW, I'm surprised about the use case for a GeoJSON export when so many polygons are not exported. I guess Norway isn't a unique case: it was literally the first file I started testing, and I ended up with half the polygons missing.

Probably it's quite common to have non-complete relations in OSM. How do you handle them with osmium only?

joto commented 4 years ago

@hyperknot Well, just make sure you have all the data that you need for your purpose. For instance you can start from a planet file and run osmium extract (that's what Geofabrik uses to create its extracts) yourself with the right boundary making sure you get all the data you need. Or use tags-filter on the whole planet, then osmium export, then filter the resulting GeoJSON. Or use Overpass to explicitly query for the data you need. You have to experiment with what works best in your circumstances.
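
The first option could be sketched like this, assuming a planet file and a boundary polygon of your own. The function name `extract_region` and its arguments are hypothetical. The `smart` strategy is used because it also pulls in members of relations that reach outside the boundary; which relation types it completes should be checked in the `osmium extract` manual, since administrative boundaries are often tagged `type=boundary`.

```shell
# Cut a region out of a planet file while keeping relations complete.
# $1: planet file, $2: boundary polygon (GeoJSON or .poly), $3: output file
extract_region() {
    osmium extract -p "$2" -s smart "$1" -o "$3"
}
```

For example, `extract_region planet-latest.osm.pbf norway.geojson out/norway.osm.pbf`, where `norway.geojson` stands in for whatever boundary file you have.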

hyperknot commented 4 years ago

@joto actually I'd like to ask your expert opinion on how to achieve what I'm trying to do.

Wambachers has shut down one of the most popular OSM websites, https://wambachers-osm.website/boundaries/. He has no plan to reopen it, nor can he be contacted on the forum or via OSM email. Forum thread here: https://forum.openstreetmap.org/viewtopic.php?id=68844

I am trying to replicate his great service, in a more minimal, GitHub release based version, where land-clipped GeoJSON extracts can be downloaded per country, per admin_level.

So far my best idea seems to be:

  1. Run tags-filter with r/boundary=administrative and then with r/admin_level=$l on a full planet extract.

  2. Convert each level to GeoJSON (currently using osmtogeojson)

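# Pre-filter the planet once so the per-level passes read a much smaller file.
osmium tags-filter planet-latest.osm.pbf r/boundary=administrative -o out/admin.pbf

# One OSM file and one newline-delimited GeoJSON file per admin level.
for l in 2 3 4 5 6 7 8
do
  osmium tags-filter out/admin.pbf "r/admin_level=${l}" -o "out/level${l}.osm"
  osmtogeojson -n --ndjson "out/level${l}.osm" > "out/level${l}.ndjson"
done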
  3. Download the latest land polygons ("Large polygons not split") from another website of yours: https://osmdata.openstreetmap.de/data/land-polygons.html

  4. Convert that one to GeoJSON as well and run mapshaper -clip on each planet extracted level.geojson with the land polygon.
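
The clipping step could look roughly like the sketch below. The function name `clip_to_land` and the file names in the usage note are hypothetical, and it assumes the per-level data is available as plain GeoJSON rather than newline-delimited GeoJSON.

```shell
# Clip one admin-level file against the land polygons.
# $1: admin-level GeoJSON, $2: land polygons, $3: clipped output file
clip_to_land() {
    mapshaper "$1" -clip "$2" -o "$3"
}
```

For example, `clip_to_land out/level2.geojson land-polygons.geojson out/level2-land.geojson`, repeated for each admin level.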

Can you think of a better way to do this?

joto commented 4 years ago

@hyperknot That sounds like a reasonable approach. Just expect some bumps along the road. The OSM data has many quirks. I'm not sure how subareas will play into this, for instance; Osmium doesn't know about them. So you might need one more level of indirection than osmium tags-filter will provide.

There are some other projects that have tried something like this (for instance https://github.com/pnorman/osmborder). If you hit any bumps, look at what they have been doing; it might give you some hints.