openstreetmap / iD

🆔 The easy-to-use OpenStreetMap editor in JavaScript.
https://www.openstreetmap.org/edit?editor=id
ISC License

Local country code lookup #6941

Closed: quincylvania closed this issue 5 years ago

quincylvania commented 5 years ago

iD is and will be relying more and more on location-aware behavior. See #6513, #6479, #6712, #6836, and #6713, for example.

Right now we call out to Nominatim every time we need the country code for a pair of coordinates. It'd be much more efficient and reliable to do this synchronously by querying a data file bundled with iD.

The trade-off here is that the size of the file would not be trivial. But since iD doesn't require high-precision results, we could generalize the data considerably.

See previous discussion on this topic in the OSMUS Slack.

1ec5 commented 5 years ago

In addition to size, we should also keep an eye on runtime performance, considering that a single changeset can straddle a national border or jump around to different parts of the world. For example, which-polygon is very efficient for point-in-polygon lookups, but its memory usage is very sensitive to the complexity of the country polygons, and the time it takes to do a lookup might be too.

The discussion in Slack points to Natural Earth as a possible source for the country geometries, but I don’t think we should use it as-is. For the features listed above, iD needs relatively high resolution along land borders but very low resolution along coastlines. For example, it’d be a good idea for the local lookup to unify Canada into a single polygon that includes all its islands. However, all of Detroit needs to be on the American side of the border and all of Mexicali on the Mexican side, with a tolerance of tens of meters perhaps, but not kilometers. A simple Douglas–Peucker simplification of the entire shapefile would result in the wrong address format and wrong language being preferred in neighborhoods on either side of the border.
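To make the tolerance concern concrete, here is a minimal sketch of Douglas–Peucker simplification on planar coordinates. It is an illustration only: the function names and the sample `border` coordinates are made up, and real tooling would work on geographic coordinates and whole boundary files rather than a single line.

```javascript
// Perpendicular distance from point p to the line through a and b (planar).
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / len;
}

// Douglas-Peucker: find the point farthest from the chord between the
// endpoints. If it is within the tolerance, drop everything in between;
// otherwise keep it and recurse on both halves.
function simplify(points, tolerance) {
  if (points.length < 3) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  if (maxDist <= tolerance) return [first, last];
  const left = simplify(points.slice(0, index + 1), tolerance);
  const right = simplify(points.slice(index), tolerance);
  // The split point ends `left` and starts `right`, so drop one copy.
  return left.slice(0, -1).concat(right);
}

// Hypothetical border line with one small jog and one large one.
const border = [[0, 0], [1, 0.02], [2, 0], [3, 0.5], [4, 0]];
const fine = simplify(border, 0.01); // small tolerance keeps every vertex
const coarse = simplify(border, 1);  // large tolerance collapses to a straight line
```

With the large tolerance, both jogs disappear and any neighborhood inside them ends up on the wrong side of the line. That is acceptable along a coastline but not along a land border, which is why one tolerance applied to the entire dataset does not fit this use case.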

Geofabrik’s data extract polygons are a good example of generalizing coastlines while retaining detail in land boundaries.

quincylvania commented 5 years ago

@1ec5 I totally agree. Thankfully file size and point-in-polygon performance correlate, so we can optimize for both. The raw Natural Earth dataset is much too detailed for this use case, even at 110m resolution. Coastline generalization should be a primary strategy, where islands like Iceland can be represented as simple rectangles or even triangles. For our purposes we don't need to know if a point is on land or not.

I was also thinking this would make for a good external module that other apps could also use.

bhousel commented 5 years ago

This is a great idea, and definitely something that's been on our radar for a while, and I'd use it in a bunch of projects.

The closest thing we have right now is in the osm-community-index, which includes a bunch of country-level polygons, but also a bunch of other smaller ones. You can browse the osm-community-index data here on this nice map that @mikelmaron made: https://mikelmaron.github.io/map-demos/osm-community-index/

The polygon data by itself comes out to 238k minified. We are already using which-polygon in iD to index this data and also the editor-layer-index polygons. This approach is very fast because it precalculates bounding boxes and stores them in an rbush, so it's only really doing the point-in-polygon tests for the polygons with bounds that actually intersect the point.
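The two-stage lookup described above (a cheap bounding-box rejection, then an exact point-in-polygon test) can be sketched roughly as follows. This is not which-polygon's actual code: a linear scan stands in for the rbush R-tree, only the outer ring of each polygon is considered, and the `buildIndex` and `query` names, the `code` property, and the two rectangular "countries" are assumptions made for illustration.

```javascript
// Ray casting test for a single ring of [lon, lat] pairs.
function pointInRing(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses = (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Precompute one bounding box per feature (outer ring only, for brevity).
function buildIndex(features) {
  return features.map((feature) => {
    const ring = feature.geometry.coordinates[0];
    const xs = ring.map((c) => c[0]);
    const ys = ring.map((c) => c[1]);
    return {
      feature,
      ring,
      bbox: [Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys)],
    };
  });
}

// Return the properties of the first feature containing the point, or null.
function query(index, point) {
  const [x, y] = point;
  for (const { feature, ring, bbox } of index) {
    // Cheap rejection: skip the exact test when the point is outside the bbox.
    if (x < bbox[0] || y < bbox[1] || x > bbox[2] || y > bbox[3]) continue;
    if (pointInRing(point, ring)) return feature.properties;
  }
  return null;
}

// Two hypothetical rectangular "countries" sharing a border at lon = 10.
const features = [
  { type: 'Feature', properties: { code: 'AA' },
    geometry: { type: 'Polygon', coordinates: [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]] } },
  { type: 'Feature', properties: { code: 'BB' },
    geometry: { type: 'Polygon', coordinates: [[[10, 0], [20, 0], [20, 10], [10, 10], [10, 0]]] } },
];
const index = buildIndex(features);
```

In this sketch, `query(index, [5, 5])` only runs the exact test against the first feature, because the second one is rejected by its bounding box. A spatial index such as an R-tree would also avoid visiting every bounding box, which matters once there are hundreds of polygons.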

There are obviously some seams and places where we could improve a bunch on this. Part of the issue is that each geojson has been added independently by different contributors. An editor like iD, but built specifically for generating a boundary mesh, would be nice because then we could snap points together.

A handful of countries make the index much larger because of their complex borders. This is not intuitive (yes, Russia and France both have about equally complicated borders, Canada and US are less than half as complex). I tend to simplify a lot in sparsely populated areas. Not all of these have been hand-edited, so there is a lot of room for improvement.

There is also a stats command so I can keep track of the polygon sizes:

[Screenshot 2019-10-16 09 45 34]
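As a rough idea of what tracking polygon sizes could look like, the sketch below ranks GeoJSON features by vertex count. It is not the stats command mentioned above: the `countVertices` and `rankByComplexity` names, the `id` property, and the sample features are all hypothetical.

```javascript
// Count every coordinate pair in a Polygon or MultiPolygon geometry.
function countVertices(geometry) {
  const polygons = geometry.type === 'MultiPolygon'
    ? geometry.coordinates
    : [geometry.coordinates];
  let total = 0;
  for (const rings of polygons) {
    for (const ring of rings) total += ring.length;
  }
  return total;
}

// Rank features from most to least complex by vertex count.
function rankByComplexity(features) {
  return features
    .map((f) => ({ id: f.properties.id, vertices: countVertices(f.geometry) }))
    .sort((a, b) => b.vertices - a.vertices);
}

// Hypothetical sample data: a triangle and a two-part multipolygon.
const sample = [
  { type: 'Feature', properties: { id: 'simple' },
    geometry: { type: 'Polygon', coordinates: [[[0, 0], [1, 0], [0, 1], [0, 0]]] } },
  { type: 'Feature', properties: { id: 'complex' },
    geometry: { type: 'MultiPolygon', coordinates: [
      [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
      [[[5, 5], [6, 5], [5, 6], [5, 5]]],
    ] } },
];
const ranking = rankByComplexity(sample);
```

Vertex count is only a proxy for file size and lookup cost, but it is enough to spot which polygons would benefit most from hand simplification.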

So, my approach to doing this right would be:

  1. make an iD fork that is specifically for editing GeoJSON.
  2. use that to edit and refine the country mesh.

I'm working slowly towards laying the foundation that would let us do 1.

don-vip commented 5 years ago

You can also reuse the JOSM boundaries file: https://josm.openstreetmap.de/export/HEAD/josm/trunk/data/boundaries.osm (1.8 MB in .osm format, 5.4 MB in GeoJSON format). It contains all countries, plus subdivisions for the US, Canada, India, and China: [image]

See https://josm.openstreetmap.de/log/josm/trunk/data/boundaries.osm for the list of issues fixed since I introduced it 3 years ago.

quincylvania commented 5 years ago

@don-vip Thanks so much for the link! That's a great help, I think we'll be able to use it as a starting point.

🏎💨

quincylvania commented 5 years ago

Update: I've been working on this for the past week or so. Check out the package repo: https://github.com/ideditor/country-coder