ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata

Speed up trim_osmdata #178

Open mpadge opened 5 years ago

mpadge commented 5 years ago

I finally had cause to trim a huge data set: New York City within the official boundary polygon, which means around 700,000 vertices submitted to sp::point.in.polygon or sf::st_within. The latter especially does not scale well at all, and took something like half an hour. I should just bundle clipper, as I already have in moveability, and use that instead. That should make it entirely scalable.

FlxPo commented 2 years ago

I'm using osmium extract to operate directly on .osm files, based on a boundary stored in a GeoJSON file. Could it work for you? The performance is quite good.

I'm using it from R through system calls to the Windows Subsystem for Linux, so it might be tricky to integrate it within a standalone R package.
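For illustration, the kind of system call described above could be sketched as follows. This is a minimal sketch, not the setup actually used: the helper `osmium_extract_args` and all file names are hypothetical, and it assumes the `osmium` command-line tool is installed. Under the Windows Subsystem for Linux the command would presumably be `wsl`, with `"osmium"` prepended to the arguments.

```r
# Assemble the arguments for `osmium extract`.
# Hypothetical helper; all file names below are placeholders.
osmium_extract_args <- function(input, boundary, output) {
    c("extract",
      "--polygon", boundary,   # boundary polygon, e.g. a GeoJSON file
      "--output", output,      # where the trimmed extract is written
      "--overwrite",           # replace the output file if it already exists
      input)
}

args <- osmium_extract_args(
    input = "city.osm.pbf",
    boundary = "boundary.geojson",
    output = "city-trimmed.osm.pbf"
)

# Only run when osmium is on the PATH and the input file exists.
if (nzchar(Sys.which("osmium")) && file.exists("city.osm.pbf")) {
    status <- system2("osmium", args)
}
```

Paths containing spaces would need quoting (for example with `shQuote()`) before being passed to `system2()`.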

mpadge commented 2 years ago

Yeah, for that kind of operation, osmium is by far the best. Wrapping its source code as an R package is on my TODO list. I'll get to it one day ... until then, the command line suffices.

Mashin6 commented 2 years ago

Another option is to do the trimming on the server side (also means less downloaded data).

Possibility 1:

Full query:

[out:json][timeout:250];
area(id:3600175905)->.a;
node[natural=tree](area.a);
out body;



Possibility 2:

Full query:

rel(id:175905);
map_to_area->.a;
node[natural=tree](area.a);
out body;
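
The two queries refer to the same boundary: Overpass derives an area ID from a relation ID by adding 3600000000, so relation 175905 corresponds to area 3600175905. A minimal R sketch of building the first query from a relation ID might look like this. Both helper names are hypothetical and are not part of osmdata.

```r
# Overpass area IDs for relations are the relation ID plus 3600000000.
relation_to_area_id <- function(relation_id) {
    3600000000 + relation_id
}

# Build an Overpass QL query for nodes carrying one tag inside that area.
# Hypothetical helper, not an osmdata function.
build_area_query <- function(relation_id, key, value) {
    area_id <- relation_to_area_id(relation_id)
    paste(
        "[out:json][timeout:250];",
        # "%.0f" avoids scientific notation; the ID exceeds R's integer range
        sprintf("area(id:%.0f)->.a;", area_id),
        sprintf("node[%s=%s](area.a);", key, value),
        "out body;",
        sep = "\n"
    )
}

cat(build_area_query(175905, "natural", "tree"), "\n")
```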

mpadge commented 2 years ago

Yes indeed that would be useful @Mashin6, and better in all ways. One way to achieve it might be to introduce yet another trim function that gets piped before the main call, so we'd have a workflow like:

opq(...) |>
    add_osm_feature(...) |>
    overpass_trim(...) |>
    osmdata_<whatever>()

There'd still be a use case for both forms, because area polygons don't always exist, and the current trim_osmdata() function is intended (among other things) to enable data to be trimmed to entirely arbitrary polygons.
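
For comparison, the existing local form can be sketched with current osmdata functions. This is only an illustration: the wrapper `trim_locally` is hypothetical, it needs the osmdata and sf packages plus network access when called, and if `getbb()` returns more than one polygon for a place, one may need to be selected first.

```r
# Hypothetical wrapper around existing osmdata calls: download features for a
# place, then trim them locally to the place's boundary polygon.
trim_locally <- function(place, key, value) {
    # Boundary polygon from Nominatim
    bb_poly <- osmdata::getbb(place, format_out = "polygon")

    # Download everything within the rectangular bounding box of the place
    dat <- osmdata::opq(bbox = place) |>
        osmdata::add_osm_feature(key = key, value = value) |>
        osmdata::osmdata_sf()

    # Trim the downloaded data to the polygon on the client side
    osmdata::trim_osmdata(dat, bb_poly)
}

# Example call (not run here; place name is illustrative):
# trees <- trim_locally("colchester uk", key = "natural", value = "tree")
```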

If you'd be interested in contributing more directly, please feel free to start a pull request to develop this further. Note also that #252 will require some kind of initial function to determine or validate an OSM area for a given Nominatim query, just to check that the string corresponds to a single OSM relation ID. That would then also be used here.

Mashin6 commented 2 years ago

I agree. Having an option to trim locally by a custom polygon is a useful feature. I will start a new issue for the server-side trimming.