hrbrmstr / overpass

:information_source: Tools to Work With the OpenStreetMap (OSM) Overpass API in R
https://hrbrmstr.github.io/overpass/
Other
41 stars 6 forks source link

Merge `osmdatar` with `overpass` #13

Closed mpadge closed 6 years ago

mpadge commented 7 years ago

So thanks to the work of @Robinlovelace, we both know there is a great deal of compatibility between overpass and osmdatar. So much in fact that i'd suggest the two be merged. The good news as far as I can discern it is:

  1. overpass seems to be primarily focussed on enabling a very precise tool for constructing and passing overpass queries within R, while ...
  2. osmdatar has the crudest of all possible ways of constructing queries, and focusses instead on the fastest possible processing of OSM objects within an R call.

The two thus seem to be to be entirely complementary, and merging could and would only help both enormously (from your side as shown by the speed comparisons of @Robinlovelace). osmdatar effectively just Rcpp-ifies all of your internal dplyr-type manipulation of the overpass XML data. I did have an explicit timing comparison with your approach here. Although both our packages have developed since then, the general pattern remains.

The division of labour of a potential merge would seem pretty obvious: The interface end would remain almost unchanged from overpass, while the processing currently done in way_utils and node_utils could simply be replaced by the osmdatar code. (Also note that polygon processing in osmdatar is almost complete.)

If you're interested, we could at least start by discussing the direction of the merge ... for which my only concern from an osmdatar perspective would be that I'm attempting to merely use overpass to access data, with the package primarily serving those data in a usable format. (This will ultimately also include returning street networks in igraph form for routing queries.) Thus the name overpass doesn't really reflect the breadth of what I envision the package achieving. But that's a very minor concern, and other than that I look forward to your insights

Robinlovelace commented 7 years ago

I think this could be great - happy to help out however I can. What do you say Bob?

hrbrmstr commented 7 years ago

++gd. I can either PR into osmdatar (can we pick a new name before CRAN? ;-), y'all can add me to osmdatar or I can add y'all here or we can talk to Scott about starting a new repo in rOpenSci Labs GH

mpadge commented 7 years ago

Hey Bob and @Robinlovelace, osmdatar is pretty much done and ready to merge (with huge thanks to @virgesmith!). Maybe it'll be easiest to start a new repo - ROpenSci would be a great home, as both Robin and I already have packages there. We should agree on who is first going to attempt a merge ... It seems like it might be easier for me to PR into overpass, as osmdatar only has 3 main R functions, two of which can be pasted straight into way_utils and node_utils (and polygon_utils added). I'll get onto that next week, unless anyone wants to head off on that beforehand ...?

@Robinlovelace: Wanna repeat a timing comparison for prosperity prior to the merge? You could use this code

hrbrmstr commented 7 years ago

cool. whatever's easiest for folks is ++gd for me.

Robinlovelace commented 7 years ago

OK - finally managed to do the benchmarks (after realising it has nowt to do with download times, apologies):

method median time (ms)
boost 501
rapidXML 274

Great to see that the new version is not only faster to process OSM data, it's also faster to install. Full benchmark details:

bbox <- matrix (c (-0.10, 51.50, -0.08, 51.52), nrow=2, ncol=2)
bbox <- paste0 ('(', bbox [2,1], ',', bbox [1,1], ',',bbox [2,2], ',', bbox [1,2], ')')
key <- '[highway]'
query <- paste0 ('(node', key, bbox, ';way', key, bbox, ';rel', key, bbox, ';')
url_base <- 'http://overpass-api.de/api/interpreter?data='
query <- paste0 (url_base, query, ');(._;>;);out;')
dat <- httr::GET (query, timeout=60)
result <- httr::content (dat, "text", encoding='UTF-8')

# Pre rapidxml
system.time(
  devtools::install_github("osmdatar/osmdatar", ref = "891d23cf63868b5c603ef7c44aaeaef0cd2debb3")
)

library(osmdatar)
# benchmark:
microbenchmark::microbenchmark (dat <- osmdatar:::rcpp_get_lines (result) , times=100L)
detach("package:osmdatar")

# Post rapidxml
system.time(
  devtools::install_github("osmdatar/osmdatar")
)
library(osmdatar)
microbenchmark::microbenchmark (dat <- osmdatar:::rcpp_get_lines (result) , times=100L)
Robinlovelace commented 7 years ago

Although that median time seems pretty variable having run the benchmark a couple more times...

mpadge commented 7 years ago

Thanks @Robinlovelace - i'll get on to the merge next week. Latest commits have made it even faster!

Robinlovelace commented 7 years ago

Great stuff although looking at your C++ code now it looks to me like, to borrow @hrbrmstr's term, hieroglyphics!

Hope to learn C++ in due course if it can make things that fast. Have you seen this @mpadge: https://rcppcore.github.io/RcppParallel/

hrbrmstr commented 7 years ago

A term usually reserved for data.table ops! :-)

I can attest that @mpadge's C++ code is some of the best I've seen, especially in R package-land.

Definitely agree on the impressive speedup as well.

mpadge commented 7 years ago

Thanks guys! And @Robinlovelace - C++ is just another language, and if you know one as well as you do with R, learning another is easy. Just jump on in. You're going to have to at some stage with this package anyway, and between Andrew and myself, you know who to ask. (And the intricacies of the ever-mutating Rcpp impose greater challenges imho than simply learning C++ - compare the osmdatar Rcpp bits with Roger and Edzer's C code in sp - the world has changed a great deal since that time.)

I've done a few benchmark comparisons with RcppParallel, but it's quite tricky to implement and most importantly, is often no faster at all than simply breaking up separate Rcpp functions and parallelising with R parallel (thereby skipping the headaches of RcppParellel).

mpadge commented 7 years ago

I suggest that we start a new repo in osmdatar which merges both overpass and osmdatar so the full change logs of both are simultaneously merged. I'll make sure the initial merge repo contains all unique files so both osmdatar and overpass can then do subtree merges to reflect this repo if desired. The idea then will be to move this repo to ROpenSci, right?

And so the all-important question: What's it called? Bob gave osmdatar a minus, which is absolutely fine. My counter is a minus to overpass simply because I envision the package being useful/used by anybody simply wanting OSM data within R and to be as overpass-agnostic as possible (query construction notwithstanding). I also wanted a name that would be returned from searches like 'osm data r'. Suggestions Bob? @Robinlovelace?

hrbrmstr commented 7 years ago

overlord (that's not going to help googlers tho, but it's likely to be found by googlers if we ensure we have OSM and data in the DESCRIPTION)

Here's all the "open", "street", "osm" pkgs in CRAN to-date:

We shld also strive to get the combined pkg on the OSM wiki: http://wiki.openstreetmap.org/wiki/OSM_Scientific_Tools

Robinlovelace commented 7 years ago

Sounds like great progress towards a merge. More than happy for it to be hosted on ROpenSci also. Haha the age old question of what to call things strikes again! overlord sounds Tolkienesque and epic, both pluses.

But I'd still give that minuses compared with osmdatar and overpass as it tells the user nowt about what it's going to do from the name. Of those meaningful names I like osmdatar more than overpass, simply because many people have heard of OSM while relatively few have heard of overpass. Any other contenders? How's osmr, osmR or even osmdatar (oh that's what it's currently called)? Like @edzer (who developed sp and then sfr which was clearly too verbose a name so became sf) I like pithy names but happy with what it currently is.

edzer commented 7 years ago

I try to avoid "r" in R packages to point to R, in texts you'd use "the osmdatar R package", which has the obsolete r. In scripts, the R context is clear anyway. OSM is more than it's data, so "osm" or "rosm" or "osmr" are not very clear on what the package does with OSM (unless everything). If it only imports OSM data, I'd suggest "osmimport" or "importosm"; if it reads and writes data from and to OSM, I'd suggest "osmdata".

Robinlovelace commented 7 years ago

Thanks @edzer - osmdata sounds good to me. importosm is pretty clear - like that too. Suspect it will eventually have functions for subsetting and doing other things with the data so. My current preference is osmdata/r depending if people prefer it with or without the r. I can see that if every package had that at the end it would make R naming conventions, which are already anarchistic (@rasmusab, 2012), even ridiculousr!

Reference: Baath, R., 2012. The state of naming conventions in R. The R Journal 4, 74–75. https://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf

Robinlovelace commented 7 years ago

But @mpadge and @hrbrmstr should probably have 1st dibs as they developed most of the code-base : )

hrbrmstr commented 7 years ago

osmdata is definitely descriptive and accurate, so +1 :thumbsup:

And, it saves overlord for some R package infused with dark magic that I think needs to be built, now ;-)

On Sun, Oct 16, 2016 at 9:16 AM, Robin notifications@github.com wrote:

But @mpadge https://github.com/mpadge and @hrbrmstr https://github.com/hrbrmstr should probably have 1st dibs as they developed most of the code-base : )

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hrbrmstr/overpass/issues/13#issuecomment-254046261, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfHtk5LXlGMOOwmPV_VqXZTQLnB2ax2ks5q0iOpgaJpZM4KSZAt .

mpadge commented 7 years ago

Thanks @hrbrmstr and @Robinlovelace, I'll get straight on to creating a merge at osmdatar/overlord then shall I? ... Oh, okay, it's at osmdatar/osmdata, but only so that something infinitely more interesting can be made of overlord.

import names get a non-+ from me because non-specialist-type folk aren't first going to think in terms of import-ing, they're just going to want to know how to get osm data into R. I'll happily wager a case of beer that web searches for osm+data+R outnumber those for osm+import+R by at least an order of magnitude. Aside from which osm-import sounds like it imports OSM wholesale, which it doesn't at all, it merely imports osmdata.

I'll presume we're all okay with an MIT license, as this already applies to both overpass and rapidXML, and is the preferred form for ROpenSci. Just give word if not.

(And just for info, the admittedly mighty ugly R at the end was because I'm not convinced that there aren't algorithmic components of a particularly prominent search engine that, when confronted with non-words, parse down to identifiable lexical units and thus emerge from title alone to the R-context. Sites for my equally unattractively named osmplotr on both git and cran actually only deliver the R-context a few lines down, yet return it way up high in searches for osm+plot+r. But no fear, I'm already sold on osmdata)

Robinlovelace commented 7 years ago

Good stuff. Re licenses I have a mild preference for a more restrictive licences like the GPL-2 or GPL-3 that R itself is licensed under, even though I put stplanr out under the MIT license. Happy to explain why in more detail but no time now. What do other people think? List of options: https://www.r-project.org/Licenses/

hrbrmstr commented 7 years ago

I'll defer to whatev y'all decide, but I've seen enough for-profit startups and rogue consultants actually seriously profit from R open source projects without contributing back that I've been using AGPL lately to at least try to force SaaS folks to have a convo with me before blatantly disregarding the community. I doubt folks will abuse this pkg, tho (mebbe I'm wrong). I'd vote in it be in the GPL-ish family over MIT, tho.

On Tue, Oct 18, 2016 at 8:46 AM, Robin notifications@github.com wrote:

Good stuff. Re licenses I have a mild preference for a more restrictive licences like the GPL-2 or GPL-3, even though I put stplanr out under the MIT license. Happy to explain why in more detail but no time now. What do other people think? List of options: https://www.r-project.org/ Licenses/

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hrbrmstr/overpass/issues/13#issuecomment-254496066, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfHtrJaBCMjkzVTg0R6wFASGEnKqrdeks5q1L-VgaJpZM4KSZAt .

Robinlovelace commented 7 years ago

+1 what Bob says.

mpadge commented 7 years ago

GPL-3 it shall be then. The merge should be in much better shape very soon, but most importantly, it retains the entire changelogs of both overpass and osmdatar. (I was impressed to discover that this can be done!)

Robinlovelace commented 7 years ago

Awesome.

Robinlovelace commented 6 years ago

In summary: awesome to reflect back on these discussions 1.5 years later and it's amazing to see how things have come on. I'd like to express very belated gratitude for getting this thing rolling and the support for its outgrowth into the awesome osmdata now hosted on rOpenSci. This is one of my favourite threads on GitHub but think we can comprehensively say: mission accomplished so far!