Closed mpadge closed 6 years ago
I think this could be great - happy to help out however I can. What do you say Bob?
++gd. I can either PR into osmdatar
(can we pick a new name before CRAN? ;-), y'all can add me to osmdatar
or I can add y'all here or we can talk to Scott about starting a new repo in rOpenSci Labs GH
Hey Bob and @Robinlovelace, osmdatar
is pretty much done and ready to merge (with huge thanks to @virgesmith!). Maybe it'll be easiest to start a new repo - ROpenSci would be a great home, as both Robin and I already have packages there. We should agree on who is first going to attempt a merge ... It seems like it might be easier for me to PR into overpass
, as osmdatar
only has 3 main R functions, two of which can be pasted straight into way_utils
and node_utils
(and polygon_utils
added). I'll get onto that next week, unless anyone wants to head off on that beforehand ...?
@Robinlovelace: Wanna repeat a timing comparison for prosperity prior to the merge? You could use this code
cool. whatever's easiest for folks is ++gd for me.
OK - finally managed to do the benchmarks (after realising it has nowt to do with download times, apologies):
method | median time (ms) |
---|---|
boost | 501 |
rapidXML | 274 |
Great to see that the new version is not only faster to process OSM data, it's also faster to install. Full benchmark details:
bbox <- matrix (c (-0.10, 51.50, -0.08, 51.52), nrow=2, ncol=2)
bbox <- paste0 ('(', bbox [2,1], ',', bbox [1,1], ',',bbox [2,2], ',', bbox [1,2], ')')
key <- '[highway]'
query <- paste0 ('(node', key, bbox, ';way', key, bbox, ';rel', key, bbox, ';')
url_base <- 'http://overpass-api.de/api/interpreter?data='
query <- paste0 (url_base, query, ');(._;>;);out;')
dat <- httr::GET (query, timeout=60)
result <- httr::content (dat, "text", encoding='UTF-8')
# Pre rapidxml
system.time(
devtools::install_github("osmdatar/osmdatar", ref = "891d23cf63868b5c603ef7c44aaeaef0cd2debb3")
)
library(osmdatar)
# benchmark:
microbenchmark::microbenchmark (dat <- osmdatar:::rcpp_get_lines (result) , times=100L)
detach("package:osmdatar")
# Post rapidxml
system.time(
devtools::install_github("osmdatar/osmdatar")
)
library(osmdatar)
microbenchmark::microbenchmark (dat <- osmdatar:::rcpp_get_lines (result) , times=100L)
Although that median time seems pretty variable having run the benchmark a couple more times...
Thanks @Robinlovelace - i'll get on to the merge next week. Latest commits have made it even faster!
Great stuff although looking at your C++ code now it looks to me like, to borrow @hrbrmstr's term, hieroglyphics!
Hope to learn C++ in due course if it can make things that fast. Have you seen this @mpadge: https://rcppcore.github.io/RcppParallel/
A term usually reserved for data.table
ops! :-)
I can attest that @mpadge's C++ code is some of the best I've seen, especially in R package-land.
Definitely agree on the impressive speedup as well.
Thanks guys! And @Robinlovelace - C++ is just another language, and if you know one as well as you do with R
, learning another is easy. Just jump on in. You're going to have to at some stage with this package anyway, and between Andrew and myself, you know who to ask. (And the intricacies of the ever-mutating Rcpp
impose greater challenges imho than simply learning C++ - compare the osmdatar Rcpp
bits with Roger and Edzer's C
code in sp
- the world has changed a great deal since that time.)
I've done a few benchmark comparisons with RcppParallel
, but it's quite tricky to implement and most importantly, is often no faster at all than simply breaking up separate Rcpp
functions and parallelising with R parallel
(thereby skipping the headaches of RcppParellel
).
I suggest that we start a new repo in osmdatar
which merges both overpass
and osmdatar
so the full change logs of both are simultaneously merged. I'll make sure the initial merge repo contains all unique files so both osmdatar
and overpass
can then do subtree merges to reflect this repo if desired. The idea then will be to move this repo to ROpenSci, right?
And so the all-important question: What's it called? Bob gave osmdatar
a minus, which is absolutely fine. My counter is a minus to overpass
simply because I envision the package being useful/used by anybody simply wanting OSM data within R and to be as overpass
-agnostic as possible (query construction notwithstanding). I also wanted a name that would be returned from searches like 'osm data r'. Suggestions Bob? @Robinlovelace?
overlord
(that's not going to help googlers tho, but it's likely to be found by googlers if we ensure we have OSM
and data
in the DESCRIPTION
)
Here's all the "open", "street", "osm" pkgs in CRAN to-date:
CosmoPhotoz
OSMscale
cosmoFns
cosmosR
iosmooth
osmar
osmplotr
rosm
OpenStreetMap
OpenImageR
OpenMPController
OpenMx
OpenRepGrid
OpenStreetMap
R2OpenBUGS
ROpenFIGI
ROpenWeatherMap
openNLP
openNLPdata
openVA
openadds
openair
opencage
opencpu
openintro
openssl
opentraj
openxlsx
ropenaq
ropensecretsap
We shld also strive to get the combined pkg on the OSM wiki: http://wiki.openstreetmap.org/wiki/OSM_Scientific_Tools
Sounds like great progress towards a merge. More than happy for it to be hosted on ROpenSci also. Haha the age old question of what to call things strikes again! overlord sounds Tolkienesque and epic, both pluses.
But I'd still give that minuses compared with osmdatar and overpass as it tells the user nowt about what it's going to do from the name. Of those meaningful names I like osmdatar more than overpass, simply because many people have heard of OSM while relatively few have heard of overpass. Any other contenders? How's osmr, osmR or even osmdatar (oh that's what it's currently called)? Like @edzer (who developed sp and then sfr which was clearly too verbose a name so became sf) I like pithy names but happy with what it currently is.
I try to avoid "r" in R packages to point to R, in texts you'd use "the osmdatar R package", which has the obsolete r. In scripts, the R context is clear anyway. OSM is more than it's data, so "osm" or "rosm" or "osmr" are not very clear on what the package does with OSM (unless everything). If it only imports OSM data, I'd suggest "osmimport" or "importosm"; if it reads and writes data from and to OSM, I'd suggest "osmdata".
Thanks @edzer - osmdata sounds good to me. importosm is pretty clear - like that too. Suspect it will eventually have functions for subsetting and doing other things with the data so. My current preference is osmdata/r depending if people prefer it with or without the r. I can see that if every package had that at the end it would make R naming conventions, which are already anarchistic (@rasmusab, 2012), even ridiculousr!
Reference: Baath, R., 2012. The state of naming conventions in R. The R Journal 4, 74–75. https://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf
But @mpadge and @hrbrmstr should probably have 1st dibs as they developed most of the code-base : )
osmdata
is definitely descriptive and accurate, so +1 :thumbsup:
And, it saves overlord
for some R package infused with dark magic that I
think needs to be built, now ;-)
On Sun, Oct 16, 2016 at 9:16 AM, Robin notifications@github.com wrote:
But @mpadge https://github.com/mpadge and @hrbrmstr https://github.com/hrbrmstr should probably have 1st dibs as they developed most of the code-base : )
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hrbrmstr/overpass/issues/13#issuecomment-254046261, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfHtk5LXlGMOOwmPV_VqXZTQLnB2ax2ks5q0iOpgaJpZM4KSZAt .
Thanks @hrbrmstr and @Robinlovelace, I'll get straight on to creating a merge at osmdatar/overlord
then shall I? ... Oh, okay, it's at osmdatar/osmdata
, but only so that something infinitely more interesting can be made of overlord.
import
names get a non-+ from me because non-specialist-type folk aren't first going to think in terms of import
-ing, they're just going to want to know how to get osm
data
into R
. I'll happily wager a case of beer that web searches for osm+data+R
outnumber those for osm+import+R
by at least an order of magnitude. Aside from which osm-import
sounds like it imports OSM wholesale, which it doesn't at all, it merely imports osmdata
.
I'll presume we're all okay with an MIT license, as this already applies to both overpass
and rapidXML
, and is the preferred form for ROpenSci. Just give word if not.
(And just for info, the admittedly mighty ugly R
at the end was because I'm not convinced that there aren't algorithmic components of a particularly prominent search engine that, when confronted with non-words, parse down to identifiable lexical units and thus emerge from title alone to the R
-context. Sites for my equally unattractively named osmplotr
on both git and cran actually only deliver the R
-context a few lines down, yet return it way up high in searches for osm+plot+r
. But no fear, I'm already sold on osmdata
)
Good stuff. Re licenses I have a mild preference for a more restrictive licences like the GPL-2 or GPL-3 that R itself is licensed under, even though I put stplanr out under the MIT license. Happy to explain why in more detail but no time now. What do other people think? List of options: https://www.r-project.org/Licenses/
I'll defer to whatev y'all decide, but I've seen enough for-profit startups and rogue consultants actually seriously profit from R open source projects without contributing back that I've been using AGPL lately to at least try to force SaaS folks to have a convo with me before blatantly disregarding the community. I doubt folks will abuse this pkg, tho (mebbe I'm wrong). I'd vote in it be in the GPL-ish family over MIT, tho.
On Tue, Oct 18, 2016 at 8:46 AM, Robin notifications@github.com wrote:
Good stuff. Re licenses I have a mild preference for a more restrictive licences like the GPL-2 or GPL-3, even though I put stplanr out under the MIT license. Happy to explain why in more detail but no time now. What do other people think? List of options: https://www.r-project.org/ Licenses/
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hrbrmstr/overpass/issues/13#issuecomment-254496066, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfHtrJaBCMjkzVTg0R6wFASGEnKqrdeks5q1L-VgaJpZM4KSZAt .
+1 what Bob says.
GPL-3 it shall be then. The merge should be in much better shape very soon, but most importantly, it retains the entire changelogs of both overpass and osmdatar. (I was impressed to discover that this can be done!)
Awesome.
In summary: awesome to reflect back on these discussions 1.5 years later and it's amazing to see how things have come on. I'd like to express very belated gratitude for getting this thing rolling and the support for its outgrowth into the awesome osmdata now hosted on rOpenSci. This is one of my favourite threads on GitHub but think we can comprehensively say: mission accomplished so far!
So thanks to the work of @Robinlovelace, we both know there is a great deal of compatibility between
overpass
andosmdatar
. So much in fact that i'd suggest the two be merged. The good news as far as I can discern it is:overpass
seems to be primarily focussed on enabling a very precise tool for constructing and passingoverpass
queries withinR
, while ...osmdatar
has the crudest of all possible ways of constructing queries, and focusses instead on the fastest possible processing of OSM objects within anR
call.The two thus seem to be to be entirely complementary, and merging could and would only help both enormously (from your side as shown by the speed comparisons of @Robinlovelace).
osmdatar
effectively justRcpp
-ifies all of your internaldplyr
-type manipulation of theoverpass XML
data. I did have an explicit timing comparison with your approach here. Although both our packages have developed since then, the general pattern remains.The division of labour of a potential merge would seem pretty obvious: The interface end would remain almost unchanged from
overpass
, while the processing currently done inway_utils
andnode_utils
could simply be replaced by theosmdatar
code. (Also note that polygon processing inosmdatar
is almost complete.)If you're interested, we could at least start by discussing the direction of the merge ... for which my only concern from an
osmdatar
perspective would be that I'm attempting to merely useoverpass
to access data, with the package primarily serving those data in a usable format. (This will ultimately also include returning street networks inigraph
form for routing queries.) Thus the nameoverpass
doesn't really reflect the breadth of what I envision the package achieving. But that's a very minor concern, and other than that I look forward to your insights