hrbrmstr / overpass

:information_source: Tools to Work With the OpenStreetMap (OSM) Overpass API in R
https://hrbrmstr.github.io/overpass/

overpass is slow compared with osmdatar #12

Open Robinlovelace opened 8 years ago

Robinlovelace commented 8 years ago

Reproducible example:

# Requirements:
# devtools::install_github ('osmdatar/osmdatar')
library(osmdatar)
# devtools::install_github("hrbrmstr/overpass")
library(overpass)

b = structure(c(-1.80036221446736, 53.6990006231171, -1.29035539895422,
                53.9458885889733), .Dim = c(2L, 2L), .Dimnames = list(c("x",
                                                                        "y"), c("min", "max")))
# download road network
system.time({ # 8s
  r = get_lines(bbox = b, key = "highway", value = "primary") # slow
})

system.time({ # nearly 70 s
  from_robin <- '[out:xml][timeout:100];
(
  node["highway"="primary"](53.6990006231171,-1.80036221446736,53.9458885889733,-1.29035539895422);
  way["highway"="primary"](53.6990006231171,-1.80036221446736,53.9458885889733,-1.29035539895422);
  relation["highway"="primary"](53.6990006231171,-1.80036221446736,53.9458885889733,-1.29035539895422);
);
out body;
>;
out skel qt;'

  frb <- overpass_query(from_robin)
})
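
The bounding box in the hand-written query above repeats the values of `b`, reordered into the (south, west, north, east) order that Overpass QL expects. A minimal sketch of deriving that string from `b` instead of pasting the numbers in by hand; `bbox_to_overpass()` and `build_query()` are hypothetical helper names for illustration, not functions from either package:

```r
b <- structure(c(-1.80036221446736, 53.6990006231171, -1.29035539895422,
                 53.9458885889733), .Dim = c(2L, 2L),
               .Dimnames = list(c("x", "y"), c("min", "max")))

# Hypothetical helper: reorder an sp-style bbox matrix into the
# (south, west, north, east) order used by Overpass QL.
bbox_to_overpass <- function(bb) {
  vals <- c(bb["y", "min"], bb["x", "min"], bb["y", "max"], bb["x", "max"])
  paste(sprintf("%.15g", vals), collapse = ",")
}

# Hypothetical helper: assemble the same node/way/relation query as above.
build_query <- function(bb, key, value, timeout = 100) {
  tag_filter <- sprintf('["%s"="%s"](%s);', key, value, bbox_to_overpass(bb))
  paste0(
    sprintf("[out:xml][timeout:%d];\n(\n", as.integer(timeout)),
    paste0("  ", c("node", "way", "relation"), tag_filter, collapse = "\n"),
    "\n);\nout body;\n>;\nout skel qt;"
  )
}

q <- build_query(b, "highway", "primary")
cat(q, "\n")
# frb <- overpass_query(q)  # needs the overpass package and network access
```

Building the query from `b` keeps both benchmarks on exactly the same area, so the timings stay comparable if the bounding box changes.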

cc @mpadge - how is it so much faster? Even more weirdly

> object.size(frb) / 1000000
3.908632 bytes
> object.size(r) / 1000000
4.145232 bytes
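
Note that the figures above are in megabytes: dividing an `object.size()` result by 1000000 rescales the number, but the print method still labels it "bytes". A small sketch of getting a labelled size instead, using a stand-in vector `x` (an assumption here, since `frb` and `r` need the downloads above):

```r
# Stand-in object; replace with frb or r after running the benchmark above.
x <- numeric(1e6)

# format() on an object_size value converts and labels the unit.
size_label <- format(object.size(x), units = "Mb")
print(size_label)
```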

hrbrmstr commented 8 years ago

the C++ code they're using probably has just a bit to do with it :-)

hrbrmstr commented 8 years ago

Is there something this pkg does that theirs doesn't? (I didn't examine it closely). i.e. is overpass redundant?

Robinlovelace commented 8 years ago

Maybe... hence the question. 'They' are very sound and want to do it properly. Any ideas when you'd like to push this to CRAN @mpadge? That will mean more people use that code... Thanks for the fast reply in any case!

Great work both - both code bases are really interesting so hoping there is some cross-pollination/mutual benefit to be had.

hrbrmstr commented 8 years ago

I'm so used to using neutral pronouns on StackOverflow (hence the "they" in these SO-ish text boxes)

hrbrmstr commented 8 years ago

I could PR the pipeline-y query building code (that I think is complete…need to dbl chk) into the faster pkg.

Robinlovelace commented 8 years ago

+1 I know people who insist on being referred to thus so I understand the safe terminology!

Robinlovelace commented 8 years ago

I think that would be good to put in a PR - building queries and demonstrating them is important. I'll 'boost' that (seems it needs the boost library to work haha) with a couple of use cases of the code you PR.

hrbrmstr commented 8 years ago

oh, that could be a show-stopper. If it's not a header only library and osmdatar is not using BH (it's not from what I see) then it'll be a bear to get working on Windows.

Robinlovelace commented 8 years ago

Potentially an issue. I had to

sudo apt-get install libboost-all-dev

for it to work...

Any ideas @mpadge?

hrbrmstr commented 8 years ago

oh gosh. that's a show-stopper on Windows

mpadge commented 8 years ago

I've not actually gone through your comparisons @Robinlovelace, but the reason osmdatar is fast is because of the combined use of rapidXML and explicit piece-wise construction of S4 sp classes in C++ (much faster than Rcpp-ing an sp one-liner). boost is used only for rapidXML because it's the easiest way I know to provide platform-independent access. After a long-ish break, I plan to get osmdatar in a much better state by end of next week, and will look into more explicit comparisons with @hrbrmstr's overpass, and also alternative ways to provide rapidXML headers without boost.

Robinlovelace commented 8 years ago

Fantastic, thanks for the fast response Mark. Here's something you may also be interested in: https://twitter.com/robinlovelace/status/778465576502996993

Basically sf by @edzer is 100+ times faster than readOGR for importing spatial data and may eventually replace sp's S4 class system, but not in the near term and you didn't hear that from me!

hrbrmstr commented 8 years ago

Hadley has some examples of using rapidxml vs libxml2. I think he uses it in readxl (IIRC)


Robinlovelace commented 8 years ago

Another benchmark. Using the latest version I get:

> b = structure(c(-1.80036221446736, 53.6990006231171, -1.29035539895422,
+                 53.9458885889733), .Dim = c(2L, 2L), .Dimnames = list(c("x",
+                                                                         "y"), c("min", "max")))
> # download road network
> system.time({ # 8s
+     r = get_lines(bbox = b, key = "highway", value = "primary") # slow
+ })
   user  system elapsed 
  3.220   0.048   4.350 

Several attempts suggest it's now down to an average of 4-5 s for that particular benchmark on my system.
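
For anyone reproducing this, a minimal sketch of averaging the elapsed time over several runs rather than eyeballing repeated `system.time()` calls. `time_runs()` is a hypothetical helper written for this comment, and the `Sys.sleep()` call is only a placeholder for the real download:

```r
# Hypothetical helper: run f() n times and summarise the elapsed seconds.
time_runs <- function(f, n = 5) {
  elapsed <- vapply(seq_len(n),
                    function(i) system.time(f())[["elapsed"]],
                    numeric(1))
  c(mean = mean(elapsed), min = min(elapsed), max = max(elapsed))
}

# Placeholder workload so the sketch runs offline.
res <- time_runs(function() Sys.sleep(0.01), n = 3)
print(res)

# With osmdatar installed and b defined as above:
# time_runs(function() get_lines(bbox = b, key = "highway", value = "primary"))
```

Network latency and load on the Overpass server will vary between runs, so the minimum is often a steadier figure than the mean when comparing package versions.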

After installing the old version, before the C++ refactoring by @virgesmith, it went back to an average of 6-7 s. This installs the version just before this PR https://github.com/osmdatar/osmdatar/pull/5

devtools::install_github("osmdatar/osmdatar", ref = "5b1a9fc77ae081f89065623a32d09c307c02b0e8")

In summary, I've found evidence that your work has speeded up this package @virgesmith, great work!

(Please reproduce and sorry if this isn't the right place to be talking about the performance of this package!)