r-spatial / discuss

a discussion repository: raise issues, or contribute!

New package to normalize spatial data for web plotting. #15

Open bhaskarvk opened 7 years ago

bhaskarvk commented 7 years ago

This is a bit of future planning, but here is the main idea. Currently there is code in the leaflet package that extracts data from sp and sf objects and converts it into a data frame, which is then passed to the JavaScript side (by converting it to JSON). This code is fairly generic and does not really depend on anything leaflet-specific, so it makes a lot of sense to move it into a package of its own. That way we can build other web plotting R packages that wrap, say, d3.geo, mapboxGL or Cesium, and reuse a major chunk of the code that takes data from spatial objects and passes it to JavaScript.

I have discussed this with @jcheng5, and he agrees that this is a good idea. There are some questions I have for the r-spatial community.

a) Do you think this is a good idea? b) If so, does it make sense for this proposed package to live under r-spatial? c) If b) is 'yes', what sort of licensing and copyright arrangement do we need in place between RStudio and r-spatial?

cc @tim-salabim @edzer

tim-salabim commented 7 years ago

@bhaskarvk

a) I think this is a great idea. b) I don't see why this would not be suited to r-spatial. c) I am no expert on licensing and related matters, so I don't feel qualified to give a definite answer on this one. I would love to get some more input here.

Two comments/questions from my side (that I can think of right now):

bhaskarvk commented 7 years ago

@tim-salabim geojsonio is for reading/writing GeoJSON/TopoJSON from the file system. What I am proposing is a common package that takes any spatial R object (sp, sf, GeoJSON/TopoJSON as lists, character strings or R objects) and makes it available to an htmlwidget in a consistent manner. That way we can easily build new web GIS plotting packages that wrap mapboxGL / Cesium / OpenLayers etc.

It may well be that this package will rely on sp/sf/geojson/geojsonio packages to read the data but what differentiates it is the consistent manner in which it makes this spatial data available to the widget side.
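One way to picture the "consistent manner" is S3 dispatch: each supported input type gets a method, and every method returns the same long data frame (feature `id`, `x`, `y`) that widget packages can rely on. This is a minimal sketch under that assumption; `normalise_coords()` and its methods are hypothetical names, not an existing package API, and real sp/sf methods would be further methods of the same generic.

```r
# Hypothetical generic: every supported input is converted to the same
# long data frame with one row per vertex (feature id, x, y).
normalise_coords <- function(obj, ...) UseMethod("normalise_coords")

# A two-column coordinate matrix is treated as a single feature.
normalise_coords.matrix <- function(obj, ...) {
  data.frame(id = rep(1L, nrow(obj)), x = obj[, 1], y = obj[, 2])
}

# A data frame that already carries id/x/y columns (column names assumed).
normalise_coords.data.frame <- function(obj, id = "id", x = "x", y = "y", ...) {
  data.frame(id = as.integer(obj[[id]]),
             x = as.numeric(obj[[x]]),
             y = as.numeric(obj[[y]]))
}

# Fallback: fail loudly for types nobody has written a method for yet.
normalise_coords.default <- function(obj, ...) {
  stop("no normalise_coords() method for class ", class(obj)[1])
}

m <- matrix(c(0, 1, 2, 0, 1, 0), ncol = 2)
normalise_coords(m)
```

With this design, supporting a new input type means writing one more method, while the widget side only ever sees the id/x/y frame.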

So you could then have code that looks like this:


# `polygons` stands for any sp or sf polygon object
leaflet() %>%
  addPolygons(data = polygons, ...)

# OR

mapboxGL() %>%
  addPolygons(data = polygons, ...)

# OR

openLayers() %>%
  addPolygons(data = polygons, ...)

without having to duplicate the code that reads the spatial objects in each of the leaflet/mapboxGL/Cesium packages.

tim-salabim commented 7 years ago

Relevant: https://github.com/rstudio/leaflet/issues/452

edzer commented 7 years ago

As for the conversion part (sf -> data.frame, sp -> data.frame), I think it makes more sense to have this as part of the sf and sp APIs, i.e. inside those packages.

For instance, https://github.com/rstudio/leaflet/issues/452 would not happen if leaflet used st_coordinates() on the sfc object instead of calling do.call(rbind, sfc), which wrongly assumes that sfc is not an empty list.
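The failure mode is easy to reproduce in base R: `rbind()` called with no arguments returns `NULL`, so `do.call(rbind, x)` on an empty list silently yields `NULL` rather than an empty coordinate matrix, and later column indexing breaks. Below is a hedged sketch of a guard, using a plain list of coordinate matrices as a stand-in for an sfc; `bind_coords()` is a hypothetical helper, and with sf itself the route suggested above is `st_coordinates()`.

```r
# Stand-in for an empty geometry column: a list with no coordinate matrices.
empty_geoms <- list()

unguarded <- do.call(rbind, empty_geoms)  # rbind() with no arguments returns NULL
stopifnot(is.null(unguarded))             # so unguarded[, 1] would fail

# Hypothetical helper: bind a list of 2-column coordinate matrices, always
# returning a numeric matrix with columns X and Y (zero rows if empty).
bind_coords <- function(geoms) {
  if (length(geoms) == 0) {
    return(matrix(numeric(0), ncol = 2, dimnames = list(NULL, c("X", "Y"))))
  }
  out <- do.call(rbind, geoms)
  colnames(out) <- c("X", "Y")
  out
}

nrow(bind_coords(empty_geoms))  # 0, and bind_coords(empty_geoms)[, "X"] is numeric(0)
```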

bhaskarvk commented 7 years ago

Yes that's an acceptable solution as well. I just think it belongs outside of leaflet.

edzer commented 7 years ago

Fair enough. Which functions in leaflet does this concern?

mdsumner commented 7 years ago

I've been working on this in spbabel and in a superseded form of that in https://github.com/mdsumner/sc

The discussion and rationale there is my best overview of the landscape, but I've learnt quite a lot more since those were written.

The form must be relational, composed of multiple data frames - that's the only way to store all the types that are needed, and it's the only way to store topology at all (you can't do that with nesting, even if you nest indexes you still need a common pool table for those indexes to refer to).
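A minimal base-R sketch of the relational idea, assuming paths arrive as a non-empty list of x/y matrices: one table per entity (paths, unique vertices) plus a link table that maps each position along a path to a vertex id. Shared coordinates are stored once, which is what makes topology expressible. `decompose_paths()` and its column names are hypothetical and are not the spbabel, sc or silicate API.

```r
# Decompose a list of coordinate matrices into related tables:
#   path   - one row per path, with its number of coordinates
#   vertex - unique x/y pairs (shared points are stored only once)
#   link   - path_id + order + vertex_id, the index tying the tables together
# Assumes at least one path; vertices are matched on their printed
# (15 significant digit) text form, which is adequate for a sketch.
decompose_paths <- function(paths) {
  path <- data.frame(path_id = seq_along(paths),
                     n_coords = vapply(paths, nrow, integer(1)))
  all_xy <- do.call(rbind, paths)
  key <- paste(all_xy[, 1], all_xy[, 2])   # text key per coordinate
  ukey <- unique(key)
  first <- match(ukey, key)                # first row holding each unique vertex
  vertex <- data.frame(vertex_id = seq_along(ukey),
                       x = all_xy[first, 1],
                       y = all_xy[first, 2])
  link <- data.frame(path_id = rep(path$path_id, path$n_coords),
                     order = sequence(path$n_coords),
                     vertex_id = match(key, ukey))
  list(path = path, vertex = vertex, link = link)
}

a <- matrix(c(0, 1, 1, 0, 0, 1), ncol = 2)  # (0,0) (1,0) (1,1)
b <- matrix(c(1, 2, 1, 1), ncol = 2)        # (1,1) (2,1), shares (1,1) with a
tbl <- decompose_paths(list(a, b))
```

Because paths refer to vertices by id rather than by position, the vertex table can be reordered or subset as long as `link$vertex_id` is remapped accordingly.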

We desperately need a common, agreed form for these data in R, and I think specific packages should all provide decompositions of their specific types to a generic form. I've learnt enough in those and related projects to progress my own work, but I'm very happy to pursue this in a more general form that transcends any and all specific implementations currently in use. The OSM work by rOpenSci has similar challenges, and osmdata in particular is an important use case.

@mpadge this is part of the general problem we've been talking about :)

Finally, I'm absolutely delighted to hear this is seen as important and I'm extremely happy to help in any way I can. This is essential for the R community to move forward on, and I look forward to seeing how my explorations will fit into this, thanks @bhaskarvk !

bhaskarvk commented 7 years ago

@edzer All the code in leaflet's R/normalize*.R files is what I had in mind.

SymbolixAU commented 7 years ago

I might be going off on a tangent from the theme of this discussion, but what are r-spatial's and the community's thoughts on using encoded polylines to represent geometries?

Whenever I plot a map in googleway I always encode my spatial objects first, as it reduces the size of the object being plotted (and encoded polylines are natively supported by the Google Maps API).
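For readers unfamiliar with the format, Google's encoded polyline algorithm rounds each coordinate to 5 decimal places, stores each point as the difference from the previous one, folds the sign into the lowest bit, and writes the value in 5-bit chunks as printable ASCII characters. The base-R sketch below implements that published algorithm; it is not the googleway or googlePolylines code, and `encode_delta()` / `encode_polyline()` are hypothetical names.

```r
# Encode one signed integer delta as polyline characters.
encode_delta <- function(v) {
  # Sign folding: v >= 0 -> 2v, v < 0 -> -2v - 1 (same as ~(v << 1)).
  v <- if (v < 0) -2 * v - 1 else 2 * v
  chars <- integer(0)
  while (v >= 32) {
    chars <- c(chars, (v %% 32) + 32 + 63)  # low 5 bits plus continuation flag
    v <- v %/% 32
  }
  chars <- c(chars, v + 63)                 # final chunk, no continuation flag
  intToUtf8(as.integer(chars))
}

# Encode parallel vectors of latitude and longitude (lat first, as Google does).
encode_polyline <- function(lat, lon, precision = 1e5) {
  d_lat <- diff(c(0, round(lat * precision)))  # deltas between rounded points
  d_lon <- diff(c(0, round(lon * precision)))
  pieces <- vapply(seq_along(d_lat), function(i) {
    paste0(encode_delta(d_lat[i]), encode_delta(d_lon[i]))
  }, character(1))
  paste0(pieces, collapse = "")
}

encode_polyline(c(38.5, 40.7, 43.252), c(-120.2, -120.95, -126.453))
# "_p~iF~ps|U_ulLnnqC_mqNvxq`@"
```

Small deltas need only a character or two each, which is where the size reduction comes from; the price is the 1e-5 degree rounding (roughly a metre).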

I've been playing about with a spatialdatatable package to do the encoding. I don't know how far I'm going to take this package, but if there's appetite to include it in r-spatial then I'll carry on.

edzer commented 7 years ago

I see this as similar to s2 cells and geohash: dedicated optimizations for when you can afford some rounding and bandwidth is an issue. This one aims at communicating with the Google Maps stack.

Your package does this, as well as the integration with data.table. Does it make sense to somehow separate that?

If you believe it will attract a larger user community, we could move the package here.

SymbolixAU commented 7 years ago

I think separating the encoding/normalising is probably a good idea and would be a better fit for this 'new package' (whatever it turns out to be). And there will definitely be a better way of encoding from sf to polylines than my nested lapply() calls. I also started to look into Boost and CGAL, but haven't progressed with them.

The reason I started writing spatialdatatable was to speed up the geosphere calculations, and also make them naturally usable inside data.table[ ] syntax.

SymbolixAU commented 6 years ago

I've made a start by creating googlePolylines to handle encoding (primarily) sf objects into encoded polylines and decoding them again. As mentioned, the encoded lines reduce precision, but they can speed up plotting.

I've seen plugins for leaflet to use these polylines too so there may be some opportunity for integration.

SymbolixAU commented 6 years ago

Thanks for the reminder @tim-salabim !

Given my recent updates to mapdeck I think I've got a solid base of code to make this 'normalised data' package, so I'm happy to get this going.

Anyone got a good suggestion for a package name?

SymbolixAU commented 6 years ago

This is my idea for the package

mdsumner commented 6 years ago

@SymbolixAU can I suggest you take a look at silicate (the binary branch)? There are two key functions, BINARY and SC.

(object is feature in sf terms, but more general - we can have mesh types and other non SF forms)

The first is not topological (no vertex de-dupe) and cannot survive vertex subsetting without remapping the indexes. The second is topological (unique in x/y by default), with unique IDs for object, edge and vertex - so it can be arbitrarily ordered and passed through other systems.

This has festered a bit, and my anglr package needs an update with the new SC/BINARY structure, but I'm hoping we can find common ground here. These forms admit conversion to other formats pretty easily, and there are verbs for extracting the entities sc_coord, sc_path, sc_vertex etc. so most of the format-specific details can go into methods for those.

SymbolixAU commented 6 years ago

yes. And I really want to start working with your structures to see what they are all about. We definitely need to get it all integrated.

mdsumner commented 6 years ago

all right, sorry for the dead horse flogging - spatialwidget doesn't look like what I thought you were talking about - trying to get a bearing on how you see things. :+1:

SymbolixAU commented 6 years ago

I'm going to add some concrete R examples to the spatialwidget package to hopefully make the design / rationale clear :)

SymbolixAU commented 6 years ago

Going to commit something this evening, but, for starters, this is what I'm aiming for.

You pass it an sf object, tell it which columns of the sf object hold the colours/opacities/whatever (or specify fixed values), and it returns a list with 2 JSON objects. These JSON objects can then be parsed by an htmlwidget.

spatial_line(mapdeck::roads[1:5, ], stroke_colour = "FQID", stroke_opacity = 3, stroke_width = 3)

$data
[1] "[{\"type\":\"Feature\",\"properties\":{\"stroke_colour\":\"#FDE72503\",\"stroke_width\":3.0},\"geometry\":{\"geometry\":{\"type\":\"LineString\",\"coordinates\":[[145.014291,-37.830458],[145.014345,-37.830574],[145.01449,-37.830703],[145.01599,-37.831484],[145.016479,-37.831699],[145.016813,-37.83175],[145.01712,-37.831742],[145.0175,-37.831667],[145.017843,-37.831559],[145.018349,-37.83138],[145.018603,-37.83133],[145.018901,-37.831301],[145.019136,-37.831301],[145.01943,-37.831333],[145.019733,-37.831377],[145.020195,-37.831462],[145.020546,-37.831544],[145.020641,-37.83159],[145.020748,-37.83159],[145.020993,-37.831664]]}}},{\"type\":\"Feature\",\"properties\":{\"stroke_colour\":\"#44015403\",\"stroke_width\":3.0},\"geometry\":{\"geometry\":{\"type\":\"LineString\",\"coordinates\":[[145.015016,-37.830832],[145.015561,-37.831125],[145.016285,-37.831463],[145.016368,-37.8315],[145.016499,-37.831547],[145.016588,-37.831572],[145.01668,-37.831593],[145.01675,-37.831604],[145.016892,-37.83162],[145.016963,-37.831623],[145.017059,-37.831623],[145.017154,-37.831617],[145.017295,-37.831599],[145.017388,-37.831581],[145.017523,-37.831544],[145.018165,-37.831324],[145.018339,-37.831275],[145.018482,-37.831245],[145.018627,-37.831223],[145.01881,-37.831206],[145.018958,-37.831202],[145.019142,-37.831209],[145.019325,-37.831227],[145.019505,-37.831259],[145.020901,-37.831554],[145.020956,-37.83157]]}}},{\"type\":\"Feature\",\"properties\":{\"stroke_colour\":\"#FDE72503\",\"stroke_width\":3.0},\"geometry\":{\"geometry\":{\"type\":\"LineString\",\"coordinates\":[[145.020116,-37.830563],[145.019885,-37.830572],[145.019502,-37.83069],[145.01935,-37.8307],[145.019104,-37.830655],[145.01582199999999,-37.829909],[145.013658,-37.829467],[145.013556,-37.82946],[145.013446,-37.829437],[145.013344,-37.829403],[145.013174,-37.829359],[145.01303,-37.829346],[145.012949,-37.829349],[145.012915,-37.8294],[145.01289,-37.829551],[145.012699,-37.82969]]}}},{\"type\":\"Feature\",\"properties\":{\"stroke_colour\":\"#23898D03\",\"stroke_width\":3.0},\"geometry\":{\"geometry\":{\"type\":\"LineString\",\"coordinates\":[[145.013367,-37.82957],[145.013578,-37.82958],[145.014053,-37.829673],[145.014522,-37.829757],[145.015338,-37.829902],[145.016323,-37.830123],[145.017672,-37.830471],[145.019195,-37.830872]]}}},{\"type\":\"Feature\",\"properties\":{\"stroke_colour\":\"#20928C03\",\"stroke_width\":3.0},\"geometry\":{\"geometry\":{\"type\":\"LineString\",\"coordinates\":[[145.019266,-37.831062],[145.014738,-37.830149],[145.014392,-37.830096],[145.014048,-37.830059]]}}}]"
attr(,"class")
[1] "json"

$legend
[1] "{\"stroke_colour\":{\"colour\":[\"#44015403\",\"#3B528B03\",\"#21908C03\",\"#5DC96303\",\"#FDE72503\"],\"variable\":[\"1347.00\",\"2389.25\",\"3431.50\",\"4473.75\",\"5516.00\"],\"colourType\":[\"stroke_colour\"],\"type\":[\"gradient\"],\"title\":[\"FQID\"],\"css\":[\"\"]}}"
attr(,"class")
[1] "json"

And it converts all ~18k rows in around 100 milliseconds:

nrow(mapdeck::roads)
# [1] 18286

system.time({
  lst <- spatial_line(mapdeck::roads, stroke_colour = "FQID", stroke_opacity = 3, stroke_width = 3)
})
# user  system elapsed 
# 0.084   0.010   0.100 
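To make the shape of the `$data` string concrete, here is a hedged base-R sketch that serialises one LineString feature with a few properties. A real implementation (geojsonsf's C++ code is mentioned later in this thread) also handles string escaping, numeric precision and every geometry type. `line_feature()` is a hypothetical helper, and it follows the RFC 7946 layout, where a Feature's `geometry` member is itself the object holding `type` and `coordinates`.

```r
# Hypothetical helper: serialise one LineString feature to a GeoJSON string.
# coords: two-column matrix of lon/lat; props: named list of scalar values.
# Note: strings are not escaped, so this is only safe for simple values.
line_feature <- function(coords, props) {
  prop_vals <- vapply(props, function(v) {
    if (is.character(v)) sprintf("\"%s\"", v) else as.character(v)
  }, character(1))
  prop_json <- paste0("\"", names(props), "\":", prop_vals, collapse = ",")
  coord_json <- paste0("[", as.character(coords[, 1]), ",",
                       as.character(coords[, 2]), "]", collapse = ",")
  sprintf('{"type":"Feature","properties":{%s},"geometry":{"type":"LineString","coordinates":[%s]}}',
          prop_json, coord_json)
}

f <- line_feature(matrix(c(145, 145.5, -37.8, -37.9), ncol = 2),
                  list(stroke_colour = "#FDE72503", stroke_width = 3))

# Several features can then be joined into one JSON array for the widget.
features_json <- paste0("[", paste(c(f, f), collapse = ","), "]")
```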

mdsumner commented 6 years ago

Do you mean the aes()-like mapping? I assume the conversion itself is straightforward (geojsonsf C++ ...).

I've toyed with aes(), though now I think plain group_by() and select(), with specially named attributes and rlang under the hood, is probably the way to go? So I'd write

mapdeck::roads[1:5, ] %>% spatial_line(geometry = geometry, stroke_colour = FQID, stroke_opacity = 3, stroke_width = 3)

and under the hood what happens is something like

mapdeck::roads[1:5, ] %>% transmute(geometry = geometry, stroke_colour = FQID, stroke_width = 3)

But, lazily and without actually creating a new sf object - using rlang. Is that on the right track?
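A small base-R analogue of that lazy mapping, for illustration only: capture the unevaluated arguments with `substitute()`, then evaluate each one against the data's columns, so a bare name like `FQID` resolves to a column while a constant such as `3` passes through unchanged, and no new data frame or sf object is built. rlang's quosures (`enquos()` plus `eval_tidy()`) do the same job more robustly; `resolve_aes()` is a hypothetical name.

```r
# Capture aesthetic arguments unevaluated and resolve them against `data`.
resolve_aes <- function(data, ...) {
  env <- parent.frame()                        # caller's environment
  exprs <- as.list(substitute(list(...)))[-1]  # named list of unevaluated args
  # Column names are looked up in `data` first, then in the caller's env.
  lapply(exprs, function(e) eval(e, envir = data, enclos = env))
}

roads <- data.frame(FQID = c(1347, 5516, 2389.25))
aes_vals <- resolve_aes(roads, stroke_colour = FQID, stroke_width = 3)
str(aes_vals)
```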

SymbolixAU commented 6 years ago

I think under the hood it's along those lines, yes, with a little extra wrangling to create colours from the variables (plus a summary palette for a legend), and finally the GeoJSON step.
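A hedged sketch of that colour-and-legend wrangling in base R: rescale a numeric column to [0, 1], interpolate a palette with `grDevices::colorRamp()`, append a two-digit hex alpha, and summarise the range as evenly spaced legend breaks. The five anchor colours are the viridis values visible in the `$legend` output above. `colour_values()` is a hypothetical helper, not spatialwidget's internal code.

```r
# Five viridis anchors (as seen in the $legend output above).
palette_anchors <- c("#440154", "#3B528B", "#21908C", "#5DC963", "#FDE725")

# Map numeric x to hex colours with an alpha suffix, plus legend breaks.
# Assumes x is not constant (otherwise the rescaling divides by zero).
colour_values <- function(x, alpha = 255, n_legend = 5) {
  ramp <- grDevices::colorRamp(palette_anchors)  # maps [0, 1] to RGB in 0..255
  alpha_hex <- sprintf("%02X", as.integer(alpha))
  to_hex <- function(v) {
    scaled <- (v - min(x)) / (max(x) - min(x))
    m <- round(matrix(ramp(scaled), ncol = 3))   # one RGB row per value
    paste0(grDevices::rgb(m[, 1], m[, 2], m[, 3], maxColorValue = 255), alpha_hex)
  }
  breaks <- seq(min(x), max(x), length.out = n_legend)
  list(colours = to_hex(x),
       legend = list(colour = to_hex(breaks),
                     variable = sprintf("%.2f", breaks)))
}

cv <- colour_values(c(1347, 5516, 3431.5), alpha = 3)
```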

The idea is that the output of spatial_line() feeds directly to JavaScript through the various invoke_method() calls in:

So internally, each of those addPolylines(), add_polylines() and add_path() functions will have a function body similar to

add_new_polyline <- function(sf, ... ) {
  ## a bit of internal stuff for each implementation
  js <- spatial_line(...)
  invoke_method( ..., js , ... )
}

For comparison:

sf <- mapdeck::roads

library(microbenchmark)

microbenchmark(
  leaflet = {
    leaflet::leaflet() %>%
      leaflet::addPolylines(data = sf)
  },
  googleway = {
    googleway::google_map(key = "abc") %>%
      googleway::add_polylines(data = sf)
  }, 
  spatialwidget = {
    spatial_line(mapdeck::roads, stroke_colour = "FQID", stroke_opacity = 3, stroke_width = 3)
  },
  times = 5
)

# Unit: milliseconds
#          expr        min         lq      mean     median       uq       max neval
#       leaflet 5643.93449 5701.28007 5877.8871 5724.51738 5765.045 6554.6590     5
#     googleway 2568.89439 2578.94522 2651.6126 2614.33833 2704.747 2791.1383     5
# spatialwidget   92.20373   96.02003  107.3444   98.26774  103.182  147.0484     5

mdsumner commented 6 years ago

This is very nice and helpful. Can I ask about the dots passed on to invoke_method: is that some way of keeping track at the R and js levels? (Or just some magic?)

SymbolixAU commented 6 years ago

Merely laziness on my part here, to indicate there are other arguments in those functions :)

tim-salabim commented 6 years ago

They are just additional arguments passed from R to the js method that is being invoked. See e.g. here for the R side and correspondingly here for what the js binding (the method) receives.

mdsumner commented 6 years ago

oh phew, thanks!

SymbolixAU commented 6 years ago

I've added three R functions, widget_point(), widget_line() and widget_polygon() which you can use directly, and I've updated the README and merged all the dev into master.

I think this gives more concrete examples of what I'm aiming for.

I'm going to use this spatialwidget library in mapdeck and googleway, so that's been the primary focus of my design, but if there's anything leaflet/mapview would benefit from let me know.

tim-salabim commented 6 years ago

Cool! I will play with it in the near future. At the moment still focussing on leaflet.glify performance and usability enhancements.

SymbolixAU commented 5 years ago

To keep this thread updated: I'm planning to submit spatialwidget to CRAN in a week or so.

harryprince commented 5 years ago

I propose a detailed comparison between leafgl, deckgl and mapdeck to figure out which is the best solution when we need to plot large numbers of points. I hope that would save SQL monkeys like me some time.

https://github.com/r-spatial/leafgl/issues/11

tim-salabim commented 5 years ago

This discussion is partly related to #13. Hence, I'd be inclined to leave it open for now.

I still haven't gotten around to playing with spatialwidget, but my feeling is that it is the closest we have come to a normalised spatial data package.