ropensci / auunconf

repository for the Australian rOpenSci unconference 2016!

Render large spatial datasets in shiny leaflet #38

Open jeffreyhanson opened 8 years ago

jeffreyhanson commented 8 years ago

Overview

The shiny R package has become one of the most popular ways to interactively explore and visualise data. The leaflet R package provides additional functionality that allows developers to embed interactive maps in their web-apps. These interactive web-apps allow users to visualise spatial data (check out this tutorial). However, a lot of spatial data is currently too large for leaflet to natively render in a feasible amount of time (for example, the world's protected area network). I propose developing a fork of the leaflet R package with this capability. This fork would hopefully be integrated into future versions of the leaflet R package.

Technical problem

The leaflet R package renders vector and raster data as html objects. Each coordinate or pixel must be loaded from a server to be rendered in the browser. This is not a problem for small datasets. But for large datasets, the browser must load hundreds of megabytes worth of coordinates or pixels to render the dataset. This causes the web-app to stall, or the browser to crash if the dataset is too big.

Proposed solution

Tiles. Map tile layers are used to render detailed spatial data at various zoom levels. For instance, the following layer is rendered using tiles. See how fast it is at rendering this dataset?

library(leaflet)
leaflet() %>% addTiles() %>% addWMSTiles(
  "http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi",
  layers = 'nexrad-n0r-900913',
  options = WMSTileOptions(format = 'image/png', transparent = TRUE),
  attribution = "Weather data © 2012 IEM Nexrad"
)
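Part of why tiles stay fast is that the browser only ever requests the handful of fixed-size images covering the current view, however large the underlying dataset is. Here is a minimal base-R sketch of the standard Web Mercator (XYZ or "slippy map") tile arithmetic. The function names `lonlat_to_tile` and `tiles_in_view` are made up for illustration and are not part of leaflet.

```r
# Convert a longitude/latitude (degrees) to XYZ tile indices at a zoom level,
# using the standard Web Mercator tiling formula.
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom                      # number of tiles along each axis
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

# List every tile needed to cover a bounding box at one zoom level.
# Tile y indices increase southwards, so the north-west corner has the
# smallest indices. Views that cross the antimeridian are not handled.
tiles_in_view <- function(west, south, east, north, zoom) {
  nw <- lonlat_to_tile(west, north, zoom)
  se <- lonlat_to_tile(east, south, zoom)
  expand.grid(x = nw[["x"]]:se[["x"]], y = nw[["y"]]:se[["y"]], z = zoom)
}

# Tiles covering a view over eastern Australia at zoom level 6
view_tiles <- tiles_in_view(west = 140, south = -40, east = 155, north = -25, zoom = 6)
nrow(view_tiles)
```

The number of rows returned depends on the size of the view in tiles, not on how many features or pixels the source dataset contains.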

I propose we add in the functionality to automatically convert vector and raster data to a tile layer, and render this tile layer instead of the raw data. This repository contains a python script designed to take an image and convert it to a set of tiles specifically for leaflet (modified from gdal2tiles in GDAL).

The trick will be linking everything together. I imagine the process will go something like this:

  1. the user inputs a vector/raster dataset and a colour scheme
  2. the dataset is converted to an RGB 3-band .tif image
  3. the .tif image is saved to disk
  4. the python script is used to generate tiles from the .tif
  5. the location of the tiles on disk is passed to the addTiles function
  6. the data gets rendered as a tile layer
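As a rough illustration of step 2 only, the sketch below maps a matrix of cell values onto a colour scheme and returns a three-band (R, G, B) integer array, using just base R and grDevices. The function name `values_to_rgb` and its defaults are assumptions made for this example. Writing the array to a .tif and generating the tiles are not covered here.

```r
# Map a numeric matrix onto a colour ramp and return an array with
# dimensions rows x cols x 3 holding 0-255 values for the R, G and B bands.
values_to_rgb <- function(values, colours = c("white", "darkgreen")) {
  rng <- range(values, na.rm = TRUE)
  span <- rng[2] - rng[1]
  # Rescale to [0, 1]; a constant raster maps to the first colour
  scaled <- if (span > 0) (values - rng[1]) / span else values * 0
  # colorRamp() returns a function giving one row of R, G, B per input value
  rgb_matrix <- grDevices::colorRamp(colours)(as.vector(scaled))
  array(as.integer(round(rgb_matrix)),
        dim = c(nrow(values), ncol(values), 3))
}

# A tiny 2 x 2 "raster" rendered from black (lowest) to white (highest)
cells <- matrix(c(0, 5, 10, 2), nrow = 2)
bands <- values_to_rgb(cells, colours = c("black", "white"))
dim(bands)
```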

It would be super cool if we could implement an R version of the python script, because the proposed functionality currently requires both python and GDAL to be installed on the user's machine. However, we might be able to (ab)use the rgdal or gdalUtils R packages to make sure GDAL is installed on the user's machine. In theory, the geoprocessing could be handled by the rgeos R package.

Desired functionality

The leaflet R package contains functions to render different types of spatial data. For brevity, I'll just show an example of what I'm thinking of using raster datasets.

Here we have an example showing how you could normally render a raster dataset in R with leaflet.

library(raster)
library(leaflet)
filename <- system.file("external/test.grd", package="raster")
rast <- raster(filename)
leaflet() %>% addRasterImage(rast)

I propose adding three arguments to this function: tiled (defaults to FALSE), which specifies whether the dataset should be converted to tiles for rendering; dir (defaults to a temporary directory), which specifies where the tile files should be stored; and overwrite (defaults to FALSE), which specifies whether a tiled dataset already present at that location should be overwritten or simply reused to render the dataset.

Note that the default options for the proposed function yield the same behaviour as the function in the current version of the package.

library(raster)
library(leaflet)
filename <- system.file("external/test.grd", package="raster")
rast <- raster(filename)
leaflet() %>% addRasterImage(rast, tiled=TRUE, dir=tempdir(), overwrite=FALSE)
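To make the intended interaction between tiled, dir and overwrite concrete, here is a small base-R sketch of the decision such a function would need to make before rendering. The helper name `tile_action`, its return values, and the assumption that tiles are stored as .png files are all made up for this example; none of this exists in leaflet.

```r
# Decide what to do for a given combination of the proposed arguments.
# Returns one of "render_raw", "reuse_tiles" or "generate_tiles".
tile_action <- function(tiled = FALSE, dir = tempdir(), overwrite = FALSE) {
  # Default behaviour: identical to the current package, no tiling at all
  if (!tiled) return("render_raw")
  # Assumption: an existing tile set is any .png file somewhere under dir
  has_tiles <- dir.exists(dir) &&
    length(list.files(dir, pattern = "\\.png$", recursive = TRUE)) > 0
  if (has_tiles && !overwrite) "reuse_tiles" else "generate_tiles"
}

# With a fresh, empty location the tiles have to be generated first
tile_dir <- tempfile("tiles")
tile_action(tiled = TRUE, dir = tile_dir)
```

Keeping tiled = FALSE as the default means existing calls behave exactly as they do now.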

I think if we could implement something like this for the addRasterImage and addGeoJSON functions in the leaflet R package that would be awesome. Or, if we could do this for all the add* functions that would be amazing.

joelgombin commented 8 years ago

Hi,

any news about this package project? I found it super awesome and was thinking about maybe working on something like that at some point if nothing exists...

mdsumner commented 8 years ago

Have you tried mapview? Also see this proposal for its future direction: https://github.com/environmentalinformatics-marburg/mapview_toolchain/blob/master/mapview_interactive_data_manipulation.Rmd.


joelgombin commented 8 years ago

Hi,

Yes, I use mapview on a regular basis, but I'm not sure it offers special tools for viewing very large spatial datasets. Also, the roadmap you point to only mentions the issue of large datasets very briefly. In the meantime I stumbled across your leafier repo; did you abandon its development?

johnbwilliams commented 8 years ago

I'd be interested in contributing to an R package to render large spatial datasets in shiny leaflet.

How can we move this ahead?


jeffreyhanson commented 8 years ago

Hi,

Thanks for showing an interest in this. We had a working prototype by the end of the unconference but I haven't had time to develop this into a package.

We ended up scrapping the tiles because we couldn't work out how to make them interactive. Instead, we simplified the vector data to various tolerances (using rgeos::gSimplify), clipped it to the user's viewport (using rgeos::gIntersection), and sent the result to leaflet for rendering.
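For readers unfamiliar with what simplifying to a tolerance does, here is a base-R sketch of Douglas-Peucker line simplification, the algorithm that rgeos::gSimplify is documented as using. This is not the prototype's code: the names `perp_dist` and `simplify_line` are made up for illustration, and the sketch handles a single line given as a two-column coordinate matrix.

```r
# Perpendicular distance from points (px, py) to the line through a and b
perp_dist <- function(px, py, a, b) {
  dx <- b[1] - a[1]
  dy <- b[2] - a[2]
  len <- sqrt(dx^2 + dy^2)
  # Degenerate case (a == b): fall back to plain distance from a
  if (len == 0) return(sqrt((px - a[1])^2 + (py - a[2])^2))
  abs(dy * (px - a[1]) - dx * (py - a[2])) / len
}

# Recursively drop vertices that lie within `tol` of the line joining the
# first and last vertex; keep the furthest vertex and split there otherwise.
simplify_line <- function(xy, tol) {
  n <- nrow(xy)
  if (n < 3) return(xy)
  d <- perp_dist(xy[2:(n - 1), 1], xy[2:(n - 1), 2], xy[1, ], xy[n, ])
  i <- which.max(d)
  if (d[i] <= tol) return(xy[c(1, n), , drop = FALSE])
  split <- i + 1                      # row index of the furthest vertex
  left <- simplify_line(xy[1:split, , drop = FALSE], tol)
  right <- simplify_line(xy[split:n, , drop = FALSE], tol)
  rbind(left, right[-1, , drop = FALSE])  # avoid duplicating the split vertex
}

# The small wiggle at x = 1 is removed, the large peak at x = 3 is kept
line <- cbind(x = c(0, 1, 2, 3, 4), y = c(0, 0.01, 0, 1, 0))
coarse <- simplify_line(line, tol = 0.1)
nrow(coarse)  # 4 vertices remain
```

A larger tolerance removes more vertices, which is why a coarser version can be sent when the user is zoomed out and a finer one when zoomed in.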

The prototype should be in the leafier repo. Let me know if you have any questions.

jeffreyhanson commented 8 years ago

I just found the mapview R package @mdsumner mentioned:

https://github.com/environmentalinformatics-marburg/mapview

joelgombin commented 8 years ago

Correct me if I'm wrong, but there's no actual code in the leafier repo, is there (except for the geom.subset function)? Thanks for the information anyway! I'll keep thinking about this tile thing!

mdsumner commented 8 years ago

This leafier yeah?

https://github.com/ropenscilabs/leafier

Thanks, I'll have a look. I'm not experienced enough with the browser side, but it seems to me a lot could be done to link the web tool straight to the GDAL functions, which are inherently geared to on-demand read. Reading into R is probably not that helpful, but writing the hooks between GDAL and the browser tool is probably reasonable. Ultimately mapview should do that, simply provide a "viewport" request back to registered data sources, and the geo-spatial library/database does the rest. Also there are non-R tools that already do this for leaflet afaik.

Of course R is great for prototyping this kind of thing though, and there are tools like raster that make it possible to minimize the middle-man overhead. (rgdal has these on-demand functions too but far less user friendly).


jeffreyhanson commented 8 years ago

Yeah, sorry, I forgot we didn't put it under the master branch.

here's the prototype:

https://github.com/ropenscilabs/leafier/tree/amy-jeff-dev/leaflet_test

EDIT: turns out we did - and I just can't read haha

joelgombin commented 8 years ago

Ah thanks, I was looking at the paleo13/leafier repo!

KrysD commented 7 years ago

Hi, I have tested your package and I think it is on the right track. I tested it with a SpatialLinesDataFrame of more than 60K rows, and wanted to share my experience with a large data frame.

The geom.simplify() function took 3 min to disaggregate the data and gave me a 1.1 GB list (I started with a 37.5 MB Large SpatialLines object), using the same variables as in your example.


kambanane commented 5 years ago

Mr. Hanson,

Do your functions (geom.subset and geom.simplify) work with points? The code is specifically for polygons. I attempted to modify them to get them to work with points, but have as yet been unsuccessful.

# define functions
geom.subset.points <- function(spgeom, top, left, bottom, right) {
  coords <- data.frame(y = c(left, right, right, left), x = c(top, top, bottom, bottom))
  poly <- sp::SpatialPoints(coords)
  polys <- sp::SpatialPoints(list(poly), ID = 0)
  polys.spatial <- SpatialPoints(list(polys), proj4string = spgeom@proj4string)
  inter <- rgeos::gIntersection(spgeom, polys.spatial, byid = TRUE)
}
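For point data, restricting to the viewport does not strictly need a polygon intersection: a plain bounding-box filter on the coordinates gives the same subset. Here is a base-R sketch, assuming the points are supplied as a two-column matrix with x (longitude) first and y (latitude) second. The function name `points_in_view` is made up for illustration and is not part of leafier. Note that left/right bound x and top/bottom bound y.

```r
# Keep only the points that fall inside the viewport rectangle.
# xy: two-column matrix, column 1 = x (longitude), column 2 = y (latitude).
# Viewports that cross the antimeridian are not handled.
points_in_view <- function(xy, top, left, bottom, right) {
  keep <- xy[, 1] >= left & xy[, 1] <= right &
    xy[, 2] >= bottom & xy[, 2] <= top
  xy[keep, , drop = FALSE]
}

# Three points, only the middle one lies inside the viewport
pts <- cbind(x = c(150, 153, 160), y = c(-30, -27, -20))
visible <- points_in_view(pts, top = -25, left = 152, bottom = -28, right = 154)
nrow(visible)  # 1
```

With an sp object, the coordinate matrix could be obtained with sp::coordinates() before filtering.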

jeffreyhanson commented 5 years ago

Hi,

Thanks for getting in touch. I wouldn't really recommend using these functions, as there has been a lot of work done on solving this problem since the unconf. @tim-salabim has been doing some amazing work on rendering large spatial datasets in leaflet (see https://github.com/tim-salabim/leaflet.glify; both polygon and point data are supported, I think). @SymbolixAU has been doing fantastic work too (see https://github.com/SymbolixAU/mapdeck; though this uses Deck.gl and not leaflet). So, I would recommend trying out those packages to see if they can do what you need.

Cheers,

Jeff