r-spatial / rspatial_spark

This is the repo that sparked https://github.com/r-spatial

overview of spatial for 2017 #8

mdsumner opened 7 years ago

mdsumner commented 7 years ago

Discuss!

EDIT: I've updated this ramble into a slightly more accurate ramble here: http://mdsumner.github.io/2017/01/10/spatial-r-2017.html


title: 'R spatial: 2017'
author: "Michael Sumner"
date: "31 December 2016"
output: html_document

This document is a broad overview of what I see as most relevant to the future of spatial in R, for 2017 and beyond. I've tried to be as broad as possible without going into too much detail, but I also haven't been very careful; this is a work in progress (WIP).

The state of things

An enormous amount of activity has been going on in R spatial. The keystone activity is the new simple features for R package, sf, and the many efforts to support sf in various packages, but there are many other, less obvious linkages as well.

Personally, I have learnt quite a lot recently about the broader context within R, and I'm keen to help consolidate some of the "non-central" tools that we use. There are also some surprisingly helpful implications of the new simple features support, both within the central package and for the ecosystem around it.

sf

The simple features package sf was one of the first projects supported by the R Consortium, and was created by Edzer Pebesma. This package completely replaces sp's vector data classes, includes a replacement for rgdal and rgeos, and comes with a long list of important improvements. These are described in full in the [package vignettes](https://CRAN.r-project.org/package=sf) and [blog posts](http://r-spatial.org/).

The key changes relevant here are

I strongly recommend getting familiar with the sf data structures; it's really important to understand the hierarchy levels and the ways the vectors (POINT) and matrices (everything else) are stored. POINT is a vector, MULTIPOINT is a matrix, POLYGON is a list of matrices (one island, zero or more holes), LINESTRING is a matrix (structurally the same as MULTIPOINT), MULTIPOLYGON is a list of lists of matrices (effectively a list of POLYGONs), and MULTILINESTRING is a list of matrices (effectively a list of LINESTRINGs, and structurally the same as a POLYGON).
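To make the nesting concrete, here is a minimal, language-neutral sketch in Python (not sf code): each "matrix" is a list of (x, y) rows, and the nesting follows the description above. The helper `coords_multipolygon` is hypothetical; it flattens a MULTIPOLYGON into rows tagged with ring and polygon ids, loosely in the spirit of the extra index columns that sf's `st_coordinates` reports.

```python
# Illustration only: plain Python lists stand in for R vectors and matrices.
# A "matrix" is a list of (x, y) rows, one row per vertex.

point = (0.0, 0.0)                                  # POINT: a vector
multipoint = [(0.0, 0.0), (1.0, 1.0)]               # MULTIPOINT: a matrix
linestring = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]   # LINESTRING: a matrix too

island = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]  # closed ring
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]        # closed ring
polygon = [island, hole]            # POLYGON: list of matrices, island first
multilinestring = [island, hole]    # MULTILINESTRING: same shape as a POLYGON

triangle = [(20, 0), (30, 0), (30, 10), (20, 0)]
multipolygon = [polygon, [triangle]]  # MULTIPOLYGON: list of POLYGONs


def coords_multipolygon(mp):
    """Flatten a MULTIPOLYGON into (x, y, ring_id, polygon_id) rows, 1-based."""
    rows = []
    for polygon_id, poly in enumerate(mp, start=1):
        for ring_id, ring in enumerate(poly, start=1):
            for x, y in ring:
                rows.append((x, y, ring_id, polygon_id))
    return rows


rows = coords_multipolygon(multipolygon)
print(len(rows))  # 14 vertices: 5 + 5 in the first POLYGON, 4 in the second
```

Keeping the ring and polygon ids on every row is what lets the flat coordinate table be rebuilt into the nested form later.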

If you want to see how to convert between sf forms and extract the raw coordinates, I would look at st_cast in the development version of sf, and at this code for building leaflet coordinate lists for mapview:

https://github.com/mdsumner/mapview/blob/simple-features-2/R/sf2.R

rasters

The raster package is apparently being replaced. There are some pending updates (by me) not yet committed to trunk, but the future maintenance of raster is unclear. There are some interesting extensions to raster on CRAN, namely velox and fasteraster, as well as the unrelated dggridR.

HDF5 is now fully supported by rhdf5 on Bioconductor. Since NetCDF4 is built on HDF5, rhdf5 could also replace ncdf4 for many NetCDF4 files, and notably it can read NetCDF4 files with compound types.

Point clouds

rlas is a recent update on CRAN. It will be trivial to push these data into sf types, but it won't always make sense to do so. You could have a MULTIPOINT with X, Y, Z and M (but none of the other point attributes), or one POINT (XYZ) per row with all the attributes in an sf data frame. This package will be useful for driving interest in the exotic sf types.

It's easy to read LIDAR data with rlas right now, and to plot it interactively with RGB styling and so on with plotly. Keen to try writing a "detect ground" algorithm? Try it! What does st_triangulate do with an XYZ MULTIPOINT? (Hmm, good question.)
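As a deliberately naive starting point for a "detect ground" algorithm, here is a sketch in Python rather than R, assuming the LIDAR returns have already been read (e.g. with rlas) into (x, y, z) tuples: bin the points into a square grid and flag any point within a tolerance of the lowest point in its cell as ground. `naive_ground`, `cell_size` and `tolerance` are hypothetical names; this is not a published ground-filtering method, and it will misclassify points on steep slopes or in cells that contain no ground returns at all.

```python
import math


def naive_ground(points, cell_size, tolerance):
    """Flag (x, y, z) points lying within `tolerance` of the lowest z in their
    square grid cell. Returns one boolean per input point."""
    lowest = {}
    keys = []
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        keys.append(key)
        if key not in lowest or z < lowest[key]:
            lowest[key] = z
    return [z - lowest[key] <= tolerance for (_, _, z), key in zip(points, keys)]


pts = [
    (0.5, 0.5, 10.0),  # lowest return in cell (0, 0)
    (0.6, 0.4, 10.2),  # close to that minimum: ground
    (0.5, 0.5, 15.0),  # 5 m above it, e.g. vegetation: not ground
    (1.5, 0.5, 12.0),  # only return in cell (1, 0): ground
]
print(naive_ground(pts, cell_size=1.0, tolerance=0.5))  # [True, True, False, True]
```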

mapview and leaflet

mapview has support from the R Consortium to bring user interaction to spatial data in R, and is currently building in support for sf.

leaflet has a huge number of new extensions thanks to leaflet.extras, and there are ongoing updates around integrating crosstalk, which will be very important for interactive map applications.

There is an issue with leaflet in that internally it uses a "flat list" of polygons, much like SpatialPolygons does. This means there's ambiguity about which hole belongs to which island, and while that doesn't matter for visualization or point-in-polygon tests (the even-odd rule sorts that out), it does have implications for round-tripping to simple features. If we edit a polygon in leaflet, there will need to be a way to know which holes belong where. Currently in mapview the extra level is dropped, effectively returning a SpatialPolygons-like list.
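A minimal Python sketch of why the flat list is harmless for point-in-polygon tests but lossy for round-tripping, assuming closed rings stored as lists of (x, y) tuples. `flatten_rings` drops the POLYGON level the way a flat ring list does, and `inside_evenodd` applies the even-odd rule across all rings at once; both names are hypothetical and are not leaflet or mapview API.

```python
def flatten_rings(multipolygon):
    """Drop the POLYGON level, keeping one flat list of rings (no hole ownership)."""
    return [ring for polygon in multipolygon for ring in polygon]


def crossings(ring, px, py):
    """Count edges of a closed ring crossed by a ray from (px, py) towards +x."""
    n = 0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > py) != (y2 > py):  # edge straddles the ray's y value
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at > px:
                n += 1
    return n


def inside_evenodd(rings, px, py):
    """Even-odd rule: inside if the total crossing count over all rings is odd."""
    return sum(crossings(ring, px, py) for ring in rings) % 2 == 1


island = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
other = [(20, 0), (30, 0), (30, 10), (20, 10), (20, 0)]
rings = flatten_rings([[island, hole], [other]])

print(inside_evenodd(rings, 5, 5))   # True: in the first island, outside the hole
print(inside_evenodd(rings, 3, 3))   # False: inside the hole
print(inside_evenodd(rings, 25, 5))  # True: in the second island
```

The answers come out right, but `rings` by itself no longer records that the hole belongs to the first island; recovering that for simple features would need an extra ring-in-ring test.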

Should leaflet add the extra hierarchy for multipolygons? Should it support sf types directly (without importing sf)? We need a common approach here to avoid awkward and fragmented workarounds.

simple features in other packages

tmap, mapview, spbabel, stplanr, ... all have internal versions of sf types converted to something else. I think we should standardize on something general, discussed in very general terms here: https://github.com/r-gris/table-r-book/blob/master/01-2-overview.Rmd (that is a draft document), but I think a simpler version would be sufficient to support a general sf-converter engine.

Exotic types in sf

These are TINs, Polyhedral Surfaces (multipatch: basically polygons with shared "internal" edges), curves, and various combinations and varieties of these. None of the triangulations use an indexed mesh, which makes them a bit clunky and probably only suitable for very bespoke uses, but they provide interesting territory to explore. Certainly you can use them to build 3D plots in rgl and plotly (show examples, thanks to @timelyportfolio).

Note that a GEOMETRYCOLLECTION of triangle POLYGONs is effectively the same as a simple features TIN; the TIN type doesn't really add any structural improvement to the way the thing is put together: https://github.com/mdsumner/sfr/blob/constrained-triangulation/R/rt_triangulate.R
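To show what an indexed mesh adds over triangle POLYGONs, here is a small Python sketch (not sf or rgl code) that de-duplicates the vertices of closed triangle rings and replaces each triangle with a triple of vertex indices. `triangles_to_mesh` is a hypothetical name; it assumes exact coordinate matches (real data may need a tolerance) and produces 0-based indices, whereas R-side structures such as rgl's mesh3d index vertices from 1.

```python
def triangles_to_mesh(triangles):
    """Convert closed triangle rings (4 vertices, last == first) into an indexed
    mesh: a list of unique vertices and one 0-based index triple per triangle."""
    vertices = []
    index_of = {}
    faces = []
    for ring in triangles:
        if len(ring) != 4 or ring[0] != ring[-1]:
            raise ValueError("expected a closed triangle ring of 4 vertices")
        face = []
        for v in ring[:3]:
            if v not in index_of:  # exact-match de-duplication
                index_of[v] = len(vertices)
                vertices.append(v)
            face.append(index_of[v])
        faces.append(tuple(face))
    return vertices, faces


# Two triangles sharing the edge (1, 0, 0)-(0, 1, 0), stored as a TIN would.
t1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0)]
t2 = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 0)]
vertices, faces = triangles_to_mesh([t1, t2])
print(len(vertices), faces)  # 4 [(0, 1, 2), (1, 3, 2)]
```

The polygon form stores 8 coordinates for these two triangles; the indexed form stores 4 vertices once and makes the shared edge explicit through the repeated indices 1 and 2.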

The limits of simple features

Simple features can't fully represent GPS and other track data, indexed meshes (like rgl mesh3d, segmented paths), custom hierarchies like networks, nested objects like counties within states, or arc-node topology (like TopoJSON), and they can't store aesthetics with primitives (as ggplot2/ggvis, rgl, plotly and others can). R can do all of these things, in many different ways, and converting from and to sf is not too difficult. (The sf package is useful for developing more general tools that can work with these structures ...)

Note that GDAL is going in this direction, but we already have most of this capability in R, it's just scattered all over the place: http://lists.osgeo.org/pipermail/gdal-dev/2016-December/045675.html
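One small step towards arc-node topology can be sketched in Python: break closed rings into undirected edges and record which rings use each edge. In simple features a border shared by two neighbouring polygons is stored twice, once per polygon; a table like this exposes the shared part. `edge_owners` is a hypothetical name, and this finds shared single edges only; it does not chain them into TopoJSON-style arcs.

```python
from collections import defaultdict


def edge_owners(rings):
    """Map each undirected edge of the closed rings to the ids of rings using it."""
    owners = defaultdict(list)
    for ring_id, ring in enumerate(rings):
        for a, b in zip(ring, ring[1:]):
            key = (a, b) if a <= b else (b, a)  # same key in either direction
            owners[key].append(ring_id)
    return owners


# Two unit squares sharing the border x = 1.
left = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
right = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
owners = edge_owners([left, right])

shared = [edge for edge, ids in owners.items() if len(ids) > 1]
print(len(owners), shared)  # 7 [((1, 0), (1, 1))]
```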

htmlwidgets and plotly and mapview and leaflet

plotly is already usable for many applications. We can use techniques from rangl to put sf data into it, and we can use that to easily create exotic triangulated surfaces that are pretty inefficient in the simple features form:

http://rpubs.com/cyclemumner/rangl-poly-topo-plotly

Geotiff.js?

timevis?

My plans

I'm pretty comfortable now with sf and using it for what I want. I have converters in spbabel and rangl to build the forms and workflows I need, and the support in mapview, leaflet and plotly provides more than enough to go on with. I will work on making this as accessible and general as possible, on integrating it to replace my work on tracking data (the trip and SGAT packages) and with 3D models (rbgm, quadmesh), and on integrating it with htmlwidgets tools.

I'm trying to learn how to extend ggplot2 for sf, but I'm a bit stuck on how to deal with scales.

I haven't yet looked at curves, but I'm keen to see this capability in R, both for sf and more generally. We could easily represent the forms made possible by TopoJSON (see "Bostock flawed example"), and there is curve support in grid.

We could provide smart geospatial finite element forms (triangulations, quads).

I'm keen to see how ggvis and ggplot2 represent geometric types for objects that share vertices, such as intervals and bar charts, and how we can put in indexed data structures (unlike ggraph, which builds a mesh from a group of coordinates; you can't provide it with an indexed mesh). I think we can build spatial structures that can store all of these things, so we could throw sf at ggplot2 and use the output as a kind of super-form that knows how to wrap it up into an interactive 4D plot, how to display its primitives, and so on.

GIS itself needs what we can already do in R; it's not a target we are aspiring to, it's the other way around.

Robinlovelace commented 7 years ago

One question: where do we find out about the latest happenings in the world of raster? Is it on the R-Forge site for raster, or is there another place?

mdsumner commented 7 years ago

Contacting the author is all I know.

I also recommend agitation with details of what you want, since the ways of working are so diverse.

I now use raster and dplyr together extensively using the cell indexing tricks, and it's trivial to make systems of tables that work with raster's affine model. I also use this to deal with curvilinear 4D grids from ocean models, using the affine-cell tools in index space. This allows reasonably automatic conversion of vector maps to the index space of the grid, so extraction is trivial and fast and scales to very long time series.

But it's hard to see how to wrap that up, and it's not like a lot of other workflows we need. What are you wanting from a future raster?
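For readers unfamiliar with the cell-indexing tricks, here is a sketch of the affine arithmetic in Python rather than R, assuming a north-up grid described by its extent and dimensions. It follows raster's convention of numbering cells from 1, row by row from the top-left cell; the function names are hypothetical, and clamping points that lie exactly on the right or bottom edge into the last column or row is a choice made here, not necessarily raster's rule.

```python
def cell_from_xy(x, y, xmin, xmax, ymin, ymax, ncol, nrow):
    """1-based, row-major cell number counted from the top-left cell;
    None if (x, y) falls outside the extent."""
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None
    xres = (xmax - xmin) / ncol
    yres = (ymax - ymin) / nrow
    col = min(int((x - xmin) / xres), ncol - 1)  # clamp points on the right edge
    row = min(int((ymax - y) / yres), nrow - 1)  # clamp points on the bottom edge
    return row * ncol + col + 1


def xy_from_cell(cell, xmin, xmax, ymin, ymax, ncol, nrow):
    """Centre coordinates of a 1-based cell number."""
    xres = (xmax - xmin) / ncol
    yres = (ymax - ymin) / nrow
    row, col = divmod(cell - 1, ncol)
    return xmin + (col + 0.5) * xres, ymax - (row + 0.5) * yres


grid = dict(xmin=0, xmax=10, ymin=0, ymax=10, ncol=10, nrow=10)
print(cell_from_xy(2.5, 7.5, **grid))  # 23
print(xy_from_cell(23, **grid))        # (2.5, 7.5)
```

Once every vertex or point carries a cell number, extraction becomes a join on that cell column, which is the kind of table operation dplyr handles well.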

edzer commented 7 years ago

The fact that Robert doesn't use GitHub actively doesn't mean raster needs replacing. The package has over 44K lines of code, excluding documentation; in comparison, sp has 8.5K. Robert and I wrote up some preliminary ideas we developed in September. I finished steps 1 and 6 last year. My steps towards 2 will largely follow what the people at Paradigm4 did with SciDB and SciDBR, but adding spatial and temporal reference systems to (sets of) array dimensions, similar to what Marius did. We will soon organize an (open) hackathon to start OpenEO.

Someone could start a discussion on r-sig-geo? It still has 3000+ subscribers.

briatte commented 7 years ago

The core of the discussion is above my weight, but I just wanted to chime in to say that any solution to plot spatial objects with ggplot2 will find (a thankful audience of happy) users, especially if the solution makes sensible default assumptions about coordinates/datum.

Relevant discussion: https://github.com/edzer/sfr/issues/88

mdsumner commented 7 years ago

@edzer I will update this on the wiki. @tim-salabim, OK to use this repo's wiki?

Updates include:

* There is no leaflet problem: it will get multipolygon support.

I'll ping this to the mailing list when updated

tim-salabim commented 7 years ago

@mdsumner of course! Please use this repo

mdsumner commented 7 years ago

I put it out here: http://mdsumner.github.io/2017/01/10/spatial-r-2017.html

tim-salabim commented 7 years ago

Yeah, I was contemplating the idea of putting this whole thing on a github.io page, as it is more accessible (online). But I can't figure out how to move everything in one go...

mdsumner commented 7 years ago

I mean to look at blogdown; I finally updated my blog, so I might get back to that.

mdsumner commented 7 years ago

Wow, from "should I use this wiki" to https://github.com/r-spatial in 8 days with absolutely no effort by me. :)