r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.33k stars 294 forks source link

geom_sf() #88

Closed edzer closed 7 years ago

edzer commented 7 years ago

We need a geom_sf() to render simple feature objects in ggplots, e.g. where

nc = st_read(system.file("shape/nc.shp", package="sf"))
ggplot() + geom_sf(nc)

would add the polygon boundaries, and further argument can alter the color of polygons and polygon boundaries, as well as point and line geometries. I wrote some building blocks, basically functions that convert sf objects into grid grobs here and am willing to help out, but would appreciate some help getting started from someone who's been here before: @thomasp85, @hadley, @mdsumner , @hrbrmstr?

Issues to keep in mind:

thomasp85 commented 7 years ago

One thing to keep in mind is that ggplot2 works with data in data.frames. I'm not familiar with sf object and how they are encoded, but this needs to be figured out first (maybe it already is)...

edzer commented 7 years ago

Thanks - sf objects are data.frames. The geometry is in a list-column, which, since Hadley's keynote during UseR! is considered "tidy".

thomasp85 commented 7 years ago

Super - that was a quick solve :)

You are welcome to open an issue in ggforce if you wish, unless you're planning on a separate ggsf package or something...

edzer commented 7 years ago

Thank you, great idea - I opened the issue there.

hadley commented 7 years ago

I think this should either be included in sf itself or in ggplot2.

thomasp85 commented 7 years ago

If you want it in ggplot2 I'm fine with moving it over there...

edzer commented 7 years ago

Maybe develop it in ggforce first, and then decide?

thomasp85 commented 7 years ago

I'll leave that up to you and Hadley - ggforce probably has a quicker release cycle than ggplot2 (unless Hadley has hired a new intern:)...

hadley commented 7 years ago

I think it's important enough that I'd do a ggplot2 release specifically for it

thomasp85 commented 7 years ago

Ok - I'll see if I can get time to make a proof-of-concept implementation of it in a ggplot2 branch instead of ggforce

hadley commented 7 years ago

Awesome - thanks!

edzer commented 7 years ago

Sounds great!

hadley commented 7 years ago

Wrt to coordinate systems, I think it might be to transform each layer to lat/lon coordinates, and then assume the user will use coord_map(), coord_quickmap(), or coord_proj() to ensure correct project.

Alternatively it would be possible to have some sort of helper that did:

something_sf <- function(sf, ...) {
  list(
    geom_sf(...),
    coord_map(proj = get_projection(sf))
  )
}

(this is purely imaginary code, but it would mean that the last sf added "wins" the battle of coordinate systems)

Doing anything more automatically would require bigger changes to ggplot2 because there's no automatic guessing of coordinate systems.

edzer commented 7 years ago

Reading ?mapproject, it looks like mapproj does not have any notions of a datum -- never assume that lat/lon always means the same, or means anything much without knowing the datum! The rest of the (open source) world has essentially settled on GDAL/PROJ.4 as the code base to categorize and convert to/from projections and transform between datums; mapproj was developed independently by a group with different goals. Roger Bivand @rsbivand will be visiting here next week Mon-Wed to work on sf, we will also take up this issue and report back.

hadley commented 7 years ago

Datum is what I was missing - thanks!

The easiest way to get this to work with ggplot2 would be to have some sort of universal numeric representation of location, but I'm guessing that that doesn't exist. Alternatively, if there was a special vector class that had an attribute we could do something like what we do for POSIXct + time zones at the scale level.

Hmmm, if coord_map() is suboptimal (since it uses mapproj) maybe it's worth considering ggspatial or similar which would bundle coord_proj() from @hrbrmstr and this geom. (And then in the long-term, deprecate the existing spatial stuff from ggplot2).

edzer commented 7 years ago

There is a great deal of interoperability in different coordinate reference systems (CRS), look for instance at http://spatialreference.org/ref/epsg/4326/ for 14 encodings of datum WGS84. Geodesists and national mapping agencies all over the world work continuously to improve this, but ignore package mapproj. GDAL provides automatic conversions, and will read and write data anywhere, converting CRS encodings appropriately.

Package sf has a geometry list-column that carries a crs attribute:

> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.0
> nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
> st_crs(nc)
$epsg
[1] 4267

$proj4string
[1] "+proj=longlat +datum=NAD27 +no_defs"

attr(,"class")
[1] "crs"

having epsg and proj4string has proven to be enough, over the last 15 years, for spatial data in R to interoperate with the rest of the world.

I think this is as close as you can get to a universal numeric representation of location, and to the POSIXct + TZ metaphor.

hadley commented 7 years ago

Does this approach sound reasonable?

This is a bit inefficient because of needless re-projection, but it would make initial ggplot2 implementation simpler (and then in the longer run we could figure out a system where you'd get a default coord_proj() automatically).

edzer commented 7 years ago

Not all spatial data have a CRS: it may be missing, local, or non-Earth related (think astrophysics, cell biology or MRT scan data). I think

Ideally, the projection should happen by st_transform as that understands the source CRS.

(And as a side node: 4267 is not WGS84, but NAD27: a local (North American) Datum)

hadley commented 7 years ago

There's currently no way for a geom to request a coordinate system, so we have to rely on the user supplying either coord_fixed() (1:1), coord_quickmap() (better aspect ratio), or coord_proj().

If we can't always convert to WGS84, then we'll need a new vector class:

projected <- function(x, proj4string) {
  stopifnot(is.numeric(x))
  structure(x, class = "projected", proj4string = proj4string)
}

which will be (redundantly) applied to both x and y. Then we'll need a new spatial_trans() function (along the same lines as time_trans()) and a newScaleContinuousProjected`: together that will ensure that all layers project to the same coordinate system. That's not exactly elegant, but I think it's the easiest way to get this working properly in ggplot2.

We'll also need some hacks so that a geom_sf() layer can convey its dimensions to the scales. I think the easiest way to do that will be automatically add xmin, xmax, ymin, and ymax aesthetics mapped to the bounding box of each feature.

edzer commented 7 years ago

Much of this is beyond my understanding. Do note that you can't reproject x and y independently.

hadley commented 7 years ago

Yeah, I was just thinking that too. In that case I think the best we can do is give a warning if there are multiple CRSs. Anyway, I think I at least understand the main problems now, and I can keep mulling over what we need to do in ggplot2 to better support spatial data.

It feels like ggspatial (or ggsf or whatever we want to call it) could be a good project for another summer intern.

thomasp85 commented 7 years ago

This sounds more and more like an effort of ggraph scope, which is not something I can take upon me right now. I'll be happy to chime in but I think someone else needs to drive it forward

edzer commented 7 years ago

While Hadley is thinking about an utopian way to dynamically reproject maps on the fly, we can do something simple now that will make many users happy, by making clear (documented) assumptions:

sf provides the means to verify both assumptions, but we can even leave that up to the user, for now.

How does that sound?

hadley commented 7 years ago

Yeah, we can absolutely start by assuming that all layers are already correctly projected.

barryrowlingson commented 7 years ago

Why do you even need a geom_sf? Can you not overload the existing geom_polygon (or geom_map?) function for sf-class objects? Or is there something in the ggplot2 architecture that makes this hard?

hadley commented 7 years ago

Because geom_polygon expects one row per vertex, geom_sf() would have one row per feature.

barryrowlingson commented 7 years ago

The current geom_polygon expects one row per vertex. But could it not be made so that if the data is an sf object (or indeed inherits from sf) that it draws polygons using the geometry in each row?

I'm not sure putting the name of a class in a geom_name (as geom_sf) is a good idea either. There's no geom_data.frame, and this would seem to violate the grammar inherent in geom function names. Given that a single sf object could contain objects of differing geometries (points, lines and polys) it seems hard to decide on a good name for a geom function that could output such a range of different possible shapes. Also, if I create a class that inherits from sf then I don't want to have to remember that underneath its an sf object.

How about geom_spatial instead? This has no reference to class names, and describes the output as being the spatial component of the data. Users would then expect to see a map of the row-based spatial geometry encoded in whatever object they feed in, be it points, lines, or polygons. This could possibly even be ported to sp classes of it can be made generic.

thomasp85 commented 7 years ago

geom_polygon requires at least two rows for a single line and will throw warnings if only a single row is provided.

As i understand it sf data are data.frames containing a column with the sf data (I might be totally wrong - haven't got time to look into sf yet). Anyway I see no violation of the grammar as long as a geom requires a specific column class for some of its operations...

barryrowlingson commented 7 years ago

Both sf and sp classes are capable of being mapped spatially (ie as geographic features, according to coordinate projections etc). If there's a geom_sf then there could also be a geom_sp which would do exactly the same thing but would only work if you fed it an sp-class object. Better to have a geom_spatial that would despatch on whatever you fed it and do the right thing for that class, hiding complexity from the user and preventing errors.

hadley commented 7 years ago

Data representation is important for graphics - that's why we have geom_tile() and geom_rect() and geom_bar() and not just geom_polygon(), and why we have both geom_segment() and geom_line(). geom_sf() would be tied to a specific vector data structure, the simple feature, so calling it geom_sf seems reasonable to me. Sp is like a data frame, a simple feature is like a numeric vector. You can't simply dispatch the problem away because the level at which you would dispatch would be different.

Additionally, we'd have to modify geom_polygon() in a backward incompatible way, which would be a bad idea.

barryrowlingson commented 7 years ago

You've put the cart before the horse. geom_sf would only be tied to a specific data structure if you chose to do it that way. But sure, write geom_sf, but also write geom_spatial that calls geom_spatial.sf if the data inherits from sf, or calls geom_spatial.sp if the data inherits from sp classes, or even geom_spatial.postgis if its an object describing a connection and query to a PostGIS database. The output marked on the plot is the spatial representation of each row. No change of user code needed for mapping different types of spatial objects.

This will only aid adoption of the new classes, especially if geom_spatial.sp gets written. Then all my ggplots with sp-class objects will Just Work if I switch to sf objects. Win.

The only way I can see this not working is if geoms don't get to see what sort of data object is being passed to them, and only get a list of aesthetics and values... but I'd be surprised if it couldn't be done somewhere in the code, perhaps by geom_spatial passing an appropriate GeomSF/GeomSP object to the layer constructor...

hadley commented 7 years ago

I think you're missing the fundamental difference between sf and sp, and unfortunately there's no way a geom could work in the way you describe.

mtennekes commented 7 years ago

First of all @edzer 👍 package. I'm happy to upgrade tmap to deal with sf objects. I'm new to simple features, but it looks like an elegant way to store spatial data.

However, I don't get the fundamental different between sf and sp yet. Is there any (meta) data stored in a sf object that is missing in a sp object, or vise versa? I mean, it seems like I can convert them back and forth:

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc_sp <- as(nc, "Spatial")

Or can a sf object consist of multiple geometry types (since it is vector based)?

edzer commented 7 years ago

@mtennekes - thanks! The metadata is stored in the attributes of the simple feature geometry list-column, but you get them easiest by st_bbox(sf) and st_crs(sf).

Yes, conversion should work for (multi)points, (multi)linestrings and (multi)polygons; in case of GEOMETRYCOLLECTION or other classes you can either give an error or make the additional effort.

edzer commented 7 years ago

Or can a sf object consist of multiple geometry types (since it is vector based)?

yes: an sfc (list-column with geometries) can be of type sfc_GEOMETRYCOLLECTION, in which case each geometry has type GEOMETRYCOLLECTION, and it can be sfc_GEOMETRY, in which case the geometries can be a mix. See vignettes for full longer story.

mdsumner commented 7 years ago

One of my utopian visions would be to have hierarchical spatial data created by expression in a general grammar. There's a sense in which a the gg aesthetic settings are a select, group by, arrange chain and the plotted geoms could alternatively result in persistent spatial layers. I see real value in an integration of complex object structure and aesthetic mappings, in a way that say a layer in GIS incorporates geometry and aesthetics, but is (usually) stripped back to only geometry or only presentation form when serialized, or published.

I don't think that is too utopian, but it requires some deeper integration of "spatial" and "the grammars". Some can already be easily done, for example a table of points measuring animal tracks over time is organizable by straight forward dplyr verbs and the result can be plotted with ggvis, as multi points, or multi-lines, or passed to an sf or sp composer function.

The concepts are not radically different, in fact I think they are exactly the same - but the tools are not easily compatible. I've been exploring areas where the two worlds are brought closer, but a fuller system needs wider discussion and involvement. I think there's a very timely opportunity here.

mdsumner commented 7 years ago

Is is accurate to say that sf objects (from a single row) come as their own custom stat? Is there any example of geoms like that already? @hadley

I also realized that the fortify model would work, as long as we think about it on a per-feature basis, not over whole dataset, the internal decomposition would essentially be the same as that iiuc, though modified for the extra structure in multipolygons, applied to the grob calls.

etiennebr commented 7 years ago

I'll repeat a bit from #131, but I think I'd rather have an interface on sfc than sf. Something like :

st_read(system.file("shape/nc.shp", package="sf")) %>% 
  ggplot(aes(sf=geom, fill=SID79)) +
  geom_sf()  # or some other name :)

That would offer free support for every class that inherits from data.frame. Piping is also easier.

More complex stuff is also easier to imagine (at least for me):

st_read(system.file("shape/nc.shp", package="sf")) %>% 
  as_tibble() %>%  # (sfc does not allow multiple geometries)
  mutate(buffer = st_buffer(geometry)) %>% 
  ggplot() +
  geom_sf(geometry, aes(sf=geom, fill=SID79))  + 
  geom_sf(buffer, aes(sf=geom, fill=NA))

It might also be worthwhile to consider that geometries have multiple representations; this is central to the grammar of graphics. A polygon could be represented by its vertex for instance. Would that be geom_sf(option = "point"), overload geom_point or geom_vertex? What about overloading all the geoms that could receive a geometry: geom_point, geom_segment, geom_polygon ? The case of GEOMETRY with multiple geometries being a bit ambiguous.

edzer commented 7 years ago

For those of you interested in what's happening: there is a working geom_sf() in the sf branch of ggplot2. Right now, it uses equirectangular projection, scale true at mean latitude, similar tosp::plot and sf::plot, and for now it will move everything to WGS84 before plotting.

library(sf)
library(maps)
world1 <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
library(ggplot2)
ggplot() + geom_sf(data = world1)

gives ggplt

and, from the help pages of geom_sf

 nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
ggplot(nc) + geom_sf(aes(fill = AREA))

gives nc

Comments are welcome.

hadley commented 7 years ago

@edzer It ensures all layers have the same CRS (not necessarily WGS84) - that's just used as the datum for the graticule

edzer commented 7 years ago

Ah, I didn't know you got that far. I now see:

library(sf)
library(maps)
library(ggplot2)
world1 <- sf::st_as_sf(map('world', plot = FALSE, fill = TRUE))
world2 <- st_transform(world1,
    "+proj=laea +y_0=0 +lon_0=155 +lat_0=-90 +ellps=WGS84 +no_defs")
ggplot() + geom_sf(data = world2)

wrld

where we need to revisit the tick labels, but otherwise: nice!

rsbivand commented 7 years ago

Very promising!

GreenStat commented 7 years ago

HI, I tried point and line geometry with (the experimental) 'geom_sf' :

library(sf)
library(maps)
library(ggplot2)

#points
data(meuse,package="sp")
meuse_sf <- st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")

summary(meuse_sf)

meuse_sf %>% ggplot() + geom_sf(aes(size=cadmium))

this gives: tmp1

I am a bit confused by the behaviour of aes in geom_sf, with aes(size=cadmium) I expected varying size of dots, instead it gives me varying size of line thickness of the circles. How do I change type and size of dots?

meuse_sf %>% ggplot() + geom_sf(aes(color=cadmium)) gives: tmp2

line color of the circles changes, I want a fill....

meuse_sf %>% ggplot() + geom_sf(aes(fill=cadmium)) this gives: tmp3

more confusing, I still want a fill.... :-), how can I do that ?

lines

I took data from the trajectories package

data(storms,package="trajectories")
summary(storms)
plot(storms)
#this coerces a TracksCollection  to a SpatialLinesDataFrame
storms.sp <- as(storms,"SpatialLinesDataFrame") 
# sp to sf
storms.sf <- st_as_sf(storms.sp)

check if file is oke

storms.sf %>% plot()
#Error in `levels<-.factor`(`*tmp*`, value = as.character(if (is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) : 
#  number of levels differs

# Ai, this is caused by posix  columns  in the sf, tmin and tmax are POSIXct
#if I drop them with storms.sf[,-c(6:7)] the problem is solved
# bug ?

# get a date column in the sf
storms.sf$date <- format(storms.sf$tmin,"%Y-%m-%d")

# standard sf plot
storms.sf[,-c(6:7)] %>% plot()

this gives : tmp4


# test some options
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(colour=date)) + theme_bw()
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(lty=date)) + theme_bw()
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(size=date)) + theme_bw()
storms.sf[,-c(6:7)] %>% ggplot() + geom_sf(aes(fill=date)) + theme_bw()

resulting in: (only first 2 plots) tmp5 tmp6

here the legends are confusing and geom_sf(aes(fill=date)) gives me black lines with a legend with colors (not shown ,identical to points example)

to end with a :+1:, combining maps is possible and looks great , this:

storms.sf[,-c(6:7)] %>% ggplot() +  geom_sf(aes(colour=date)) +  geom_sf(data=world1) +
 xlim(100,0) +ylim(0,60) + theme_bw()

gives : tmp9

cudos !

hadley commented 7 years ago

@GreenStat could you please file the line and point issues individually on ggplot2?

@edzer I think we can now close this issue

adrfantini commented 7 years ago

Sorry to chime in in this now-closed issue (from which I learned a lot), but I have a fast question. I did not exactly understand what ggplot is doing right now. Am I right in assuming that if no projection information is provided in the input sf (could this even happen?), it will assume datum = sf::st_crs(4326)? And that if a datum is otherwise available, it will use that?

What if multiple geom_sf() are provided? Will all the data st_transform()ed to the datum of the last call?

solomonsg commented 4 years ago

Dear all,

I am plotiing values in attribute table of a shapefile and the values in one of the table ( e.g. Duration) ranges from 0-4370 and i want to classfy them into 5 using somthing like breaks, breaks=c(0, 2776, 3061, 3272, 3558, 4370)

Is there any way to to do this classfication in r;

shp <- read_sf('Duration.shp')
ggplot(shp) + geom_sf(aes(colour = Duration))

Best regards,

Solomon

barryrowlingson commented 4 years ago

Create a new factor variable by cutting on those breaks and plot that:

# first fake some point data:
shp = data.frame(x=runif(100), y=runif(100), Duration=runif(100,0,4370))
shp = st_as_sf(shp, coords=c(1,2))
# then...
breaks=c(0, 2776, 3061, 3272, 3558, 4370)
shp$DurationLevel = cut(shp$Duration,breaks=breaks)
ggplot(shp) + geom_sf(aes(colour=DurationLevel,fill=DurationLevel),size=4)

I've specified a fill and colour aesthetic otherwise the legend symbols don't get drawn properly. There's probably a way of making the legend draw as dots, but you've probably got polygons so adjust as needed...

Use the scale_*_discrete functions to change the colour scheme.

phdykd commented 3 years ago

Error: Object is neither from class sf, stars, Spatial, nor Raster. I have merged two dataframes. One of them had long,lat and the other one had geom for Africa. However, no matter what I do, I got the error saying "Error: Object is neither from class sf, stars, Spatial, nor Raster. I am trying to use geom_sf and tm for mapping. None of them worked. Any recommendation? Thanks

edzer commented 3 years ago

Maybe it is a data.frame. Have you tried passing it to st_sf() first?