tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.49k stars 2.02k forks source link

geom_sf() does not respect position argument of scale_y_continuous() #2846

Closed sambweber closed 6 years ago

sambweber commented 6 years ago

I'm working with ggplot2 and geom_sf at the moment to create some maps, but I can't get the axis labels to re-position to the right using the position argument within scale_y_continuous as I would with other geometry types (e.g. geom_polygon in the example below). Is this intentional or just not supported yet for geom_sf?


library(sf)
library(ggplot2) 
library(dplyr)

nc = st_read(system.file("gpkg/nc.gpkg", package="sf"))

# axis labels still on left
 ggplot() + geom_sf(aes(fill = AREA),data=nc) + scale_y_continuous(position='right')

# Convert layer to sp and fortify
 nc2 = as(nc,'Spatial')
 nc2@data$id = rownames(nc2@data)
 nc2  = fortify(nc2) %>% left_join(nc2@data)

# axis labels correctly moved to the right
 ggplot() + geom_polygon(aes(x=long,y=lat,group=group,fill=AREA),data = nc2) + 
   scale_y_continuous(position='right') + coord_equal()
batpigandme commented 6 years ago

With reprex, below.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 
library(dplyr)

nc = st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

# axis labels still on left
ggplot() + geom_sf(aes(fill = AREA),data=nc) + scale_y_continuous(position='right')


# Convert layer to sp and fortify
nc2 = as(nc,'Spatial')
nc2@data$id = rownames(nc2@data)
nc2  = fortify(nc2) %>% left_join(nc2@data)
#> Regions defined for each Polygons
#> Joining, by = "id"

# axis labels correctly moved to the right
ggplot() + geom_polygon(aes(x=long,y=lat,group=group,fill=AREA),data = nc2) + 
  scale_y_continuous(position='right') + coord_equal()

Created on 2018-08-22 by the reprex package (v0.2.0.9000).

clauswilke commented 6 years ago

Indeed, the setting is currently ignored. The relevant line in the code is here: https://github.com/tidyverse/ggplot2/blob/5a37ae7dd590c045d8fe5d01ea66d4e733946713/R/sf.R#L408 (The right axis is simply hardcoded to nothing.) For comparison, the relevant code for the other coords looks like this: https://github.com/tidyverse/ggplot2/blob/5a37ae7dd590c045d8fe5d01ea66d4e733946713/R/coord-.r#L72

A similar problem exists for x axes placed on the top.

Not sure how hard it would be to fix this, but it would definitely be desirable to do so.

clauswilke commented 6 years ago

I guess the problem is that for coord_sf() we can't be certain that any tick labels calculated for one side of the plot will be appropriate for the other, because not every graticule that starts on the left ends on the right and vice versa. So this would require some additional work to figure out which graticules to label and with what.

clauswilke commented 6 years ago

We can highlight these issues by projecting into an unusual coordinate system (for the part of the world we're looking at).

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

nc = st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
p <- ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + coord_sf(crs = 3338)
p


ggplot_build(p)$layout$panel_params[[1]]$graticule
#> Simple feature collection with 12 features and 10 fields
#> geometry type:  MULTILINESTRING
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> First 10 features:
#>    degree type    degree_label                       geometry    x_start
#> 1     -84    E 84 * degree * W MULTILINESTRING ((0.402943 ... 0.40294301
#> 2     -82    E              NA MULTILINESTRING ((1 0.09396... 1.00000000
#> 3     -80    E              NA MULTILINESTRING ((1 0.35286... 1.00000000
#> 4     -78    E              NA MULTILINESTRING ((1 0.60422... 1.00000000
#> 5     -76    E              NA MULTILINESTRING ((1 0.84879... 1.00000000
#> 6      32    N              NA MULTILINESTRING ((0.8528854... 0.85288544
#> 7      33    N              NA MULTILINESTRING ((0.587803 ... 0.58780298
#> 8      34    N              NA MULTILINESTRING ((0.3201988... 0.32019880
#> 9      35    N              NA MULTILINESTRING ((0.0499912... 0.04999124
#> 10     36    N 36 * degree * N MULTILINESTRING ((0 0.19757... 0.00000000
#>       y_start     x_end     y_end angle_start angle_end plot12
#> 1  0.00000000 0.0000000 0.1169753   150.39212 150.39212   TRUE
#> 2  0.09396438 0.0000000 0.3642504   152.11761 152.11761  FALSE
#> 3  0.35286919 0.0000000 0.6037669   153.84310 153.84310  FALSE
#> 4  0.60422472 0.0000000 0.8362994   155.56859 155.56859  FALSE
#> 5  0.84879350 0.2926339 1.0000000   157.29408 157.29408  FALSE
#> 6  0.00000000 1.0000000 0.1403366    61.22872  62.37905  FALSE
#> 7  0.00000000 1.0000000 0.4039505    60.70585  64.15683  FALSE
#> 8  0.00000000 1.0000000 0.6872251    60.18297  66.14375  FALSE
#> 9  0.00000000 1.0000000 0.9962199    59.66010  68.33983  FALSE
#> 10 0.19757586 0.7499034 1.0000000    60.91500  68.02611   TRUE

Created on 2018-08-22 by the reprex package (v0.2.0).

sambweber commented 6 years ago

Yes, I see the issue. It is more complicated than with other coords. I don't know much about how axes are rendered in ggplot, but in the case that position = 'right' is called would it be possible to place tick marks/labels at y_end where 0 < y_end < 1 in the ggplot_build() output above (i.e 32 - 35 N in this case), rather than at y_start? Is there a hack to do something along these lines by modifying the output of ggplot_build() while a more elegant solution can be found?

clauswilke commented 6 years ago

I've made a branch that can draw labels on all sides: https://github.com/clauswilke/ggplot2/tree/issue-2846-coord_sf It's not configurable yet, though. You can look at the changes and hack your own ggplot2 that draws labels where you need them.

@hadley There's a question about API design here: Should coord_sf() accept secondary axes? I assume not, in which case where labels should be drawn would be a coord setting rather than a scale setting. Making this a coord setting would be quite easy. Carrying all the info properly through from the scales would be more work. I'm also not clear how manual tick placement and labeling works with coord_sf() and primary/secondary axes. Should coord_sf() support the most general case where I can set up a secondary axis with its own ticks and labels? Right now, coord_sf() doesn't even allow manual setting of labels, only of tick positions.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

nc = st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + coord_sf(crs = 2264)


ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + coord_sf(crs = 3338)

Created on 2018-08-23 by the reprex package (v0.2.0).

hadley commented 6 years ago

It feels like it should be a coord setting to me (or maybe we should adopt a NSEW convention where you could show n/s latitude or e/w longitude labels?) (i.e. if the projection was rotated 45 degrees, the northern labels would occur on top and right)

I'm fine with providing little control over the labels until we work around the general parameters that people want to control.

I think the thing I believe most strongly is that the default labels should align with the graticules.

clauswilke commented 6 years ago

To clarify what kind of an API I'm thinking about: coord_sf() could take an argument, say, graticule_labeling that accepts a named vector or list specifying what types of graticules should be shown on which side of the plot. E.g., graticule_labeling = c(top = NA, right = NA, bottom = "E", left = "N") which would represent the default.

clauswilke commented 6 years ago

@sambweber I made a pull request (#2849). Would you mind installing that branch (with devtools::install_github("tidyverse/ggplot2#2849")) and trying it out?

hadley commented 6 years ago

I was thinking you could make the spec a bit more compact since I don't think you want to be able to say (e.g.) put the east labels on the bottom, so you could collapse to graticule_labels = "EN" or graticule_labels = "NESW" (in others the physical position of the labels is determined by which side of the projected rectangle you want to label)

clauswilke commented 6 years ago

I'm not sure that E labels on the bottom are never useful. In fact, we only have "N" to indicate latitude and "E" to indicate longitude, there is no "S" and "W" (but the sf package translates negative "E" values to positive "W" values when generating labels).

One option for a more compact specification would be to allow hyphens as well, e.g. graticule_labels = "--EN". The one problem with the four-letter string is that we'll have to remember the order in which it specifies sides. Everything else in ggplot2 goes top, right, bottom, left, so it might make sense to mirror that.

hadley commented 6 years ago

Sorry, I've been using N to mean the label at the top of a line of longitude, and S to mean the label at the bottom.

clauswilke commented 6 years ago

I see. Well, my point was that for rotated coordinate systems we may want to change what goes where. For example, the following labeling would make a lot of sense for the NC example I've been playing with, and it couldn't be expressed simply as "ES".

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(
    crs = 3338,
    graticule_labeling = list(left = "E", bottom = "N")
  )

Created on 2018-08-23 by the reprex package (v0.2.0).

clauswilke commented 6 years ago

(For reference, this is the default for this map, and it's definitely inferior.)

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(crs = 3338)

Created on 2018-08-23 by the reprex package (v0.2.0).

hadley commented 6 years ago

Why can't you express that as "NW"?

clauswilke commented 6 years ago

I'm still not entirely sure how you interpret the letters. I looks to me like you're interpreting N, E, S, and W as cardinal directions in the mapped coordinate system, so if north points to the left then N means put labels on the left. This makes sense to humans but I wouldn't know how to implement it.

Alternatively, I have now added a simple string specification in terms of the letters N, E, and - and their positions. We could swap it out by replacing the function that interprets the string. I have also kept the longer list interface, so it would be fine to use a simplified string specification that doesn't cover all possibilities. Finally, I have noticed that if anybody wants to add axis titles to the axes then the title position doesn't follow the graticule labeling specification. Not sure how serious of a limitation that is.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

# graticule labeling can now be specified via simple string
nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(crs = 3338, graticule_labeling = "-NE")


# however, axis titles don't follow the specification
ggplot(sf::st_polygon(list(matrix(c(0, 1, 2, 0, 0, 2, 1, 0), ncol = 2)))) + 
  geom_sf(size = 3, fill = 'pink') +
  coord_sf(graticule_labeling = "--NE") + 
  labs(x = "x", y = "y")

Created on 2018-08-23 by the reprex package (v0.2.0).

hadley commented 6 years ago

The graticules are formed by lines of longitude (which run EW) and lines of latitude (which run NS). Given that a projection system can arbitrarily rotate the graticules, I think it makes more sense to define the labels in terms of those positions rather than the edges of the plot - i.e. if the projection is 45° rotated, the labelling the northern endpoints of the latitude lines might produce labels on both the top and right of the plot.

clauswilke commented 6 years ago

@hadley I don't disagree with your perspective, I just don't know how to implement it. And as @edzer just pointed out to me, we can't even be sure that graticules run from one side of the plot to the other, they can also loop around, so that, say, north and south can be on the same side of the plot for some graticules.

@edzer Is this an issue that people encounter in map-making, and is there an accepted standard where graticules should be labeled?

clauswilke commented 6 years ago

Maybe I'm getting your point now. It would be fairly easy to place a label at the beginning or end of each meridian or parallel. I'll implement it and see what happens. I believe it may cause weird artifacts in some cases and desired effects in others, so maybe both methods of specification would be helpful.

edzer commented 6 years ago

You'll soon run into both a meridians (NS) and parallels (EW) crossing at a plot border: two labels will overlap, one will be ambiguous.

clauswilke commented 6 years ago

I have implemented @hadley's suggestion of labeling the N, S, E, or W end of the graticules. It can be done, and it can be useful for some situations, but as @edzer says we quickly run into trouble with overlapping labels. With that approach, we also cannot guarantee that users will be able to reproduce results obtained with 3.0.0.

I see two ways forward:

  1. abandon the N, S, E, W approach and go back to my earlier protocol which specifies which graticules to label on which side.
  2. offer both interfaces at the same time, in case somebody finds the N, S, E, W specification useful.
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

# graticule labeling via cardinal coordinates (N, S, E, W)
nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

# default, so far so good
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(graticule_labeling = "WS")

# labels at the north ends of meridians
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(crs = 3338, graticule_labeling = "N")

# labels at the east ends of parallels
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(crs = 3338, graticule_labeling = "E")


# we quickly run into trouble with strange effects and/or overlapping labels
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(crs = 3338, graticule_labeling = "NW")


ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(crs = 3338, graticule_labeling = "NE")

Created on 2018-08-24 by the reprex package (v0.2.0).

hadley commented 6 years ago

I don’t mind the overlaps too much — you could always override the breaks/labels if you wanted something different, and I like the parsimony of the NESW approach.

clauswilke commented 6 years ago

Just to make sure we're all on the same page: labels cannot currently be set manually. They are ignored. (This can be fixed, of course; not sure how much work it would be, though.)

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

# graticule labeling via cardinal coordinates (N, S, E, W)
nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
ggplot() + 
  geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf() +
  scale_y_continuous(
    breaks = c(34, 34.5, 35.7), # setting breaks manually works
    labels = c("A", "B", "C")   # but labels are ignored
  )

Created on 2018-08-25 by the reprex package (v0.2.0).

clauswilke commented 6 years ago

I've opened a separate issue (#2857) for the problem that labels cannot be set manually and I've added a few pointers to the key places in the code that are responsible. It's unlikely that'll be able to do more than that for #2857 within the next few weeks.

clauswilke commented 6 years ago

I have modified my PR to allow for both modes of label specification, individually or jointly. If nothing is specified, then parallels will be labeled on the left and meridians at the bottom, as is the current default. (I don't think we should change the current default.) If the NSEW specification format is used, then it overrides the default. Similarly, if the specification by axis side is used, it also overrides the default. Finally, both specifications can be used together, and the result is that all labels are shown that match either criterion.

I have called the two new parameters label_graticules and label_axes but I'm not fully committed to those names. Suggestions for alternatives are welcome.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2) 

nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"))
#> Reading layer `nc.gpkg' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/gpkg/nc.gpkg' using driver `GPKG'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

# default
ggplot() + geom_sf(aes(fill = AREA), data=nc) + coord_sf()


# default with rotated coordinate system
ggplot() + geom_sf(aes(fill = AREA), data=nc) + coord_sf(crs = 3338)


# NSEW specification
ggplot() + geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(label_graticules = "NE")


# axis labeling specification
ggplot() + geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(label_axes = list(top = "E", left = "N"))


# combining both specifications, rotated coordinate system
ggplot() + geom_sf(aes(fill = AREA), data=nc) + 
  coord_sf(
    crs = 3338,
    label_graticules = "N",
    label_axes = "-NE-"
  )

Created on 2018-08-25 by the reprex package (v0.2.0).

sambweber commented 6 years ago

@clauswilke Sorry for going quiet there for a few days. I installed the branch and it does exactly what's needed. The shorthand 'NE' notation is definitely handy for my purposes (embedding in another function with a 'normal' WGS84 projection) and will handle the most common uses of this argument, but I think preserving the flexibility that both specifications provide is a good thing. Thanks for incorporating quickly.

clauswilke commented 6 years ago

@sambweber I just merged the PR, so this will work in ggplot2 3.0.1.

GeoKate commented 5 years ago

Hello, I'm working with the same problem that you all seem to be solving. I'm plotting a map using ggplot geom_sf. I have my shape files in proj4string: "+proj=longlat +datum=WGS84 +no_defs" and to create a pretty map I have transformed them into a different projection (testproj1 <–sf::st_transform(sfdataframe, "+proj=eqdc +lat_0=0 +lon_0=120 +lat_1=90 +lat_2=30 +x_0=0 +y_0=00 +ellps=WGS84 +datum=WGS84 +units=m +no_defs").

Then plotted using: ggplot()+ theme_classic()+ theme(panel.background = element_rect(colour = "black", size=1, fill=NA), panel.grid.major = element_line(colour = "black", size=1), panel.grid.minor = element_line(colour = "white", size=0))+ geom_sf(data = testproj1)

The resulting map currently looks like this: screen shot 2018-11-07 at 10 59 58

I'm new to GitHub and relatively new (1year) to R. I'm hoping to get the labels on both ends of the graticules and where the longitude lines pass through the y axis that they will still be labelled with E coordinates. The same as @clauswilke commented on 25 Aug . I'm just not quite sure how to access the final code?

Many thanks.

clauswilke commented 5 years ago

It's in the 3.1.0 release. The API is discussed briefly in the blog post (https://www.tidyverse.org/articles/2018/10/ggplot2-3-1-0/) and in more detail in the documentation of coord_sf(). If you need further help, I suggest the R Studio forums or stackoverflow.com.

lock[bot] commented 5 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/