Unexpected Ordering with fortify.SpatialPolygonsDataFrame

nnnagle commented 12 years ago

I have a SpatialPolygonsDataFrame in which the FIDs have been set to a specific column in the data.frame.

fortify.SpatialPolygonsDataFrame is then not giving the result I expected; it seems to be reordering the features and giving them the wrong label; the 'id' column of the resulting ggplot data.frame doesn't match the region id of the original sp object. The reordering seems to happen in the split() or invert() commands of fortify.SpatialPolygonsDataFrame().

I have attached reproducible code, with the expected spplot() results, the unexpected results from fortify.SpatialPolygonsDataFrame(), and how I fixed it by rewriting the fortify to be more like fortify.SpatialPolygons(), and using the Feature IDs to name the ids and merge the data.frame attributes.

It is entirely possible that I am using the region parameter of fortify() improperly, I don't know. Does anyone know what is going wrong with fortify.SpatialPolygonsDataFrame() here, or with how I am using it?

Nicholas

Nicholas Nagle, Dept. of Geography University of Tennessee 307 Burchfiel Geography Bldg. Knoxville, TN 37996

#################################################################
download.file('http://graunt.bus.utk.edu/Files/boston_shp.Rdata','boston_shp.Rdata')
load('boston_shp.Rdata')
# boston_shp: the Harrison-Rubinfeld boston data, attached to the 1970 tract boundaries
# from the National Historical GIS <http://www.nhgis.org>

# The Feature IDs are also the 'TRACT' columns

# Desired result...
spplot(boston_shp,'dis')

####################
# Does not work as expected
boston.ggdf <- fortify(boston_shp,region='TRACT')
# Merge the ggplot data frame with the attributes attributes, matching
#   boston.ggdf$id to row.names(boston_shp@data)

boston.ggdf <- cbind(boston.ggdf, boston_shp@data[as.character(boston.ggdf$id),])

boston.ggplot <- ggplot(boston.ggdf, aes(x=long, y=lat, group=group, fill=dis))
boston.ggplot+geom_polygon()+coord_equal()

########################################
# This does work as expected
saferFortify.SPDF <- function(model, data, region=NULL){
 warning('Using FIDs as the id.  User should verify that Feature IDs are also the row.names of data.frame. See spChFIDs().')
 attr <- as.data.frame(model)
 coords <- ldply(model@polygons,fortify)
 coords <- cbind(coords,attr[as.character(coords$id),])
}

boston.ggdf <- saferFortify.SPDF(boston_shp)
boston.ggplot <- ggplot(boston.ggdf, aes(x=long, y=lat, group=group, fill=dis))
boston.ggplot+geom_polygon()+coord_equal()

wch commented 12 years ago

It'll be helpful if you edit your comment and mark the block of code by adding lines with three backticks, like this:

text here

code is here x <- 1:3


text here

Update: I edited the original post to have these code block markers

hadley commented 12 years ago

Some steps towards understanding the problem


map <- fortify(boston_shp, region='TRACT')

data <- as.data.frame(boston_shp)
data$id <- data$TRACT
qplot(as.numeric(TRACT), dis, data = data)

both <- join(map, data, by = "id")
qplot(as.numeric(TRACT), dis, data = both)

ggplot(both, aes(x=long, y=lat, group=group, fill=dis)) + 
  geom_polygon() + 
  coord_equal()

hadley commented 12 years ago


# I really don't understand what's going on - the match between
# TRACT and dis seems ok
head(arrange(unique(both[c("id", "dis")]), id))
head(arrange(data, id)[c("id", "dis")])

tail(arrange(unique(both[c("id", "dis")]), id))
tail(arrange(data, id)[c("id", "dis")])

# And it's not a polygon drawing problem because points look fine
ggplot(both, aes(x=long, y=lat, group=group, colour = dis)) + 
  geom_point()

hadley commented 12 years ago

# This works
saferFortify.SPDF <- function(model, data, region=NULL){
 warning('Using FIDs as the id.  User should verify that Feature IDs are also the row.names of data.frame. See spChFIDs().')
 attr <- as.data.frame(model)
 coords <- ldply(model@polygons,fortify)
 coords <- cbind(coords,attr[as.character(coords$id),])
}

both2 <- saferFortify.SPDF(boston_shp)
# But the mapping between id and dis isn't any different?!
tail(arrange(unique(both2[c("id", "dis")]), id))
head(arrange(unique(both2[c("id", "dis")]), id))

ggplot(both2, aes(x=long, y=lat, group=group, colour = dis)) + 
  geom_point()

hadley commented 12 years ago

Charlotte says it's something to do with the difference in ordering between split and invert.

cwickham commented 12 years ago

A small test case:

library(ggplot2)
library(sp)

make_square <- function(x = 0, y = 0, height = 1, width = 1){
  delx <- width/2
  dely <- height/2
  Polygon(matrix(c(x + delx, x - delx,x - delx,x + delx,x + delx , 
      y - dely,y - dely,y + dely,y + dely,y - dely), ncol = 2))  
}

make_hole <- function(x = 0, y = 0, height = .5, width = .5){
  p <- make_square(x = x, y = y, height = height, width = width)
  p@hole <- TRUE
  p
}

fake_data <- data.frame(ids = 1:5, region = c(1,1,2,3,4))
rownames(fake_data) <- 1:5
polys <- list(Polygons(list(make_square(), make_hole()), 1), 
              Polygons(list(make_square(1,0), make_square(2, 0)), 2),
              Polygons(list(make_square(1,1)), 3),
              Polygons(list(make_square(0,1)), 4),
              Polygons(list(make_square(0,3)), 5))

polys_sp <- SpatialPolygons(polys)
fake_sp <- SpatialPolygonsDataFrame(polys_sp, fake_data)

fake_gg <- fortify(fake_sp)
qplot(long, lat, data = fake_gg, geom = "polygon", fill = id)

# works
fake_gg_r <- fortify(fake_sp, region = "region")
qplot(long, lat, data = fake_gg_r, geom = "polygon", fill = id)

# now reorder regions
polys2 <- rev(polys)
polys2_sp <- SpatialPolygons(polys2)
fake_sp2 <- SpatialPolygonsDataFrame(polys2_sp, fake_data)

# doesn't work!
fake_gg2 <- fortify(fake_sp2)
qplot(long, lat, data = fake_gg2, geom = "polygon", fill = id)

# doesn't work!
fake_gg_r2 <- fortify(fake_sp2, region = "region")
qplot(long, lat, data = fake_gg_r2, geom = "polygon", fill = id)

cwickham commented 12 years ago

This fixes the problem of ordering, and changes the default behavior to splitting each Polygons element into it's own region (rather than using the first column to define region), following Nicholas' code.

fortify.SpatialPolygonsDataFrame <- function(model, data, region = NULL, ...) {
  attr <- as.data.frame(model)
  # If not specified, split into regions based on polygons
  if (is.null(region)) {
    coords <- ldply(model@polygons,fortify)
    message("Regions defined for each Polygons")
  } else {
    cp <- polygons(model)
    try_require("maptools")

    # Union together all polygons that make up a region
    unioned <- unionSpatialPolygons(cp, attr[, region])
    coords <- fortify(unioned)
    coords$order <- 1:nrow(coords)
  }
  coords
}

hadley commented 12 years ago

And a test:

expect_equivalent(fortify(fake_sp), arrange(fortify(fake_sp2), id, order))

wch commented 12 years ago

I made a branch with the fix, and merged in 9f119467da965ba723196f347821925623d63972.

tidyverse / ggplot2

Unexpected Ordering with fortify.SpatialPolygonsDataFrame #434