nnnagle closed this issue 12 years ago
It'll be helpful if you edit your comment and mark the block of code by adding lines with three backticks, like this:
text here
code is here x <- 1:3
text here
Update: I edited the original post to add these code block markers.
Some steps towards understanding the problem
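library(ggplot2)
library(plyr)  # provides join(), arrange() and ldply()
# boston_shp is assumed to be the SpatialPolygonsDataFrame from the original post's attached code
map <- fortify(boston_shp, region='TRACT')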
data <- as.data.frame(boston_shp)
data$id <- data$TRACT
qplot(as.numeric(TRACT), dis, data = data)
both <- join(map, data, by = "id")
qplot(as.numeric(TRACT), dis, data = both)
ggplot(both, aes(x=long, y=lat, group=group, fill=dis)) +
geom_polygon() +
coord_equal()
# I really don't understand what's going on - the match between
# TRACT and dis seems ok
head(arrange(unique(both[c("id", "dis")]), id))
head(arrange(data, id)[c("id", "dis")])
tail(arrange(unique(both[c("id", "dis")]), id))
tail(arrange(data, id)[c("id", "dis")])
# And it's not a polygon drawing problem because points look fine
ggplot(both, aes(x=long, y=lat, group=group, colour = dis)) +
geom_point()
# This works
saferFortify.SPDF <- function(model, data, region=NULL){
warning('Using FIDs as the id. User should verify that Feature IDs are also the row.names of data.frame. See spChFIDs().')
attr <- as.data.frame(model)
coords <- ldply(model@polygons,fortify)
# Look up attributes by Feature ID (row name) and return the merged data frame
cbind(coords, attr[as.character(coords$id), ])
}
both2 <- saferFortify.SPDF(boston_shp)
# But the mapping between id and dis isn't any different?!
tail(arrange(unique(both2[c("id", "dis")]), id))
head(arrange(unique(both2[c("id", "dis")]), id))
ggplot(both2, aes(x=long, y=lat, group=group, colour = dis)) +
geom_point()
Charlotte says it's something to do with the difference in ordering between split and invert.
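For anyone following along, here is a minimal base-R sketch of the kind of reordering that split() can introduce. It is not the actual fortify code, and the ids and values are made up; it only shows that split() returns its pieces in sorted factor-level order, so pairing the pieces back with the original ids by position can mislabel them.

```r
# Hypothetical feature ids, in the order they are stored in the object
ids <- c("5", "4", "3", "2", "1")
values <- c(50, 40, 30, 20, 10)

# split() converts ids to a factor, so the pieces come back in sorted
# level order ("1", "2", ...), not in order of first appearance
pieces <- split(values, ids)

# Pairing by position attaches the sorted pieces to the unsorted ids
mislabelled <- data.frame(id = ids,
                          value = unlist(pieces, use.names = FALSE))

# Pairing by name keeps each value with its own id
relabelled <- data.frame(id = names(pieces),
                         value = unlist(pieces, use.names = FALSE))
```

In `mislabelled`, id "5" ends up with the value that belongs to id "1"; in `relabelled`, each id keeps its own value.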
A small test case:
library(ggplot2)
library(sp)
make_square <- function(x = 0, y = 0, height = 1, width = 1){
delx <- width/2
dely <- height/2
Polygon(matrix(c(x + delx, x - delx,x - delx,x + delx,x + delx ,
y - dely,y - dely,y + dely,y + dely,y - dely), ncol = 2))
}
make_hole <- function(x = 0, y = 0, height = .5, width = .5){
p <- make_square(x = x, y = y, height = height, width = width)
p@hole <- TRUE
p
}
fake_data <- data.frame(ids = 1:5, region = c(1,1,2,3,4))
rownames(fake_data) <- 1:5
polys <- list(Polygons(list(make_square(), make_hole()), 1),
Polygons(list(make_square(1,0), make_square(2, 0)), 2),
Polygons(list(make_square(1,1)), 3),
Polygons(list(make_square(0,1)), 4),
Polygons(list(make_square(0,3)), 5))
polys_sp <- SpatialPolygons(polys)
fake_sp <- SpatialPolygonsDataFrame(polys_sp, fake_data)
fake_gg <- fortify(fake_sp)
qplot(long, lat, data = fake_gg, geom = "polygon", fill = id)
# works
fake_gg_r <- fortify(fake_sp, region = "region")
qplot(long, lat, data = fake_gg_r, geom = "polygon", fill = id)
# now reorder regions
polys2 <- rev(polys)
polys2_sp <- SpatialPolygons(polys2)
fake_sp2 <- SpatialPolygonsDataFrame(polys2_sp, fake_data)
# doesn't work!
fake_gg2 <- fortify(fake_sp2)
qplot(long, lat, data = fake_gg2, geom = "polygon", fill = id)
# doesn't work!
fake_gg_r2 <- fortify(fake_sp2, region = "region")
qplot(long, lat, data = fake_gg_r2, geom = "polygon", fill = id)
This fixes the ordering problem and changes the default behavior: when no region is given, each Polygons element is split into its own region (rather than the first column defining the region), following Nicholas' code.
fortify.SpatialPolygonsDataFrame <- function(model, data, region = NULL, ...) {
attr <- as.data.frame(model)
# If not specified, split into regions based on polygons
if (is.null(region)) {
coords <- ldply(model@polygons,fortify)
message("Regions defined for each Polygons")
} else {
cp <- polygons(model)
try_require("maptools")
# Union together all polygons that make up a region
unioned <- unionSpatialPolygons(cp, attr[, region])
coords <- fortify(unioned)
coords$order <- seq_len(nrow(coords))
}
coords
}
And a test:
expect_equivalent(fortify(fake_sp), arrange(fortify(fake_sp2), id, order))
I made a branch with the fix, and merged in 9f119467da965ba723196f347821925623d63972.
I have a SpatialPolygonsDataFrame in which the FIDs have been set to a specific column in the data.frame.
fortify.SpatialPolygonsDataFrame() then does not give the result I expected: it seems to reorder the features and give them the wrong labels, so the 'id' column of the resulting ggplot data.frame doesn't match the region id of the original sp object. The reordering seems to happen in the split() or invert() calls inside fortify.SpatialPolygonsDataFrame().
I have attached reproducible code, with the expected spplot() results, the unexpected results from fortify.SpatialPolygonsDataFrame(), and how I fixed it by rewriting the fortify method to be more like fortify.SpatialPolygons(), using the Feature IDs to name the ids and merge the data.frame attributes.
It is entirely possible that I am using the region parameter of fortify() improperly; I don't know. Does anyone know what is going wrong with fortify.SpatialPolygonsDataFrame() here, or with how I am using it?
Nicholas
Nicholas Nagle, Dept. of Geography University of Tennessee 307 Burchfiel Geography Bldg. Knoxville, TN 37996
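The Feature-ID lookup described in the post above can be sketched in base R as follows. The two data frames are hypothetical stand-ins (`attrs` for the attribute table, `coords` for the fortified coordinates), not the Boston data; the point is only that indexing the attribute table by row name attaches the right attributes regardless of the order in which the features arrive.

```r
# Hypothetical attribute table whose row names are the Feature IDs
attrs <- data.frame(dis = c(1.5, 2.5, 3.5),
                    row.names = c("a", "b", "c"))

# Hypothetical fortified coordinates: several vertices per feature,
# with features in a different order than the attribute table
coords <- data.frame(
  long = c(0, 1, 1, 5, 6, 3, 4),
  lat  = c(0, 0, 1, 5, 5, 3, 3),
  id   = c("c", "c", "c", "a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Index by row name (character), not by position, so each vertex
# picks up the attributes of its own feature
both <- cbind(coords, attrs[as.character(coords$id), , drop = FALSE])
rownames(both) <- NULL
```

This relies on the Feature IDs really being the row names of the attribute table, which is the same caveat the warning in saferFortify.SPDF() raises.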