lecy / neighborhood_change_phx

https://lecy.github.io/neighborhood_change_phx/
1 stars 3 forks source link

Shapefile Merge #13

Open aehende1 opened 5 years ago

aehende1 commented 5 years ago

I've been fixing the code to correctly read in the shapefiles instead of using the st_read function we were previously using, and I've also been trying to merge the data and shapefiles into one object that can be converted to GeoJson. I've managed to successfully merge all of the data and correctly read in the shapefiles, however, when I go to fortify the shapefiles to convert them to a dataframe that can be merged with the data, I'm running into an error.

This error is occurring in Phx_data_2 R Markdown in the data folder at line 1074.

Import Shapefile

download.file( "https://github.com/lecy/neighborhood_change_phx/raw/master/shapefiles/tl_2010_04013_tract10.zip", 
"maricopa-az-census_tract10.zip" )
unzip( "maricopa-az-census_tract10.zip" )
file.remove( "maricopa-az-census_tract10.zip" )

library( rgdal )
phx <- readOGR("tl_2010_04013_tract10.shp")

phx <- spTransform( phx, CRS("+proj=longlat +datum=WGS84") )

phx$geoid2 <- paste0( "G", phx$STATEFP10, 
                     "0", phx$COUNTYFP10, "0", phx$TRACTCE10 )
nrow(phx)
916
phx <- fortify(phx, region="geoid2")
nrow(phx)
128616

Any advice?

lecy commented 5 years ago

I"m not familiar with the fortify function, so I can't comment on that. But all of the steps to join data should be in the README in the shapefiles folder:

https://github.com/lecy/neighborhood_change_phx/blob/master/shapefiles/README.md

Are those not working?

aehende1 commented 5 years ago

Those do work to merge the data, however, when I try to map without using fortify() I receive the following error:

#Combine datasets for 2010 and 2015

phx.dat <- merge(phx.dat.2010, phx.dat.2015, by.x="geoid", by.y="geoid", all.x=T, all.y=F)

#Combine ACS 5 datasets with NHGIS data

phx.dat$geoid3 <- paste0( "G", phx$STATEFP10, 
                     "0", phx$COUNTYFP10, "0", phx$TRACTCE10 )

phx.dat <- merge(NHGIS_data, phx.dat, by.x="GISJOIN", by.y="geoid3", all.x=T, all.y=F)

phx.data <- merge(phx, phx.dat, by.x="geoid2", by.y="GISJOIN", all.x=F, all.y=F)
geojson_write( phx.data, file = "../shapefiles/phx.geojson" )
ggplot() +
  geom_polygon(data = phx.data, 
               aes(x = long, y = lat, group = group, fill = phx.data$Total_2010), 
               color = "black", size = 0.25) + 
  coord_map()

Error: Aesthetics must be either length 1 or the same as the data (128616): fill

I found the fortify() step in this blog post: http://rforpublichealth.blogspot.com/2015/10/mapping-with-ggplot-create-nice.html

The two errors appear to be related because the number 128616 appears in both cases. However, I've thus far been unable to fix either error.

aehende1 commented 5 years ago

Also, fixing the previous shapefile error we discussed on Wednesday has now caused all of our previously coded static maps to no longer run. So right now I'm unable to create maps- either static or interactive.

lecy commented 5 years ago

I’m guessing it’s the data set you created, not the merge function. How many rows does your data set have before you try merging with the shapefile?

This message was sent from a mobile device.

On Apr 21, 2019, at 11:45 AM, Abby Henderson notifications@github.com wrote:

Also, fixing the previous shapefile error we discussed on Wednesday has now caused all of our previously coded static maps to no longer run. So right now I'm unable to create maps- either static or interactive.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

lecy commented 5 years ago

I recommend going through your data steps and checking df dimensions before and after merges.

This message was sent from a mobile device.

On Apr 21, 2019, at 11:45 AM, Abby Henderson notifications@github.com wrote:

Also, fixing the previous shapefile error we discussed on Wednesday has now caused all of our previously coded static maps to no longer run. So right now I'm unable to create maps- either static or interactive.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

aehende1 commented 5 years ago

I checked every step of data- they all have 916 rows (census tracts in Maricopa County) and the lengths add up correctly. I also tried the geo_join function you posted in response to one of Kendelle's issues. The geo_join function did not work for this merge.

The merge() function does successfully work, and I end up with a sp object with 916 rows and 916 columns. However, when I try to create a map of the data, I continue to get the error posted below.

#Download Shapefile

download.file( "https://github.com/lecy/neighborhood_change_phx/raw/master/shapefiles/tl_2010_04013_tract10.zip", 
"maricopa-az-census_tract10.zip" )
unzip( "maricopa-az-census_tract10.zip" )
file.remove( "maricopa-az-census_tract10.zip" )

library( rgdal )
phx <- readOGR("tl_2010_04013_tract10.shp")

phx <- spTransform( phx, CRS("+proj=longlat +datum=WGS84") )

phx$geoid2 <- paste0( "G", phx$STATEFP10, 
                     "0", phx$COUNTYFP10, "0", phx$TRACTCE10 )

#Read in dataset combining ACS 5 2010, ACS 5 2015, NHGIS variables
phx.dat <- read.csv(
"https://github.com/lecy/neighborhood_change_phx/raw/master/data/phx.dat.csv")

nrow(phx.dat)
length(phx.dat)
nrow(phx@data)
length(phx@data)

phx.data <- merge(phx, phx.dat, by.x="geoid2", by.y="GISJOIN", all.x=F, all.y=F)

nrow(phx.data)
length(phx.data)

ggplot() +
  geom_polygon(data = phx.data, 
               aes(x = long, y = lat, group = group, fill = phx.data$Total_2010), 
               color = "black", size = 0.25) + 
  coord_map()

Error: Aesthetics must be either length 1 or the same as the data (128616): fill

lecy commented 5 years ago

What is this?

phx.dat <- read.csv("phx.dat.csv")
aehende1 commented 5 years ago

That is a .csv of the merged data from ACS 5 2010, ACS 5 2015, and NHGIS for the all of the variables we pulled. I saved it as a .csv and read it in that way to make this Issue run without needing the full merge and data manipulation code, but I realized I didn't code it correctly for the issue. It should be fixed now.

lecy commented 5 years ago

Ok, here is what I could come up with. If you want to use ggplot, you need to convert the spatial object to a data frame. That is what the fortify() function is accomplishing. You can do it this way:

library( broom )
phx_fortified <- tidy( phx.data, region = "geoid2" )

library( ggplot2 )
library( mapproj )

ggplot() +
  geom_polygon( data = phx_fortified, aes( x = long, y = lat, group = group ), col="gray" ) +
  theme_void() +
  coord_map()

library( dplyr )
px <- left_join( phx_fortified, asian, by=c("id"="GISJOIN") ) 

ggplot( data = px, 
               aes( x = long, y = lat, group = group, fill = Asian_2000 ) ) + 
  geom_polygon() +
  coord_map()

Note that when you convert, you drop all of the data from the sp object and only retain the polygon info. So fortify() is only acting on the shapefile component of an sp object, not the data frame. The join above is adding the original data back into the new data frame.

> head( phx_fortified )
# A tibble: 6 x 7
   long   lat order hole  piece group            id            
  <dbl> <dbl> <int> <lgl> <fct> <chr>            <chr>         
1 -112.  33.7     1 FALSE 1     G0400130010101.1 G0400130010101
2 -112.  33.7     2 FALSE 1     G0400130010101.1 G0400130010101
3 -112.  33.7     3 FALSE 1     G0400130010101.1 G0400130010101
4 -112.  33.7     4 FALSE 1     G0400130010101.1 G0400130010101
5 -112.  33.7     5 FALSE 1     G0400130010101.1 G0400130010101
6 -112.  33.7     6 FALSE 1     G0400130010101.1 G0400130010101
> names( phx.data )
 [1] "geoid2"     "STATEFP10"  "COUNTYFP10" "TRACTCE10"  "GEOID10"    "NAME10"    
 [7] "NAMELSAD10" "MTFCC10"    "FUNCSTAT10" "ALAND10"    "AWATER10"   "INTPTLAT10"
[13] "INTPTLON10" "GEOGYEAR"   "STATE"      "STATEA"     "COUNTY"     "COUNTYA"   
[19] "TRACTA"     "Asian_1990" "Asian_2000" "Asian_2010"

Note, we were not blowing up the data frame through merges. That was the number of points needed to draw all of the polygons in the shapefile:

> nrow( phx_fortified )
[1] 128616

Also note that "geoid" in the phx.data file turns into "id" in phx_fortified.

lecy commented 5 years ago

Fortify is turning each separate polygon into a tidy data frame, i.e. one row for each point.

This:

> head( phx_fortified )
# A tibble: 6 x 7
   long   lat order hole  piece group            id            
  <dbl> <dbl> <int> <lgl> <fct> <chr>            <chr>         
1 -112.  33.7     1 FALSE 1     G0400130010101.1 G0400130010101
2 -112.  33.7     2 FALSE 1     G0400130010101.1 G0400130010101
3 -112.  33.7     3 FALSE 1     G0400130010101.1 G0400130010101
4 -112.  33.7     4 FALSE 1     G0400130010101.1 G0400130010101
5 -112.  33.7     5 FALSE 1     G0400130010101.1 G0400130010101
6 -112.  33.7     6 FALSE 1     G0400130010101.1 G0400130010101

Is built from these sp polygons:

> head( phx, 1 )
An object of class "SpatialPolygonsDataFrame"
Slot "data":
  STATEFP10 COUNTYFP10 TRACTCE10     GEOID10  NAME10           NAMELSAD10 MTFCC10 FUNCSTAT10
0        04        013    422644 04013422644 4226.44 Census Tract 4226.44   G5020          S
  ALAND10 AWATER10  INTPTLAT10   INTPTLON10         geoid2
0 7784176     4234 +33.3702427 -111.5979088 G0400130422644

Slot "polygons":
[[1]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] -111.59855   33.36683

Slot "area":
[1] 0.0007545534

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
            [,1]     [,2]
  [1,] -111.5841 33.36957
  [2,] -111.5841 33.36939
  [3,] -111.5841 33.36939
  [4,] -111.5841 33.36809
  [5,] -111.5841 33.36800
  [6,] -111.5841 33.36766
  [7,] -111.5841 33.36720
  [8,] -111.5841 33.36704
  [9,] -111.5841 33.36689
 [10,] -111.5841 33.36613
 [11,] -111.5841 33.36583
 [12,] -111.5841 33.36537
 [13,] -111.5840 33.36410
 [14,] -111.5840 33.36313
 [15,] -111.5840 33.36289
 [16,] -111.5840 33.36179
 [17,] -111.5839 33.35783
 [18,] -111.5839 33.35700
 [19,] -111.5839 33.35678
 [20,] -111.5839 33.35630
 [21,] -111.5839 33.35583
 [22,] -111.5839 33.35538
 [23,] -111.5839 33.35536
 [24,] -111.5839 33.35465
 [25,] -111.5839 33.35460
 [26,] -111.5839 33.35460
 [27,] -111.5839 33.35415
 [28,] -111.5839 33.35387
 [29,] -111.5839 33.35385
 [30,] -111.5839 33.35311
 [31,] -111.5839 33.35288
 [32,] -111.5839 33.35237
 [33,] -111.5839 33.35233
 [34,] -111.5839 33.35192
 [35,] -111.5839 33.35185
 [36,] -111.5839 33.35091
 [37,] -111.5839 33.35041
 [38,] -111.5839 33.35012
 [39,] -111.5838 33.34976
 [40,] -111.5843 33.34976
 [41,] -111.5851 33.34977
 [42,] -111.5855 33.34979
 [43,] -111.5857 33.34979
 [44,] -111.5869 33.34980
 [45,] -111.5881 33.34981
 [46,] -111.5894 33.34982
 [47,] -111.5904 33.34983
 [48,] -111.5914 33.34984
 [49,] -111.5923 33.34985
 [50,] -111.5924 33.34985
 [51,] -111.5926 33.34985
 [52,] -111.5929 33.34987
 [53,] -111.5932 33.34988
 [54,] -111.5938 33.34988
 [55,] -111.5974 33.34991
 [56,] -111.5981 33.34992
 [57,] -111.6012 33.34994
 [58,] -111.6011 33.34999
 [59,] -111.6011 33.35003
 [60,] -111.6011 33.35007
 [61,] -111.6011 33.35036
 [62,] -111.6011 33.35159
 [63,] -111.6012 33.35682
 [64,] -111.6012 33.35728
 [65,] -111.6012 33.35783
 [66,] -111.6012 33.35840
 [67,] -111.6012 33.35856
 [68,] -111.6012 33.35973
 [69,] -111.6013 33.36109
 [70,] -111.6013 33.36187
 [71,] -111.6013 33.36233
 [72,] -111.6013 33.36306
 [73,] -111.6013 33.36424
 [74,] -111.6013 33.36440
 [75,] -111.6028 33.36441
 [76,] -111.6033 33.36440
 [77,] -111.6038 33.36438
 [78,] -111.6043 33.36435
 [79,] -111.6063 33.36436
 [80,] -111.6075 33.36436
 [81,] -111.6090 33.36438
 [82,] -111.6099 33.36438
 [83,] -111.6119 33.36439
 [84,] -111.6128 33.36440
 [85,] -111.6138 33.36441
 [86,] -111.6140 33.36441
 [87,] -111.6149 33.36442
 [88,] -111.6156 33.36442
 [89,] -111.6163 33.36442
 [90,] -111.6178 33.36442
 [91,] -111.6182 33.36443
 [92,] -111.6185 33.36444
 [93,] -111.6187 33.36446
 [94,] -111.6187 33.36463
 [95,] -111.6187 33.36473
 [96,] -111.6187 33.36480
 [97,] -111.6187 33.36564
 [98,] -111.6187 33.36599
 [99,] -111.6187 33.36626
[100,] -111.6187 33.36653
[101,] -111.6187 33.36661
[102,] -111.6187 33.36702
[103,] -111.6187 33.36757
[104,] -111.6187 33.36831
[105,] -111.6187 33.36905
[106,] -111.6187 33.36920
[107,] -111.6188 33.37061
[108,] -111.6188 33.37138
[109,] -111.6188 33.37146
[110,] -111.6188 33.37150
[111,] -111.6188 33.37156
[112,] -111.6188 33.37163
[113,] -111.6189 33.37174
[114,] -111.6189 33.37200
[115,] -111.6189 33.37287
[116,] -111.6189 33.37370
[117,] -111.6189 33.37418
[118,] -111.6189 33.37505
[119,] -111.6189 33.37568
[120,] -111.6189 33.37638
[121,] -111.6189 33.37711
[122,] -111.6190 33.37817
[123,] -111.6190 33.37847
[124,] -111.6189 33.37900
[125,] -111.6187 33.37902
[126,] -111.6186 33.37902
[127,] -111.6180 33.37902
[128,] -111.6176 33.37902
[129,] -111.6157 33.37900
[130,] -111.6155 33.37901
[131,] -111.6153 33.37901
[132,] -111.6139 33.37901
[133,] -111.6110 33.37899
[134,] -111.6089 33.37898
[135,] -111.6063 33.37897
[136,] -111.6055 33.37896
[137,] -111.6051 33.37896
[138,] -111.6046 33.37896
[139,] -111.6037 33.37894
[140,] -111.6024 33.37893
[141,] -111.6019 33.37892
[142,] -111.6017 33.37892
[143,] -111.6015 33.37892
[144,] -111.6013 33.37891
[145,] -111.5980 33.37890
[146,] -111.5929 33.37887
[147,] -111.5909 33.37886
[148,] -111.5868 33.37883
[149,] -111.5852 33.37882
[150,] -111.5846 33.37880
[151,] -111.5845 33.37880
[152,] -111.5844 33.37880
[153,] -111.5843 33.37880
[154,] -111.5843 33.37865
[155,] -111.5843 33.37860
[156,] -111.5842 33.37327
[157,] -111.5842 33.37264
[158,] -111.5842 33.37255
[159,] -111.5842 33.37139
[160,] -111.5841 33.36983
[161,] -111.5841 33.36957

Slot "plotOrder":
[1] 1

Slot "labpt":
[1] -111.59855   33.36683

Slot "ID":
[1] "0"

Slot "area":
[1] 0.0007545534

Slot "plotOrder":
[1] 1

Slot "bbox":
         min        max
x -111.61898 -111.58385
y   33.34976   33.37902

Slot "proj4string":
CRS arguments:
 +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
lecy commented 5 years ago

I would create three files for GitHub:

(1) Final dataset with all census vars you need (save as CSV or RDS) (2) geojson shapefile with data (3) tidy (fortified) data frame for ggplot2 (save as CSV or RDS)

If you are creating maps in ggplot2, then you can read in the tidy version. If you are creating maps using shapefiles, you would read in the geojson.

Some discussion on adding data back after fortify():

https://stackoverflow.com/questions/22096787/how-keep-information-from-shapefile-after-fortify

aehende1 commented 5 years ago

Thank you!!! This is exactly what I was trying to find. The chloropleth was successful- now on to an interactive map and Shiny app!

lecy commented 5 years ago

Can you please update the README files to reflect these steps for posterity?

aehende1 commented 5 years ago

Yes- I'm working on updating them now.

aehende1 commented 5 years ago

Updated the README.md for the shapefiles folder and the Phx_data Rmd file

lecy commented 5 years ago

Thanks!

aehende1 commented 5 years ago

Another way to fortify that I just found is embedded within the ggplot function as so:

data <- ggplot2::fortify(phx, region = "GISJOIN")

I found this here: https://unconj.ca/blog/choropleth-maps-with-r-and-ggplot2.html