r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.34k stars 298 forks source link

Polygon errors in group_by / summarise tracts to counties and *then* across years #2093

Closed seandalby closed 1 year ago

seandalby commented 1 year ago

Hi - my code below requires the "tidycensus" package (and a us census api key) to load the data but should be reproducible after that with sf etc. -

I'm looking at income across years from ACS 5-year estimates and pull in the data as follows:


for (y in c(2009:2021)){

  # ACS Extraction for tracts:
  pa_acs_tr_y <- get_acs(geography = "tract", variables = "B19013_001E",
                  state = "PA", geometry = TRUE, year = y) %>%
    rename(
      med_hh_inc = estimate,
      med_hh_inc_moe = moe
    ) %>%
    dplyr::select(GEOID, med_hh_inc, med_hh_inc_moe, geometry) %>%
    mutate(
      year = y,
      county = gsub("42([0-9][0-9][0-9]).*", "\\1", GEOID),
      tract = gsub("42[0-9][0-9][0-9](.*)", "\\1", GEOID),
      )

  if(y == 2009){pa_acs_tr <- pa_acs_tr_y}
  else{pa_acs_tr <- rbind(pa_acs_tr, pa_acs_tr_y)}
}

# Getting rid of empty tracts (1 of them for each year):
pa_acs_tr <- pa_acs_tr[!st_is_empty(pa_acs_tr),]

From here, the code below works:

pa_acs_tr_space <- pa_acs_tr %>% group_by(year, county) %>%
  summarise(med_inc = mean(med_hh_inc, na.rm = TRUE)) 

But this code doesn't:

pa_acs_tr_space <- pa_acs_tr %>% group_by(county) %>%
  summarise(med_inc = mean(med_hh_inc, na.rm = TRUE)) 

I can work around this by grouping just on the data.frame and then joining to a county-level sf object later. I'm thinking I might be asking group_by / summarise to do too much, but is there a good reason why it won't "aggregate" across years when the same geometry is being used?

Thanks!

edzer commented 1 year ago

Thanks, that is something to look into why it goes wrong:

> pa_acs_tr_space <- pa_acs_tr %>% group_by(county) %>%
  summarise(med_inc = mean(med_hh_inc, na.rm = TRUE)) 
Error:
! Assigned data `geom` must be compatible with existing data.
✖ Existing data has 67 rows.
✖ Assigned data has 119 rows.
ℹ Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.

alternative to summarise you can use

aggregate(pa_acs_tr["med_hh_inc"], list(county = pa_acs_tr$county), mean, na.rm = TRUE)
edzer commented 1 year ago

Setting sf_use_s2(FALSE) solves your issue with summarise(). #1771

seandalby commented 1 year ago

Amazing, thanks so much @edzer