walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Feature request to make "Margin of Error" optional in get_acs(). #566

Closed ar-puuk closed 4 months ago

ar-puuk commented 4 months ago

First of all, huge thanks for this package and to its contributors. It has made downloading Census and ACS geospatial data so much easier for me.

Description

The following is my workflow when I work with ACS data. While I understand the importance of the Margin of Error columns, I have never had to work with them, so I end up removing them right after the download. I wonder if get_acs() could be modified so that it either does not fetch the Margin of Error columns or removes them before returning.

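library(tidycensus)
library(dplyr) # %>%, select(), rename_at()
library(sf)    # st_transform()

# POPULATION, HOUSING UNIT, AND HOUSEHOLD COUNTS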
data <- get_acs(
  year = 2022,
  survey = "acs5",
  geography = "block group",
  state = "CO",
  county = c("Denver", "Jefferson"),
  variables = c(
    TOTPOP_CEN = "B02001_001", # Total population
    HU_CEN = "B25001_001", # Housing units
    HH_CEN = "B11012_001", # Total households
    GQPOP_CEN = "B09019_026" # Group quarters population
  ),
  output = "wide",
  geometry = TRUE
) %>% st_transform(3857)

data_clean <- data %>%
  # remove the margin of error columns from the column names
  select(-matches("M$")) %>%
  # remove E at the end of all Estimate columns (columns ending in E except for NAME column)
  rename_at(vars(-matches("^NAME$")), ~sub("E$", "", .))
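The cleanup step above could also be wrapped in a small reusable helper. The following is a minimal sketch in base R, assuming wide-format output in which estimate columns end in "E" and margin-of-error columns end in "M". The function name drop_moe and its keep argument are hypothetical and not part of tidycensus, and the toy data frame stands in for real get_acs() output with made-up values. It has only been reasoned through for a plain data frame; behavior on sf objects is assumed to be similar but is not verified here.

```r
# Hypothetical helper (not part of tidycensus): drop margin-of-error columns
# from wide-format output and strip the trailing "E" from estimate columns.
drop_moe <- function(df, keep = c("GEOID", "NAME", "geometry")) {
  cols <- names(df)
  # Columns ending in "M" are treated as margins of error, unless protected
  is_moe <- grepl("M$", cols) & !(cols %in% keep)
  out <- df[, !is_moe, drop = FALSE]
  # Remaining columns ending in "E" are treated as estimates, unless protected
  is_est <- grepl("E$", names(out)) & !(names(out) %in% keep)
  names(out)[is_est] <- sub("E$", "", names(out)[is_est])
  out
}

# Toy stand-in for wide get_acs() output (values are made up for illustration)
toy <- data.frame(
  GEOID = c("080310001011", "080310001012"),
  NAME = c("Block Group 1", "Block Group 2"),
  TOTPOP_CENE = c(1200, 950),
  TOTPOP_CENM = c(150, 120),
  HU_CENE = c(500, 410),
  HU_CENM = c(40, 35)
)

toy_clean <- drop_moe(toy)
print(names(toy_clean))
```

Protecting NAME through the keep argument matters because it also ends in "E". Unlike matches() in dplyr, which ignores case by default, grepl() here is case-sensitive.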

Requested Feature

Option 1

If the function accepted variable names with an 'E' or 'M' suffix, it could download only the estimates or only the margins of error, respectively. Variable names without a suffix would return both the estimate and margin of error columns, as they do now.

# POPULATION, HOUSING UNIT, AND HOUSEHOLD COUNTS
data <- get_acs(
  year = 2022,
  survey = "acs5",
  geography = "block group",
  state = "CO",
  county = c("Denver", "Jefferson"),
  variables = c(
    TOTPOP_CEN = "B02001_001E", # Total population
    HU_CEN = "B25001_001E", # Housing units
    HH_CEN = "B11012_001E", # Total households
    GQPOP_CEN = "B09019_026E" # Group quarters population
  ),
  output = "wide",
  geometry = TRUE
) %>% st_transform(3857)

Option 2

Add a fetch_moe argument to get_acs(). If TRUE, the function would download both estimates and margins of error, with 'E' and 'M' appended to the variable names, respectively. If FALSE, it would fetch only the estimates and leave the variable names without a suffix.

# POPULATION, HOUSING UNIT, AND HOUSEHOLD COUNTS
data <- get_acs(
  year = 2022,
  survey = "acs5",
  geography = "block group",
  state = "CO",
  county = c("Denver", "Jefferson"),
  variables = c(
    TOTPOP_CEN = "B02001_001", # Total population
    HU_CEN = "B25001_001", # Housing units
    HH_CEN = "B11012_001", # Total households
    GQPOP_CEN = "B09019_026" # Group quarters population
  ),
  output = "wide",
  fetch_moe = FALSE,
  geometry = TRUE
) %>% st_transform(3857)

I understand that it is just two extra lines I need to add every time I have to use get_acs(), but I think it would be helpful to have this feature bundled with the function.

walkerke commented 4 months ago

Thank you for your kind words, and your suggestion! This is not something I plan to implement. It is a core part of the design philosophy of tidycensus that margins of error should always be returned alongside their estimates, as the uncertainty around those estimates is important for users to understand.
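To illustrate why the margins can matter at small geographies such as block groups, here is a rough base R sketch using made-up numbers rather than real ACS values. It relies on two standard conventions: ACS margins of error are published at the 90% confidence level (z = 1.645), and the margin for a sum of estimates is commonly approximated by the root sum of squares of the individual margins. tidycensus also provides moe_sum() for the latter; the manual calculation below is only meant to show the idea.

```r
# Made-up block-group estimates and 90% margins of error, for illustration only
estimate <- c(1200, 950, 40)
moe <- c(150, 120, 35)

# Convert each margin of error to a standard error (90% level, z = 1.645)
se <- moe / 1.645

# Coefficient of variation: uncertainty relative to the size of the estimate
cv <- se / estimate

# Approximate margin of error for the sum of the three estimates
total <- sum(estimate)
total_moe <- sqrt(sum(moe^2))

print(data.frame(estimate, moe, cv))
print(c(total = total, total_moe = total_moe))
```

In this toy example the smallest estimate has by far the largest relative uncertainty, which is exactly the information that is lost once the margin of error columns are dropped.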