hrecht / censusapi

R package to retrieve U.S. Census data and metadata via API
https://www.hrecht.com/censusapi/

Bug report: Issue with Congressional District Call #66

Closed sharifamlani closed 4 years ago

sharifamlani commented 4 years ago

Describe the bug When I use the following code to request data from the American Community Survey, the call fails:

library(censusapi)
CD_Pure <- getCensus(name = "acs/acs1/spp",
          vars = sahie_vars$name[1], 
          region = "congressional%20district:*",
          vintage = 2016, 
          key = Census_API_Key)

The package's API call was:

https://api.census.gov/data/2016/acs/acs1/spp?key=Census_API_Key&get=S0201PR_0093E&for=congressional%2520district%3A%2A

Notice that although I pass region = "congressional%20district:*" to the region argument of the function, the for parameter in the resulting API request is congressional%2520district: the % in my %20 has itself been encoded as %25.

This bug produces the following error: Error in apiCheck(req) : The Census Bureau returned the following error message: error: invalid 'for' argument
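The double encoding described above can be reproduced in base R. This is only a sketch: it uses utils::URLencode as a stand-in, and the encoder censusapi actually uses internally is an assumption here. It shows how a string that already contains %20 picks up %2520 when it is encoded a second time, while a plain space is encoded once.

```r
# Base-R illustration of double encoding. Which encoder censusapi uses
# internally is an assumption; the effect on an existing "%" is the same.
pre_encoded <- "congressional%20district:*"  # what the report passed as `region`
plain       <- "congressional district:*"    # the same geography with a literal space

# repeated = TRUE forces encoding even when the input already has %xx escapes
double_encoded <- utils::URLencode(pre_encoded, reserved = TRUE, repeated = TRUE)
single_encoded <- utils::URLencode(plain, reserved = TRUE, repeated = TRUE)

print(double_encoded)  # "congressional%2520district%3A%2A"
print(single_encoded)  # "congressional%20district%3A%2A"
```

The first result matches the for value in the failing request URL above.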

To Reproduce To compare, when I call directly to the API using the same parameters:

library(httr)

API_URL <- paste("https://api.census.gov/data/2016/acs/acs1/spp?get=", sahie_vars$name[1], ",S0201_0123E&for=congressional%20district:*&key=", Census_API_Key, sep = "")

x <- httr::GET(API_URL)

I am able to communicate with the API successfully:

Date: 2020-06-12 22:15
Status: 200
Content-Type: application/json;charset=utf-8
Size: 10.6 kB
[["S0201PR_0093E","S0201_0123E","state","congressional district"],
[null,"3.9","01","01"],
[null,"7.4","01","02"],
[null,"7.1","01","03"],
[null,"4.2","01","04"],
[null,"6.4","01","05"],
[null,"4.9","01","06"],
[null,"5.1","01","07"],
[null,"7.2","02","00"],
[null,"8.1","04","01"],

Expected behavior The issue is that when region is set to congressional districts (congressional%20district:*) in the function, the API call includes the for argument congressional%2520district. Instead, the API call should include only congressional%20district.

R session information:

Additional context Note, I use congressional%20district because that is what is given in the example API call on the Census website.

Thanks so much for making the package! I really appreciate it.

sharifamlani commented 4 years ago

Potential Solution

# Return API's built in error message if invalid call
apiCheck <- function(req) {
  if (req$status_code==400) {
    error_message <- (gsub("<[^>]*>", "", httr::content(req, as="text")))
    if (error_message == "error: missing 'for' argument") {
      stop("This dataset requires you to specify a geography with the 'region' argument.")
    }
    stop(paste("The Census Bureau returned the following error message:\n", error_message,
               "\n Your API call was: ", print(req$url)))
  }
  # Some time series don't give error messages, just don't resolve (e.g. SAIPE)
  if (req$status_code==204) stop("204, no content was returned.\nSee ?listCensusMetadata to learn more about valid API options.", call. = FALSE)
  if (identical(httr::content(req, as = "text"), "")) stop(paste("No output to parse. \n Your API call was: ", print(req$url)), call. = FALSE)
}

apiParse <- function (req) {
  if (jsonlite::validate(httr::content(req, as="text"))[1] == FALSE) {
    error_message <- (gsub("<[^>]*>", "", httr::content(req, as="text")))
    stop(paste("The Census Bureau returned the following error message:\n", error_message, "\nYour api call was: ", req$url))
  } else {
    raw <- jsonlite::fromJSON(httr::content(req, as = "text"))
  }
}

# Function to clean up column names - particularly ones with periods in them
cleanColnames <- function(dt) {
  # No trailing punct
  colnames(dt) <- gsub("\\.[[:punct:]]*$", "", colnames(dt))
  # All punctuation becomes underscore
  colnames(dt) <- gsub("[[:punct:]]", "_", colnames(dt))
  # Get rid of repeat underscores
  colnames(dt) <- gsub("(_)\\1+", "\\1", colnames(dt))
  return(dt)
}

responseFormat <- function(raw) {
  # Make first row the header
  colnames(raw) <- raw[1, ]
  df <- data.frame(raw)
  df <- df[-1,]
  df <- cleanColnames(df)
  # Make all columns character
  df[] <- lapply(df, as.character)
  # Make columns numeric if they have numbers in the column name - note some APIs use string var names
  # For ACS data, do not make columns numeric if they are ACS annotation variables - ending in MA or EA or SS
  # Do not make label variables (ending in _TTL) numeric
  value_cols <- grep("[0-9]", names(df), value=TRUE)
  error_cols <- grep("MA|EA|SS|_TTL|_NAME|NAICS2012|NAICS2012_TTL|fage4|FAGE4", value_cols, value=TRUE, ignore.case = T)
  for(col in setdiff(value_cols, error_cols)) df[,col] <- as.numeric(df[,col])

  row.names(df) <- NULL
  return(df)
}

################ Here is an updated getCensus2 code that worked #################
#Note: I have tested it on the American Community Survey and County Business Patterns for congressional districts. Updates and improvements welcome.

getCensus2 <- function(name, vars, region, vintage, key = Sys.getenv("CENSUS_KEY")){

  vars1 <- paste(c(vars), collapse = ",")

  API_URL <- paste("https://api.census.gov/data/", vintage, "/", name, "?get=", vars1, "&for=", region, "&key=", key, sep = "")
  x <- httr::GET(API_URL)
  # Check the API call for a valid response
  apiCheck(x)

  # If check didn't fail, parse the content
  raw <- apiParse(x)

  # Format the response into a nice data frame
  df <- responseFormat(raw)

  return(df)

}

hrecht commented 4 years ago

Hi there, change the %20 in your call to a space. The package takes care of URL encoding as needed. This works for me.

CD_Pure <- getCensus(name = "acs/acs1/spp",
    vars = "S0201PR_0093E", 
    region = "congressional district:*",
    vintage = 2016)
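If region strings are sometimes copied from example URLs on the Census website, one option is to undo any percent-escapes before passing them on. The sketch below uses a hypothetical helper, normalize_region, which is not part of censusapi; it relies only on base R's utils::URLdecode.

```r
# Hypothetical helper, not part of censusapi: turn percent-escapes copied
# from an example URL back into literal characters, so the string is only
# encoded once when the request is built.
normalize_region <- function(region) {
  utils::URLdecode(region)
}

from_url  <- normalize_region("congressional%20district:*")
unchanged <- normalize_region("congressional district:*")

print(from_url)   # "congressional district:*"
print(unchanged)  # "congressional district:*"

# Possible use (not run here; needs censusapi and an API key):
# getCensus(name = "acs/acs1/spp", vars = "S0201PR_0093E",
#           region = normalize_region("congressional%20district:*"),
#           vintage = 2016)
```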

In the future you can see the geographies available using:

geos <- listCensusMetadata(name = "acs/acs1/spp", vintage = 2016, type = "geographies")

Use the name exactly as it's written in the name column in the response. Hope that helps!
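The advice to use the geography name exactly as it appears can be turned into a small check. This is a sketch under assumptions: build_region is a hypothetical helper, not a censusapi function, and geos_example is an illustrative stand-in for the data frame that listCensusMetadata returns (only its name column is assumed).

```r
# Hypothetical helper, not part of censusapi: build a `region` string only
# when the geography name matches the metadata's `name` column exactly.
build_region <- function(geos, geo_name, value = "*") {
  if (!geo_name %in% geos$name) {
    stop("Unknown geography '", geo_name, "'. Available: ",
         paste(geos$name, collapse = ", "), call. = FALSE)
  }
  paste0(geo_name, ":", value)
}

# Illustrative stand-in for the listCensusMetadata() result; in real use,
# pass the `geos` data frame returned by the call instead.
geos_example <- data.frame(name = c("state", "congressional district"),
                           stringsAsFactors = FALSE)

region <- build_region(geos_example, "congressional district")
print(region)  # "congressional district:*"

# A pre-encoded name fails the exact-match check and raises an error:
bad <- try(build_region(geos_example, "congressional%20district"), silent = TRUE)
```

Failing early on a name such as congressional%20district surfaces the mistake before any request is sent.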

sharifamlani commented 4 years ago

Ahh great! Thank you so much for the response. Yes, the code you provided works perfectly for me. I appreciate all your help and the helpful tip as well!