walkerke closed this issue 1 year ago.
While exploring whether I could help, I ran the example using the branch at https://github.com/walkerke/tidycensus/tree/estimates2021 and got an error that the variable POP was not found. I see the references above to variables like POP_2021 in https://api.census.gov/data/2021/pep/population/variables.html.
```r
library(tidycensus)

state_pop <- get_estimates(
  geography = "state",
  year = 2021,
  product = "population"
)
#> Error: Your API call has errors. The API message returned is error: error: unknown variable 'POP'.
```
@RickPack thanks for taking a look! Did you install with `remotes::install_github("walkerke/tidycensus@estimates2021")`? That code is working on my end when using tidycensus from that branch.
You're correct about the variable names - right now this should work:
```r
library(tidycensus)

get_estimates(
  geography = "state",
  year = 2021,
  variables = "POP_2021"
)
```

```
# A tibble: 52 × 4
   NAME                 GEOID variable   value
   <chr>                <chr> <chr>      <dbl>
 1 Oklahoma             40    POP_2021 3986639
 2 Nebraska             31    POP_2021 1963692
 3 Hawaii               15    POP_2021 1441553
 4 South Dakota         46    POP_2021  895376
 5 Tennessee            47    POP_2021 6975218
 6 Nevada               32    POP_2021 3143991
 7 New Mexico           35    POP_2021 2115877
 8 Iowa                 19    POP_2021 3193079
 9 Kansas               20    POP_2021 2934582
10 District of Columbia 11    POP_2021  670050
# … with 42 more rows
```
Great, that worked. Thank you!
I like the way you are leaning. However, I don't know whether there would be any negative consequences of returning all the variables when `product = "population"` is used. I agree that stripping the suffix makes sense, given that the Census API may return to not using the suffix.
I'd like to follow up on this. I am attempting to grab county-level data for the 2021 vintage with the following:

```r
library(tidycensus)

NH_population_2021 <- get_estimates(
  geography = "county",
  product = "characteristics",
  breakdown = c("AGEGROUP", "SEX", "HISP", "RACE"),
  breakdown_labels = TRUE,
  year = 2021,
  time_series = TRUE,
  state = 33,
  output = "tidy",
  show_call = TRUE
)
```
This returns an error; however, on the Census FTP site I can find the file here:
https://www2.census.gov/programs-surveys/popest/datasets/2020-2021/counties/asrh/cc-est2021-alldata-33.csv
Is there any information on this update, and on when the API might be updated to support it? The current error message is:

```
Error: At this time, the only available geographies for 2020 and 2021 population estimates are 'us', 'region', 'division', and 'state'.
```
If you are looking for assistance with this, I can help with a pull request and edits to the code.
I've contacted Census about this and they've told me that they will put the 2021 estimates on the API, but there is no timeline for completing that project. So I'm kind of in a holding pattern at the moment. I have mulled over pulling the flat files and cleaning them up/returning them but that may be overkill if the data will end up on the API eventually.
I'd absolutely look at a pull request to add some functionality here!
Just got confirmation on the Census Slack that the PEP is unlikely to be added back to the API. As such, I'm going to put some work into parsing the flat files and updating get_estimates() so we can continue to use PEP data in tidycensus.
I had created some custom code that navigates to the URL and downloads the flat file directly into R. I'll see if I can dig it up and open a pull request to add some of this functionality.
@tabauer23 that would be great! The key for implementation is that the output mirrors what get_estimates() was delivering for previous years, so that the documentation stays consistent. We can iterate on that in a branch.
I have an implementation of this merged to master. The one issue right now: we don't yet have cartographic boundary shapefiles for 2022, which will have the new county definitions for Connecticut. I'm using 2021 geographies internally, but that doesn't work for Connecticut, so I need to fix that and keep an eye on it. I'll hold off on submitting to CRAN until that is resolved.
I am working on a way for get_estimates() to detect when the requested year is later than 2019 and then pull in the flat file by FIPS code. I'll push some work to a branch soon so you can check it out and see whether it's worth the extra effort; I've done something like this before.
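A minimal sketch of the year-based dispatch described in the comment above, assuming the county flat files follow the naming pattern of the Vintage 2021 New Hampshire file linked earlier in this thread. The helpers `pep_asrh_url()` and `pep_source()` are hypothetical, not part of tidycensus, and reusing the URL pattern for later vintages is an untested assumption.

```r
# Hypothetical helper: build the per-state county ASRH flat-file URL.
# Pattern copied from .../2020-2021/counties/asrh/cc-est2021-alldata-33.csv;
# applying it to other vintages is an assumption.
pep_asrh_url <- function(state_fips, vintage = 2021) {
  fips <- sprintf("%02d", as.integer(state_fips))  # e.g. 6 -> "06"
  paste0(
    "https://www2.census.gov/programs-surveys/popest/datasets/",
    "2020-", vintage, "/counties/asrh/",
    "cc-est", vintage, "-alldata-", fips, ".csv"
  )
}

# Hypothetical dispatcher: years after 2019 go to the flat files,
# earlier years stay on the API in this sketch.
pep_source <- function(year, state_fips = NULL) {
  if (year <= 2019) {
    return(list(source = "api", url = NULL))
  }
  if (is.null(state_fips)) {
    stop("County flat files are published per state; supply `state_fips`.")
  }
  list(source = "flat_file", url = pep_asrh_url(state_fips, vintage = year))
}

# Reading the file needs a network connection, for example:
# nh <- utils::read.csv(pep_source(2021, 33)$url)
```

Keeping URL construction separate from the download makes the FIPS padding and year logic easy to check offline; the actual parsing of the CSV into the shape get_estimates() returned for earlier years is left out here.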
I needed this for a project, so I went ahead and implemented it. I'm going to update the tidycensus documentation and then send it along to CRAN.
I'll build in full support as the new data files are released for 2022. Right now the `product` argument is not available (nor are breakdowns), but when that data comes out later this year, I'll build that support back in.
@mfherman @szimmer or anyone else who has time to put eyes on it:
Census released the new 2021 Population Estimates yesterday for states (and larger geographies). The only "product" they've released is the Population product, which includes mid-year population and population density as in previous years as well as a bunch of new change-over-time variables.
The wrinkle is that due to a variety of reasons (the pandemic most significantly) Census did not release 2020 estimates last year. Instead, they've bundled those estimates in with the 2021 PEP and added a year suffix to distinguish 2020 and 2021 figures. This will not work with the CRAN version of tidycensus as we check for valid variables (see https://github.com/walkerke/tidycensus/blob/master/R/load_data.R#L543-L550).
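To illustrate why year-suffixed names trip a fixed variable check, and how the check could be relaxed for 2020 and 2021, here is a simplified sketch. The allow-list `valid_pep_vars` and the function `check_pep_vars()` are hypothetical stand-ins, not the code at the linked lines of `load_data.R`; the regex assumes suffixed names are upper-case letters followed by `_2020` or `_2021`.

```r
# Hypothetical allow-list standing in for the real variable check
valid_pep_vars <- c("POP", "DENSITY")

# Hypothetical validator: strict for older vintages, pattern-based
# for 2020/2021, where names carry a year suffix (e.g. POP_2021).
check_pep_vars <- function(variables, year) {
  if (year %in% c(2020, 2021)) {
    ok <- grepl("^[A-Z]+_(2020|2021)$", variables)
  } else {
    ok <- variables %in% valid_pep_vars
  }
  if (!all(ok)) {
    stop("Unknown variable(s): ", paste(variables[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

check_pep_vars("POP", 2019)                      # passes
check_pep_vars(c("POP_2021", "POP_2020"), 2021)  # passes
# check_pep_vars("POP_2021", 2019) would stop with an error
```

A pattern check accepts the many new change-over-time variables without listing each one, at the cost of also accepting well-formed names the API does not actually serve; the API's own error would then surface instead.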
I'm working on a fix for this at https://github.com/walkerke/tidycensus/tree/estimates2021. It's pretty crude at the moment but works like:
`output = "wide"` works as expected as well. In this approach, `product = "population"` only returns mid-year population and population density, as it does in previous years. I've relaxed the variable restrictions for 2020 and 2021 to allow users to request any of the other change-rate variables individually. There are two issues with this approach:

1. `product = "population"` typically returns only mid-year population and density, but the new estimates have more variables available. Should `product = "population"` return all variables, or maintain consistency with previous years?

My lean is to implement the first point by bundling all the available variables in the product (which I don't yet do), and for the second point to parse the `year` argument when supplied and clean up the result to maintain consistency with previous years. It would look like this:
Internally, `POP_2021` is inferred from the `year` argument and fetched from the API. Users who supply `POP_2021` directly would not get an error message, but we'd clean up the returned result.

Any comments are welcome. I'd rather take time to get this right than rush it out (though anyone is welcome to install from the estimates2021 branch and grab the estimates right away!).
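The proposed cleanup (infer the suffixed name from `year`, fetch it, then strip the suffix so results match earlier vintages) can be sketched in base R. The helpers `pep_api_variable()` and `strip_year_suffix()` are hypothetical, and the sample data frame simply copies two rows from the tidy output shown earlier in this thread rather than calling the API.

```r
# Hypothetical: map a user-facing name to the suffixed API name,
# leaving names that already carry a four-digit suffix untouched.
pep_api_variable <- function(variable, year) {
  ifelse(grepl("_[0-9]{4}$", variable), variable, paste0(variable, "_", year))
}

# Hypothetical: drop a trailing "_YYYY" so output looks like prior years
strip_year_suffix <- function(df) {
  df$variable <- sub("_[0-9]{4}$", "", df$variable)
  df
}

# Two rows copied from the tidy result above
res <- data.frame(
  NAME = c("Oklahoma", "Nebraska"),
  GEOID = c("40", "31"),
  variable = c("POP_2021", "POP_2021"),
  value = c(3986639, 1963692),
  stringsAsFactors = FALSE
)

cleaned <- strip_year_suffix(res)
cleaned$variable  # "POP" "POP"
```

One design caveat: stripping is only safe when a single year is requested. If a user asks for both `POP_2020` and `POP_2021`, both rows would collapse to `POP`, so the year would need to move into its own column or the suffix would need to be kept.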