walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Possible rounding issue in the wage and earnings variables of tidycensus microdata #593

Open markbauby opened 1 day ago

markbauby commented 1 day ago

I've noticed a possible issue with the wage data in the microdata returned by tidycensus.

Long story short, I have been working with microdata using the IPUMSR and tidycensus packages. I can get close to similar results from both, but some of the variables from tidycensus appear to be rounded, specifically "WAGP" and "PERNP". Their equivalents in IPUMSR ("INCWAGE" and "INCEARN") are not.

Is this a bug in tidycensus, or is it user error on my part?

My code is below.

IPUMSR segment

library(ipumsr)
library(dplyr)

# Define an IPUMS USA extract for the us2022c sample
ipums_extract_test <- define_extract_micro(
  collection = "usa",
  description = "USA extract for API vignette",
  samples = c("us2022c"),
  variables = c("AGE", "STATEFIP", "EMPSTAT", "INCWAGE", "INCEARN", "us2022c_schl")
)

ipums_data <- ipums_extract_test %>%
  submit_extract() %>%
  wait_for_extract() %>%
  download_extract() %>%
  read_ipums_micro()

ipums_test <- ipums_data %>%
  filter(STATEFIP == 26 & AGE >= 16 & EMPSTAT == 1 & US2022C_SCHL %in% 1:21)

tidycensus segment

library(tidycensus)
library(dplyr)

# Pull the 2022 5-year ACS PUMS for Michigan, then apply the filters
tidy_test <- get_pums(
  year = 2022,
  survey = "acs5",
  state = "MI",
  variables = c("AGEP", "ESR", "WAGP", "PERNP", "SCHL")
) %>%
  filter(AGEP >= 16 & ESR %in% c(1, 2, 4, 5) & SCHL %in% 1:21)

Thank you.

walkerke commented 8 hours ago

Interesting. We don't do any post-processing of the PUMS data like that in tidycensus, so whatever you're seeing is what's coming through the Census API.

You can take a look here: https://api.census.gov/data/2022/acs/acs5/pums?get=SERIALNO%2CSPORDER%2CWGTP%2CPWGTP%2CAGEP%2CESR%2CWAGP%2CPERNP%2CSCHL&ucgid=0400000US26

Perhaps the IPUMS team does some post-processing of the data?
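
One way to check whether a column really is rounded is to measure how often its non-zero values are exact multiples of 10 or 100 in each data set. The snippet below is only a rough sketch in base R. The helper name `share_multiple_of` is made up for illustration and is not part of tidycensus or ipumsr, and the toy vector stands in for real data.

```r
# Hypothetical helper (not from tidycensus or ipumsr): share of non-missing,
# non-zero values in x that are exact multiples of `base`.
share_multiple_of <- function(x, base) {
  x <- suppressWarnings(as.numeric(x))  # columns may arrive as character or labelled
  x <- x[!is.na(x) & x != 0]
  if (length(x) == 0) {
    return(NA_real_)
  }
  mean(x %% base == 0)
}

# Toy values for illustration only, not real PUMS records
toy_wages <- c(0, 1200, 35000, 47250)
share_multiple_of(toy_wages, 100)
```

Assuming the objects from the original post exist, running `share_multiple_of(tidy_test$WAGP, 100)` alongside `share_multiple_of(ipums_test$INCWAGE, 100)` would show whether one source has a much higher share of round values than the other. The same call on the raw API response linked above would help separate what the Census API returns from anything done downstream.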