walkerke / tidycensus

Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
https://walker-data.com/tidycensus

Incorrect Margins of Error? #190

Closed arthurgailes closed 5 years ago

arthurgailes commented 5 years ago

Hello,

The get_acs function appears to be returning incorrect margins of error for many population variables. I can verify that this happens at least at the tract and block group levels, having compared several variables against both FactFinder and data.census.gov. Example below:

test <- tidycensus::get_acs('tract', year = 2017, variables = 'B17020_002', 
   state = '06', county = '001', output = 'wide', moe_level = 95)
test <- dplyr::arrange(test, GEOID)
head(test)

# A tibble: 6 x 4
  GEOID       NAME                                          B17020_002E B17020_002M
1 06001400100 Census Tract 4001, Alameda County, California         126        95.3
2 06001400200 Census Tract 4002, Alameda County, California         109        53.6
3 06001400300 Census Tract 4003, Alameda County, California         382        194.
4 06001400400 Census Tract 4004, Alameda County, California         368        261.
5 06001400500 Census Tract 4005, Alameda County, California         370        260.
6 06001400600 Census Tract 4006, Alameda County, California         179        108.

Compare to second row of: https://factfinder.census.gov/bkmk/table/1.0/en/ACS/17_5YR/B17020/0400000US06.14000|0500000US06001.15000

Maybe an error in the census API?

Thanks as always for all your hard work.

walkerke commented 5 years ago

Thanks for filing the issue - this will make for a good documentation example. You are getting expected results because you have set moe_level = 95. The default MOE in the ACS - which is what you see in FactFinder - is reported at the 90 percent confidence level, and this is set as the default in tidycensus. Setting moe_level to 95 performs an internal calculation that rescales the published MOE to the 95 percent level, which makes it larger. So when you modify your code as follows:

test <- tidycensus::get_acs('tract', year = 2017, variables = 'B17020_002', 
                            state = '06', county = '001', output = 'wide')

test <- dplyr::arrange(test, GEOID)
head(test)

The expected results are returned.

# A tibble: 6 x 4
  GEOID       NAME                                          B17020_002E B17020_002M
  <chr>       <chr>                                               <dbl>       <dbl>
1 06001400100 Census Tract 4001, Alameda County, California         126          80
2 06001400200 Census Tract 4002, Alameda County, California         109          45
3 06001400300 Census Tract 4003, Alameda County, California         382         163
4 06001400400 Census Tract 4004, Alameda County, California         368         219
5 06001400500 Census Tract 4005, Alameda County, California         370         218
6 06001400600 Census Tract 4006, Alameda County, California         179          91
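
For anyone who wants to check the relationship between the two sets of numbers, here is a minimal sketch of the rescaling. It assumes the 95 percent MOE is the 90 percent MOE multiplied by the ratio of the standard normal critical values (1.96 / 1.645); that factor is an assumption rather than something quoted from the tidycensus source, and rescale_moe is a hypothetical helper name used only for illustration.

```r
# 90 percent MOEs copied from the default get_acs() output above
moe_90 <- c(80, 45, 163, 219, 218, 91)

# Hypothetical helper (not part of tidycensus): rescale an MOE from one
# confidence level to another using standard normal critical values.
# Assumed values: 1.645 for 90 percent, 1.96 for 95 percent.
rescale_moe <- function(moe, z_target = 1.96, z_source = 1.645) {
  moe * z_target / z_source
}

moe_95 <- rescale_moe(moe_90)

# Round to three significant digits, as the tibble printout does
signif(moe_95, 3)
```

Under that assumption, the rescaled values agree to three significant digits with the B17020_002M column returned when moe_level = 95 is set in the original report.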
arthurgailes commented 5 years ago

facepalm Apologies for the obvious question, and thanks for the quick response.