This repository contains a set of tools for crowd-sourcing the generation of secondary measures constructed from the American Community Survey (ACS).
The motivation for this project is to make constructions of ACS data more accessible to both technical and non-technical users. The ACS has been conducted annually since 2005 to obtain information about individuals, households, and properties in communities across the United States, on topics ranging from basic demographics to education, employment, income, poverty status, marriage, family structure, transportation, national origin, take-up of public programs, property value, building age, and so on.
The Integrated Public Use Microdata Series (IPUMS) is a project of the Minnesota Population Center which offers an interface that allows users to shop for deidentified individual-level data from the annual ACS data samples. While these microdata sets are useful for skilled data researchers focused on constructing specific measures from data patterns, the most typical ACS resource of interest is the aggregate tables, which represent topic-specific counts, data summaries, and cross-tabs of measures for specified geographies.
Despite the helpfulness of these aggregations, the diversity of information in the ACS and of potential topics of interest means that there are hundreds of tables, each of which may contain dozens of pieces of information. For example, table B17020, "Poverty Status in the Past 12 Months By Age", has 17 columns representing the count of individuals who had valid poverty status information, and counts of individuals who fell into a particular poverty category (below vs. above the poverty line) and age category (under 6, 6-11, 12-17, 18-59, 60-74, 75-84, 85 and over).
There is a range of means for accessing ACS aggregate data tables:
acs package in R, which allows R programmers to use the Census API to generate custom pulls of data.
What is interesting, and surprising, about all of these means of accessing ACS data is that they provide data that likely still needs basic processing to be useful. Users interested in rates of child poverty could use any of the above resources to pull the contents of table B17020 and get the 17 columns of counts of individuals by poverty and age category, but none will offer a rate of child poverty. Thus, acting on an interest in child poverty for a given region will require:
B17020_002 for individuals under 6 in poverty), and then the final calculations.
These types of operations are needlessly time-consuming, especially since they often must be redone many times for different topics of interest, or to obtain updated numbers when new ACS data are available.
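As a rough illustration of the "final calculations" step, the sketch below computes a child poverty rate from counts by poverty and age category. The column labels (`below_under_6` and so on) and all of the counts are made up for this example. They are not actual ACS column identifiers or real data, and mapping them to real table columns is left as an assumption.

```python
# Hypothetical column labels and made-up counts, for illustration only.
# Real ACS column identifiers would need to be looked up and mapped to these.
CHILD_AGE_GROUPS = ("under_6", "6_to_11", "12_to_17")


def child_poverty_rate(counts):
    """Return the share of children (under 18) below the poverty line.

    `counts` maps labels like "below_under_6" / "above_under_6" to counts
    of individuals with valid poverty status information.
    """
    below = sum(counts[f"below_{group}"] for group in CHILD_AGE_GROUPS)
    above = sum(counts[f"above_{group}"] for group in CHILD_AGE_GROUPS)
    total = below + above
    if total == 0:
        return None  # no children with valid poverty status in this area
    return below / total


# Made-up counts for a single hypothetical geography.
example_tract = {
    "below_under_6": 120, "below_6_to_11": 100, "below_12_to_17": 80,
    "above_under_6": 380, "above_6_to_11": 400, "above_12_to_17": 420,
}

rate = child_poverty_rate(example_tract)
print(f"Child poverty rate: {rate:.1%}")  # prints "Child poverty rate: 20.0%"
```

Even this small calculation requires knowing which columns to sum for the numerator and the denominator, which is the kind of knowledge that currently has to be rediscovered for each topic.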
This project grows out of recognizing the value of reducing the redundancy of these operations. While there seem to be limitless constructions of interest that can be derived from the ACS aggregate tables (child poverty rate, teen birth rate, employment rate by education, by gender, or by age, etc.), we believe that a single data user should be able to define a construction in a way that allows all other users to reuse it immediately, even when focusing on different years and geographies.
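One way to picture "define once, reuse everywhere" is to store a construction as data rather than as a one-off script. The sketch below is only an illustration of that idea, not this project's actual design. The `RatioMeasure` class, the column labels, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioMeasure:
    """A hypothetical, declarative definition of a ratio-style measure."""
    name: str
    numerator: tuple    # column labels summed for the numerator
    denominator: tuple  # column labels summed for the denominator

    def compute(self, row):
        num = sum(row[col] for col in self.numerator)
        den = sum(row[col] for col in self.denominator)
        return num / den if den else None


# Defined once by one user (labels are placeholders, not ACS identifiers).
EMPLOYMENT_RATE = RatioMeasure(
    name="employment_rate",
    numerator=("employed",),
    denominator=("employed", "unemployed"),
)

# Reused by anyone, for any year and geography (made-up counts).
rows = [
    {"year": 2012, "geo": "tract_A", "employed": 900, "unemployed": 100},
    {"year": 2013, "geo": "tract_B", "employed": 800, "unemployed": 200},
]

results = {(r["year"], r["geo"]): EMPLOYMENT_RATE.compute(r) for r in rows}
for key, value in results.items():
    print(key, value)
```

Because the definition carries no year or geography, the same object applies unchanged to any pull of the underlying columns.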
The ultimate goal of this project is to benefit three classes of users:
These users would write programming scripts to pull constructed ACS measures for years, geographies and topics as needed, as an intermediate tool in other programming-based analyses. For example, adjacency of more and less poor neighborhoods could be used as a statistical predictor of the likelihood of crimes such as property theft or mugging.
These users would write programming scripts to source constructed measures for direct representation in web pages. For example, a web designer might generate a map of teen birth rates by census tract, overlaid by health clinics offering prenatal care.
These users would use an intuitive user interface to get direct access to constructions of interest, likely for direct representation in materials. For example, a non-profit might use a GeoJSON front-end tool to select a geography reflecting its service area (which may span multiple census tracts), and select measures and years to aggregate. (See plenar.io for an example front-end.) For a more specific example, a non-profit providing language assistance to families in a given neighborhood might use this type of application, while developing a grant application, to query the number and percentage of families with children who speak limited English.