@zwhitman can you tell us a little about DataUSA and share the methodological references you used to develop it (e.g., how you look for the lowest MOE and how you compare incongruent areas)?
@logantpowell could you also highlight where and what merges the data? I am only beginning to familiarize myself with the innards of CitySDK and have noticed that it queries multiple endpoints. Are those independent services returning data that is incongruous with the data from the other endpoints?
@kuanb the SDK currently leverages three Census APIs (DataWeb: statistics; TigerWeb: shapes; Geocoder) and a number of different DataWeb "endpoints." Each Census DataWeb endpoint is not an independent service but a specific data format, so we have to map each dataset differently. It's an internal metadata incongruency that we are trying to abstract away.
What I mean by "incongruent" in the title of this issue has to do with geographic areas/shapes. We use aggregated data, which means it's packaged into shapes to protect privacy. Many other services will give you a specific point (lat/long). Some other agencies (fed/state/local) aggregate their data into shapes as well. Often these shapes are specialized/locally defined and don't "line up" with our shapes. So, we are providing a means for users to pull nested/overlapping shapes/points.
@logantpowell Here are references to the methodologies we use for OnTheMap for Emergency Management (http://onthemap.ces.census.gov/em/). The selection methodology document, http://lehd.ces.census.gov/doc/help/onthemap_em/OTMEM_SelectionMethodology.pdf (particularly its ACS section), describes the algorithms for cobbling together statistical areas from different summary levels. In addition, http://lehd.ces.census.gov/doc/help/onthemap_em/OTMEM_StatisticalMethodology.pdf provides a few extra details on how MOEs were handled, as recommended by those in charge of ACS methodology.
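For anyone following along who hasn't dealt with derived MOEs before: when you add up ACS estimates from several areas, the general ACS guidance approximates the MOE of the sum as the square root of the sum of the squared component MOEs. The sketch below shows only that textbook approximation. It is not taken from the OTM-EM documents above, which may apply further adjustments, and the function name `aggregate_estimate` and the sample numbers are hypothetical.

```python
import math


def aggregate_estimate(estimates, moes):
    """Sum ACS estimates and approximate the MOE of the sum.

    Uses the standard ACS root-sum-of-squares approximation:
    MOE_sum = sqrt(MOE_1^2 + MOE_2^2 + ... + MOE_n^2).
    Hypothetical helper for illustration only.
    """
    if len(estimates) != len(moes):
        raise ValueError("estimates and moes must have the same length")
    total = sum(estimates)
    # Square each component MOE, add them up, then take the square root.
    moe = math.sqrt(sum(m * m for m in moes))
    return total, moe


# Hypothetical example: three tracts combined into one derived area.
total, moe = aggregate_estimate([1200, 850, 430], [150, 120, 90])
print(total, round(moe, 1))  # prints: 2480 212.1
```

Each extra area adds another term under the square root. That is one reason a derived area built from a few large units usually carries a smaller MOE relative to its estimate than one stitched together from many small units.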
In brief, we defined specific constraints on the problem (minimize derived MOEs) and combined statistical areas across summary levels (if necessary) to "best" approximate the arbitrary boundary. "Best" can be defined in a number of different ways, but in this case we chose an areal Goldilocks solution: not too much bigger than the arbitrary boundary and not too much smaller. Different problems might call for different approaches. For example, one might demand that all population inside the arbitrary boundary be included in the derived statistical area, or one might demand that no population outside the arbitrary boundary be included. Our solution is in between these extremes.
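To make the "Goldilocks" idea concrete, here is a rough sketch. It is not the OTM-EM algorithm, which is described in the selection methodology PDF. It assumes you already know each candidate area's total area and how much of it overlaps the arbitrary boundary. It walks summary levels from coarse to fine, because coarser units tend to keep derived MOEs down. At each level it keeps the units that lie mostly inside the boundary, and it accepts the first level whose combined area lands within a tolerance band around the boundary's area. The `Area` class, `select_areas`, the 50% `min_share` cutoff, and the `(0.8, 1.25)` band are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Area:
    geoid: str      # identifier of the statistical area (hypothetical values below)
    level: str      # summary level, e.g. "county", "tract", "block group"
    area: float     # total area of the statistical area
    overlap: float  # portion of that area inside the arbitrary boundary


def select_areas(candidates, boundary_area, levels, min_share=0.5, band=(0.8, 1.25)):
    """Return (level, geoids) for the coarsest level that approximates the boundary.

    levels must be ordered coarse -> fine. A unit is kept if at least
    min_share of its area lies inside the boundary; a level is accepted
    if the kept units' total area / boundary_area falls within band.
    """
    for level in levels:
        chosen = [
            a for a in candidates
            if a.level == level and a.area > 0 and a.overlap / a.area >= min_share
        ]
        if not chosen:
            continue
        ratio = sum(a.area for a in chosen) / boundary_area
        # Not too much bigger, not too much smaller than the boundary.
        if band[0] <= ratio <= band[1]:
            return level, [a.geoid for a in chosen]
    return None, []


# Hypothetical example: the county is mostly outside the boundary,
# so the sketch falls back to tracts t1 and t2 (combined area 350 vs. 300).
cands = [
    Area("c1", "county", 1000.0, 300.0),
    Area("t1", "tract", 200.0, 180.0),
    Area("t2", "tract", 150.0, 100.0),
    Area("t3", "tract", 120.0, 20.0),
]
level, ids = select_areas(cands, boundary_area=300.0, levels=["county", "tract"])
print(level, ids)  # prints: tract ['t1', 't2']
```

Raising `min_share` toward 1.0 moves this toward the "no outside population" extreme. Lowering it toward 0 moves it toward the "include all inside population" extreme.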
:+1: Thank you @mwerevu This will definitely come in handy!
Adding another handy resource from @mwerevu (Matt Graham): A GeoStatistical Transformation Service.pdf
I'm closing this as there are now a number of third-party libraries which can assist with this problem:
I'm starting this thread to bring both CitySDK users and Census/Geographic subject matter experts into the development/maturation of our methods for merging aggregated data which are contained within/related to two different/incongruent geographic areas. Examples would include:
@afrasier & @johnson-tor-boozallen, please correct me if I'm mistaken, but currently we're using the default settings in terraformer.io.