synthetichealth / synthea

Synthetic Patient Population Simulator
https://synthetichealth.github.io/synthea
Apache License 2.0

Overall US Demographics #206

Closed smrtslckr closed 6 years ago

smrtslckr commented 7 years ago

Is there a best way to change the default settings to represent the US national overall demographics? Would I just modify a provided county to the national demographics in a pinch?

jawalonoski commented 7 years ago

Yes, one way would be to copy and modify one of the county files.

However, if no county file is provided, default demographics come from lib/world/demographics.rb

If you model an area outside of Massachusetts, and you care about the accuracy of generated addresses, then you'll probably also have to modify lib/world/city_zip.json for zip codes and lib/world/MA_geo.rb for latitudes and longitudes.

jakejang91 commented 6 years ago

I am trying to build a model that represents every county in Indiana. I have gathered all the required files, put them in the "resource" folder, and changed the code in tasks.rake to use those files. However, I am getting an error message saying "rake undefined method '[]' for nil:NilClass".

Is this because the data conflicts with what is in lib/world/MA_geo.rb and lib/world/city_zip.json?

Can you suggest a way to generate an IN_geo.rb just like MA_geo.rb?

jawalonoski commented 6 years ago

Not sure. Can you post a stack trace? Maybe even the input files too.

Question -- do you require lat/lon for each address?

jakejang91 commented 6 years ago

I was trying to use synthea:census to make JSON files for the counties in Indiana, and wanted to later broaden the scope to all counties in the USA. I do not require lat/lon for each address at this point. I am posting a screenshot of the trace, along with the files that I was using.

error resources.zip

jawalonoski commented 6 years ago

Huh... Indianapolis census data is strange. Looks like towns can exist in multiple counties? This makes filtering out rollup data difficult... looking into it....

jawalonoski commented 6 years ago

@jakejang91 I created a new branch called other_usa_states that should work. You'll get data for Indiana (if you drop your files in the ./resources folder).

Some remaining issues -- you won't get proper zip codes (they'll show up as "XXXXX") without further changes and you won't get lat/lons in the Patient resources.

https://github.com/synthetichealth/synthea/tree/other_usa_states

ghost commented 6 years ago

Can you direct me to some resource describing how to create these County census files?

I want/need to create them for counties in South Dakota, Minnesota, Iowa, Nebraska and North Dakota.

If I can get these working, I'd happily contribute them back to this project.

Thanks

jawalonoski commented 6 years ago

@nyquist212 -- you should not have to create these county census files for the given areas. They should be available for download from the US Census Bureau. You should be able to find information here https://www.census.gov/2010census/data/ and download using this search service https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t

You'll need the following files:

  1. subcounty population estimates for towns and cities (search factfinder.census.gov with "est2015") named SUB-EST2015_*.csv where * represents the state number. For example, 25 is for Massachusetts.

  2. county population estimates by age, gender, race, ethnicity (search factfinder.census.gov with "county population") named CC-EST2015-ALLDATA-*.csv where * represents the state number. For example, 25 is for Massachusetts.

  3. income data (search factfinder.census.gov with "S1901") named ACS_14_5YR_S1901_with_ann.csv

  4. education data (search factfinder.census.gov with "S1501") named ACS_14_5YR_S1501_with_ann.csv
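Before running the census task, it can help to confirm that all four files are actually in place. The following is only a minimal sketch: the helper name `missing_census_files` and the default `./resources` directory are assumptions for illustration, not part of Synthea. It expects the file names listed above, with the state number substituted for `*`.

```ruby
# Hypothetical helper (not part of Synthea): report which of the census input
# files listed above are missing from a directory for one state number.
def missing_census_files(state_number, dir = './resources')
  expected = [
    "SUB-EST2015_#{state_number}.csv",          # subcounty population estimates
    "CC-EST2015-ALLDATA-#{state_number}.csv",   # county estimates by age, gender, race, ethnicity
    'ACS_14_5YR_S1901_with_ann.csv',            # income data
    'ACS_14_5YR_S1501_with_ann.csv'             # education data
  ]
  # Keep only the names that do not exist on disk.
  expected.reject { |name| File.exist?(File.join(dir, name)) }
end
```

For example, `missing_census_files('25')` returns an empty array when all the Massachusetts files are present, and otherwise returns the names that still need to be downloaded.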

ghost commented 6 years ago

Thanks... while the census website isn't the easiest to navigate, I was able to get this data from these links specifically.

“subcounty population estimates for towns and cities” https://www.census.gov/data/datasets/2016/demo/popest/total-cities-and-towns.html

“county population estimates by age, gender, race, ethnicity” https://www.census.gov/data/datasets/2016/demo/popest/counties-detail.html

“income data” https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_S1901&prodType=table

“Education data” https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_S1501&prodType=table

EDIT: I've just noticed these files are not quite the same as the MA samples. They cause an array initialization problem (Abort) in rake.tasks around line 330 when I run synthea:census. I think I'll have to rework my data to make it look like the MA example.

Hopefully a future version might be able to pull this census data straight from the census bureau API. https://www.census.gov/data/developers/data-sets/acs-5year.html https://api.census.gov/data/2015/acs/acs5?get=NAME,B01001_001E&in=state:46&for=county:*
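As a rough illustration of what pulling from that API could look like, here is a minimal sketch using only Ruby's standard library. The function names are made up for this example and nothing here is Synthea code. It assumes the endpoint shown above keeps returning a JSON array of rows whose first row is the column names, and that the request works without an API key.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the query URL for total population (B01001_001E) of every county in a
# state, following the pattern of the example URL above.
def county_population_url(state_fips, year: 2015)
  "https://api.census.gov/data/#{year}/acs/acs5" \
    "?get=NAME,B01001_001E&in=state:#{state_fips}&for=county:*"
end

# Turn the response body into an array of hashes.
# Assumption: the first row of the JSON array is the header row.
def parse_county_populations(body)
  header, *rows = JSON.parse(body)
  rows.map do |row|
    record = header.zip(row).to_h
    {
      name: record['NAME'],
      population: record['B01001_001E'].to_i, # values arrive as strings
      state: record['state'],
      county: record['county']
    }
  end
end

# Fetch and parse in one step; raises if the HTTP request does not succeed.
def fetch_county_populations(state_fips, year: 2015)
  response = Net::HTTP.get_response(URI(county_population_url(state_fips, year: year)))
  raise "Census API request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_county_populations(response.body)
end
```

Calling `fetch_county_populations('46')` would request the South Dakota counties. The result would still need to be reshaped into whatever layout synthea:census expects, which this sketch does not attempt.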

filimon1 commented 6 years ago

@jawalonoski I am new to this. I noticed that the files Synthea generates in the ./output directory are FHIR Bundles (combinations of different resources). Is there a way to generate just a single Patient FHIR resource, or a way to parse the Bundle and separate the Patient resource from it? Thank you so much.

jawalonoski commented 6 years ago

@filimon1 Not at this point.

If you want to hack the software to only export a Patient inside the Bundle then comment out lines 12 to 43 in lib/records/fhir.rb.

Alternatively, if you want to parse the Bundle and just pull out the Patient resource, look at using the fhir_models gem. Something like...

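require 'fhir_models'
# Read the whole Bundle file as UTF-8 text (the file name is an example).
json = File.read('output/fhir/patient_bundle.json', encoding: 'UTF-8')
bundle = FHIR.from_contents(json)
# Assumes the Patient is the first entry in the Bundle.
patient = bundle.entry[0].resource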
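If adding the gem is not convenient, the same idea can be sketched with only Ruby's standard JSON library. This is a minimal sketch, not Synthea code: `extract_patient` is a hypothetical helper name, and it searches for the first entry whose resourceType is "Patient" instead of assuming a position.

```ruby
require 'json'

# Hypothetical helper: return the first Patient resource found in a FHIR
# Bundle (given as a JSON string) as a Hash, or nil if there is none.
def extract_patient(bundle_json)
  bundle = JSON.parse(bundle_json)
  entries = bundle['entry'] || []
  # Each Bundle entry wraps one resource; pick the one typed as Patient.
  entry = entries.find { |e| e.dig('resource', 'resourceType') == 'Patient' }
  entry && entry['resource']
end
```

For example, `extract_patient(File.read('output/fhir/patient_bundle.json', encoding: 'UTF-8'))` returns the Patient as a plain Hash, which can be written back out with `JSON.pretty_generate`.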

If you want to ask more questions about this, please open a new issue. This is off topic.

jawalonoski commented 6 years ago

This should be fixed with PR #245