orchid-initiative / synthetic-database-project

MIT License

Create synthetic datasets #13

Status: Open. rileeki opened this issue 1 year ago

rileeki commented 1 year ago

Data submission guidelines for CA hospitals:

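Inpatient (IP) format and file specifications, January 2023: https://hcai.ca.gov/wp-content/uploads/2022/12/IP-format-and-file-specs-jan-2023.pdf

Emergency Department and Ambulatory Surgery (ED/AS) format and file specifications, January 2023: https://hcai.ca.gov/wp-content/uploads/2022/12/ED-AS-format-and-file-specs-jan-2023.pdf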

Hospitals are required to submit their patient discharge data to the state as .txt files with a specific fixed-width layout.
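To emit synthetic records in the same kind of fixed-width .txt layout, a writer and a matching reader could look like the minimal sketch below. The field names, order, and widths in `LAYOUT` are hypothetical placeholders, not the HCAI layout; the real values (along with padding rules and record terminators) must be taken from the specification PDFs above. The sketch assumes left-justified, space-padded fields.

```python
# HYPOTHETICAL layout for illustration only: the real field order, names,
# and widths must be copied from the HCAI format/file specifications.
LAYOUT = [
    ("facility_id", 6),
    ("sex", 1),
    ("age_years", 3),
    ("ms_drg", 3),
    ("length_of_stay", 4),
]


def to_fixed_width(record, layout=LAYOUT):
    """Serialize one record dict into a single fixed-width line."""
    parts = []
    for name, width in layout:
        raw = record.get(name)
        value = "" if raw is None else str(raw)
        if len(value) > width:
            raise ValueError(f"{name}={value!r} exceeds width {width}")
        # Assumption: left-justify and pad with spaces; the spec may instead
        # require zero-filled, right-justified numeric fields.
        parts.append(value.ljust(width))
    return "".join(parts)


def from_fixed_width(line, layout=LAYOUT):
    """Slice a fixed-width line back into a dict of stripped strings."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


if __name__ == "__main__":
    line = to_fixed_width(
        {"facility_id": "000001", "sex": "F", "age_years": 47,
         "ms_drg": 871, "length_of_stay": 5}
    )
    print(repr(line))              # '000001F47 8715   '
    print(from_fixed_width(line))  # fields come back as strings
```

Keeping the layout as data (a list of name/width pairs) means the writer and reader stay in sync, and swapping in the real IP or ED/AS layout is a matter of replacing that one table.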

I'd like to create synthetic data that we can use to demonstrate projects before having access to real, sensitive data.

rileeki commented 1 year ago

This is probably a whole project of its own... We'd want the synthetic records to roll up to match the real, publicly available summary statistics. It could actually be a lot like our synthetic case pools for the surgery scheduling project.
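One simple way to guarantee that synthetic records roll up to a published table is to seed generation from the table itself: emit exactly the published count of records for each cell, then fill the remaining attributes randomly. The sketch below assumes a summary table of discharges by MS-DRG and sex; the counts in `TARGET_COUNTS` are made up for illustration and are not HCAI figures, and the 1 to 10 day length-of-stay range is likewise an assumption.

```python
import random
from collections import Counter

# HYPOTHETICAL target counts standing in for a published summary table
# (discharges by MS-DRG and sex). These are NOT real HCAI figures.
TARGET_COUNTS = {
    ("871", "F"): 3,
    ("871", "M"): 2,
    ("470", "F"): 4,
}


def synthesize(target_counts, seed=0):
    """Create one record per counted discharge so roll-ups match exactly."""
    rng = random.Random(seed)
    records = []
    for (drg, sex), n in target_counts.items():
        for _ in range(n):
            records.append({
                "ms_drg": drg,
                "sex": sex,
                # Unconstrained attribute, drawn from an assumed range.
                "length_of_stay": rng.randint(1, 10),
            })
    rng.shuffle(records)  # avoid records appearing grouped by cell
    return records


def roll_up(records):
    """Aggregate records back to (MS-DRG, sex) counts for comparison."""
    return Counter((r["ms_drg"], r["sex"]) for r in records)


if __name__ == "__main__":
    synthetic = synthesize(TARGET_COUNTS)
    print(roll_up(synthetic) == Counter(TARGET_COUNTS))  # True
```

This only matches the one cross-tabulation it is seeded from. Matching several overlapping summary files at once (for example, separate tables by MS-DRG, by age group, and by county) would need a fitting step such as iterative proportional fitting (raking) before records are drawn.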

rileeki commented 1 year ago

Summary files:

Inpatient

ED

MS-DRGs

Homeless IP & ED

rileeki commented 1 year ago

Also note: the 2021 IP data, at least, is already served through a Data API (CKAN DataStore) that might be worth exploring instead of downloading the Excel files and building our own database. https://docs.ckan.org/en/latest/maintaining/datastore.html
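If the dataset is exposed through the CKAN DataStore, records can be pulled with the `datastore_search` action and paged with its `limit` and `offset` parameters. The minimal sketch below uses only the standard library; `BASE_URL` and `RESOURCE_ID` are hypothetical placeholders, since the actual portal host and the resource ID of the 2021 IP table are not given here.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholders: substitute the real portal host and the
# resource ID of the 2021 IP dataset.
BASE_URL = "https://ckan.example.org"
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"


def build_search_url(base_url, resource_id, limit=100, offset=0):
    """Build a CKAN DataStore datastore_search request URL."""
    query = urllib.parse.urlencode(
        {"resource_id": resource_id, "limit": limit, "offset": offset}
    )
    return f"{base_url}/api/3/action/datastore_search?{query}"


def parse_response(payload):
    """Return (records, total) from a datastore_search JSON response."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    result = payload["result"]
    return result["records"], result.get("total")


def fetch_all(base_url, resource_id, page_size=1000):
    """Yield every record, one page at a time (performs network I/O)."""
    offset = 0
    while True:
        url = build_search_url(base_url, resource_id, page_size, offset)
        with urllib.request.urlopen(url) as resp:
            records, total = parse_response(json.load(resp))
        yield from records
        offset += len(records)
        if not records or (total is not None and offset >= total):
            break


if __name__ == "__main__":
    print(build_search_url(BASE_URL, RESOURCE_ID, limit=5))
```

`datastore_search` also accepts `q` (full-text search) and `filters` (field/value matching), which could pull just the slices needed for calibration instead of the whole table.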