orchid-initiative / synthetic-database-project

Generate a sample with 10-15 hospitals in the LA area #89

Closed: rileeki closed this issue 8 months ago

rileeki commented 8 months ago

- Harbor-UCLA
- UCLA
- LA General Medical Center
- Keck Hospital
- Cedars Sinai
- White Memorial
- Glendale Adventist
- Verdugo Hospital
- Arcadia USC
- Good Samaritan Hospital
- "maybe add a Kaiser too"

Check if you can add the two facility ID fields to the hospital file so they are available for post-processing in the format operations later, or if that will mess with Synthea.

rileeki commented 8 months ago

Check if you can add fields to the hospital file, or if that will mess with Synthea

TravisHaussler commented 8 months ago

I made an override file with these:

- Harbor-UCLA
- RONALD REAGAN UCLA
- LA General Medical Center
- Keck Hospital
- Cedars Sinai
- White Memorial
- Glendale Adventist
- Verdugo Hospital
- Good Samaritan Hospital
- Arcadia USC
- Kaiser - Los Angeles

I am looking up the HCAI numbers now to test
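
For illustration, here is a minimal sketch of appending two facility ID columns to a hospital override file. The column names `facility_id_1` and `facility_id_2`, the single-column input, and the ID values are all hypothetical placeholders (they are not real HCAI numbers, and a real Synthea provider file has more columns). The sketch only shows the CSV manipulation; whether Synthea tolerates the extra columns still has to be checked with a test run.

```python
import csv
import io

# Hypothetical, heavily trimmed hospital file: a real Synthea provider
# file carries many more columns than just the name.
HOSPITALS_CSV = "name\nHarbor-UCLA\nKeck Hospital\n"

# Placeholder IDs for illustration only, NOT real HCAI numbers.
ID_LOOKUP = {"Harbor-UCLA": ("ID-A1", "ID-A2")}


def add_facility_ids(csv_text, id_lookup,
                     id_columns=("facility_id_1", "facility_id_2")):
    """Return csv_text with the two ID columns appended to every row.

    Hospitals missing from id_lookup get empty values, so gaps are easy
    to spot before a run.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + list(id_columns)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        ids = id_lookup.get(row["name"], ("", ""))
        for column, value in zip(id_columns, ids):
            row[column] = value
        writer.writerow(row)
    return out.getvalue()


print(add_facility_ids(HOSPITALS_CSV, ID_LOOKUP))
```

Appending the new columns at the end leaves the positions of the existing columns unchanged, which seems the least likely layout to disturb a reader that looks columns up by name.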

TravisHaussler commented 8 months ago

For Good Samaritan Hospital, is it the one in LA or the one in Bakersfield? I wasn't sure whether the Bakersfield one is the hospital we are working with and we wanted it in the list for some reason, or whether we just want the big LA one.

TravisHaussler commented 8 months ago

Ok, so this works, yay! I have output that has the correct facility IDs now. One hiccup though: I ran it for population 10 and got 54 patients (a separate issue that I mentioned on the call last week) with 3449 encounters.

Of these encounters, 27 met the requirement to be reported in our output (i.e., they passed the filter for type inpatient). Of those 27, 25 were at a hospital from our list. One of the two that were not at a listed hospital was at a "primary_care_facility" that comes from another Synthea input file.

Diagnosis code was:

32485007    Hospital admission (procedure)

Procedure codes were:

225338004   Risk assessment (procedure)
33367005    Angiography of coronary artery (procedure)
415070008   Percutaneous coronary intervention (procedure)
433236007   Transthoracic echocardiography (procedure)
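
The check described in this comment (filter encounters to inpatient, then see which ones fall outside the hospital list) could be sketched roughly as follows. This is only an illustration under assumptions: the `ORGANIZATION` and `ENCOUNTERCLASS` column names follow Synthea's CSV export, but the rows are made up, and matching on organization names rather than IDs is a simplification.

```python
import csv
import io
from collections import Counter

# Made-up encounter rows for illustration; not data from the actual run.
ENCOUNTERS_CSV = """Id,ORGANIZATION,ENCOUNTERCLASS
e1,Cedars Sinai,inpatient
e2,Cedars Sinai,ambulatory
e3,Some Primary Care Clinic,inpatient
e4,Keck Hospital,inpatient
"""

# Subset of the override list, matched by name for simplicity.
HOSPITAL_LIST = {"Cedars Sinai", "Keck Hospital"}


def summarize_inpatient(csv_text, hospitals):
    """Count inpatient encounters and those outside the hospital list."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    inpatient = [r for r in rows if r["ENCOUNTERCLASS"] == "inpatient"]
    outside = [r for r in inpatient if r["ORGANIZATION"] not in hospitals]
    return {
        "total": len(rows),
        "inpatient": len(inpatient),
        "in_list": len(inpatient) - len(outside),
        # Which organizations account for the out-of-list encounters.
        "outside": Counter(r["ORGANIZATION"] for r in outside),
    }


summary = summarize_inpatient(ENCOUNTERS_CSV, HOSPITAL_LIST)
print(summary)
```

Tallying the out-of-list organizations makes it easy to see whether they come from a different provider input file, as with the "primary_care_facility" case above.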

TravisHaussler commented 8 months ago

CSV from the above referenced example run: csv_HCAIPDD_12-20-2023_1532.csv

TravisHaussler commented 8 months ago

I did a bigger run (n=500), which resulted in 1600 records. Attached here: csv_HCAIPDD_12-22-2023_1348.csv

Run stats:


------------------------CREATING PATIENT RECORDS------------------------
Running with options:
Population: 500
Seed: 1703280916306
Provider Seed:1703280916306
Reference Time: 1703280916306
Location: Los Angeles, California
Min Age: 0
Max Age: 140
Gender: F

Records: total=2378, alive=2196, dead=182

Patients created in: 12.0 mins 47.92 secs.
Formatted output created in: 32.04 secs.


---------------------------TOTAL ELAPSED TIME---------------------------

Elapsed Time: 13.0 mins 20.01 secs.