orchid-initiative / synthetic-database-project

Facility Identification Number #84

Closed TravisHaussler closed 8 months ago

TravisHaussler commented 10 months ago

FACILITY IDENTIFICATION NUMBER: 6-digit Facility Identification Number (the unique facility number assigned by HCAI)

IDs in OSHPD’s financial and utilization databases begin with a 3-digit number that indicates the “type of facility” (106 = hospital, 206 = long term care, 306 = clinic, 406 = home health/hospice agency). The last six digits are the Facility Number that is issued by HCAI; this number is also used in the PDD & ED/AS Data Sets. Within that six-digit Facility Number, the first two digits indicate the county in which the facility operates, and the last four digits are assigned by HCAI to identify the facility.

https://hcai.ca.gov/data-and-reports/request-data/data-documentation/#:~:text=The%20last%20six%20digits%20are,HCAI%20to%20identify%20the%20facility.

Synthea provides us: provider_num and npi

For example, for Keck Hospital we have:

provider_num,npi,name,address,city
"50696","1013514199","KECK MEDICAL CENTER OF USC","1500 SAN PABLO ST"

50696 ("provider_num") appears to be the CMS Certification Number (CCN), previously known as the Medicare Provider Number or Medicare ID. 1013514199 ("npi") is the National Provider Identifier (NPI), the unique 10-digit number that healthcare providers obtain to identify themselves in a standard way throughout their industry.

But the HCAI (previously known as OSHPD) ID for Keck is 106194219, so the Facility ID we would supply is 194219.
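The ID layout described above can be sketched in a few lines of Python. This is only an illustration of the 3 + 2 + 4 digit split from the HCAI documentation; `parse_hcai_id` and `HcaiId` are hypothetical names, not code from this repository.

```python
from typing import NamedTuple

# Facility-type prefixes as listed in the HCAI documentation quoted above.
FACILITY_TYPES = {
    "106": "hospital",
    "206": "long term care",
    "306": "clinic",
    "406": "home health/hospice agency",
}


class HcaiId(NamedTuple):
    type_prefix: str       # first 3 digits, e.g. "106"
    facility_type: str     # human-readable type, or "unknown"
    facility_number: str   # last 6 digits (the Facility ID we want)
    county_code: str       # first 2 digits of the facility number
    facility_suffix: str   # last 4 digits of the facility number


def parse_hcai_id(hcai_id: str) -> HcaiId:
    """Split a 9-digit HCAI/OSHPD ID into its documented parts."""
    digits = str(hcai_id).strip()
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"expected a 9-digit HCAI ID, got {hcai_id!r}")
    prefix, number = digits[:3], digits[3:]
    return HcaiId(
        type_prefix=prefix,
        facility_type=FACILITY_TYPES.get(prefix, "unknown"),
        facility_number=number,
        county_code=number[:2],
        facility_suffix=number[2:],
    )


keck = parse_hcai_id("106194219")
print(keck.facility_type, keck.facility_number)
```

Keeping the ID as a string rather than an integer avoids losing leading zeros in the facility number.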

So the issue here is: how do we map provider_num or npi to the HCAI ID, and therefore to the Facility ID?

TravisHaussler commented 10 months ago

provider_num is not listed for the majority of hospitals in the hospitals.csv input. This suggests we should use NPI, but I cannot find a way to see the NPI in the Synthea output. Encounters has an "ORGANIZATION" field which matches an ID from the organizations.csv output file. This can give us the name of the organization. However, one named hospital can have many, many NPIs (for example, "JOHNS HOPKINS UNIVERSITY" has 48 different NPIs). So, from what I can see, we can't do a successful pandas merge here, because the name does not identify a single NPI.
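A quick way to see how widespread the one-name-to-many-NPIs problem is would be to group the hospitals input by name and list every name with more than one NPI. The sketch below uses only the standard library. The `name` and `npi` column names come from the header shown earlier in this thread, while the "EXAMPLE UNIVERSITY HOSPITAL" rows and their NPIs are made-up placeholders, not real records.

```python
import csv
import io
from collections import defaultdict

# Illustrative data: the Keck row is from this thread, the other two rows
# are hypothetical placeholders with fake NPIs.
SAMPLE_HOSPITALS_CSV = """provider_num,npi,name
50696,1013514199,KECK MEDICAL CENTER OF USC
,0000000001,EXAMPLE UNIVERSITY HOSPITAL
,0000000002,EXAMPLE UNIVERSITY HOSPITAL
"""


def npis_by_name(csv_text):
    """Map each normalized hospital name to the set of NPIs listed under it."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["name"].strip().upper()].add(row["npi"].strip())
    return dict(grouped)


def ambiguous_names(csv_text):
    """Return only the names that cannot be resolved to a single NPI."""
    return {
        name: sorted(npis)
        for name, npis in npis_by_name(csv_text).items()
        if len(npis) > 1
    }


print(ambiguous_names(SAMPLE_HOSPITALS_CSV))
```

Any name that shows up in this result would produce duplicate rows in a name-based join, which is why the name alone is not a safe merge key.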

For now, I am going to add the "organization" name to a field called "Facility Name" in the CSV output (but not the fixed_width output), so we can more easily visualize work like the Keck test.

TravisHaussler commented 8 months ago

We found that if we use an override for hospitals we can arbitrarily add a column at the end of the hospitals data containing the HCAI 9-digit code. We can strip the first 3 digits to get the common 6-digit code.

We added code support to do this work for us, although it is somewhat hard-coded right now: https://github.com/orchid-initiative/synthetic-database-project/issues/89
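As a rough sketch of the override idea (not the code referenced in the linked issue), the extra column can be read with the standard `csv` module and reduced to the 6-digit Facility ID. The column name `hcai_id`, the output field `facility_id`, and the second sample row are assumptions made for illustration.

```python
import csv
import io

# Hypothetical override file: the usual hospitals columns plus a trailing
# "hcai_id" column holding the 9-digit HCAI code. The second row is a
# made-up placeholder for a hospital with no override value.
SAMPLE_OVERRIDE_CSV = """provider_num,npi,name,address,hcai_id
50696,1013514199,KECK MEDICAL CENTER OF USC,1500 SAN PABLO ST,106194219
,0000000001,EXAMPLE HOSPITAL WITHOUT OVERRIDE,1 EXAMPLE ST,
"""


def load_with_facility_id(csv_text, hcai_column="hcai_id"):
    """Read hospital rows and add a 6-digit 'facility_id' where possible."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        hcai = (row.get(hcai_column) or "").strip()
        enriched = dict(row)
        # Drop the 3-digit facility-type prefix; leave blank if the value
        # is missing or not a 9-digit code.
        valid = len(hcai) == 9 and hcai.isdigit()
        enriched["facility_id"] = hcai[3:] if valid else ""
        rows.append(enriched)
    return rows


for hospital in load_with_facility_id(SAMPLE_OVERRIDE_CSV):
    print(hospital["name"], hospital["facility_id"] or "(no HCAI code)")
```

Leaving `facility_id` blank for rows without a valid code makes the unmapped hospitals easy to spot, rather than silently emitting a wrong ID.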

In theory, we could add a general NPI -> HCAI map, match multiple fields in organizations.csv against the underlying Synthea hospitals.csv input to recover a reasonably accurate NPI, and then run that NPI through the mapping.
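A minimal sketch of that idea follows, assuming organizations.csv exposes `NAME` and `ADDRESS` columns and hospitals.csv exposes `name`, `address`, and `npi`. The `npi_to_hcai` dictionary is a hypothetical mapping that would have to be built separately, and all function names here are illustrative only.

```python
# Pairs of (organizations.csv column, hospitals.csv column) to match on.
# More pairs (for example city) could be added to tighten the match.
FIELD_PAIRS = (("NAME", "name"), ("ADDRESS", "address"))


def _norm(value):
    """Uppercase and collapse whitespace so minor formatting differences match."""
    return " ".join(str(value or "").upper().split())


def index_hospitals(hospital_rows, field_pairs=FIELD_PAIRS):
    """Index hospitals.csv rows by the normalized match fields -> set of NPIs."""
    index = {}
    for row in hospital_rows:
        key = tuple(_norm(row.get(h_col)) for _, h_col in field_pairs)
        index.setdefault(key, set()).add(row["npi"])
    return index


def facility_id_for_org(org_row, hospital_index, npi_to_hcai,
                        field_pairs=FIELD_PAIRS):
    """Return the 6-digit Facility ID for an organization, or None.

    None is returned when the match is missing or ambiguous (more than one
    NPI), or when the NPI is absent from the hypothetical npi_to_hcai map.
    """
    key = tuple(_norm(org_row.get(o_col)) for o_col, _ in field_pairs)
    npis = hospital_index.get(key, set())
    if len(npis) != 1:
        return None
    hcai_id = npi_to_hcai.get(next(iter(npis)))
    return hcai_id[3:] if hcai_id else None


hospitals = [{"npi": "1013514199", "name": "KECK MEDICAL CENTER OF USC",
              "address": "1500 SAN PABLO ST"}]
org = {"NAME": "Keck Medical Center of USC", "ADDRESS": "1500 San Pablo St"}
print(facility_id_for_org(org, index_hospitals(hospitals),
                          {"1013514199": "106194219"}))
```

Refusing to guess when several NPIs share the same key keeps the ambiguity visible instead of hiding it behind an arbitrary pick.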