This repository provides code for converting the MIMIC-IV and MIMIC-IV-ED databases into FHIR.
Code in this repository is organized as follows:
sql/ contains SQL scripts for creating the FHIR tables in PostgreSQL and mapping data from MIMIC-IV/MIMIC-IV-ED
py_mimic_fhir/ contains a Python package for importing, validating and exporting FHIR resources from a HAPI FHIR server
mimic-profiles/ (submodule) contains the FHIR profiles and terminology for MIMIC-IV/MIMIC-IV-ED
hapi-fhir-jpaserver-starter/ (submodule) contains a fork of the HAPI FHIR JPA Server Starter project with some modifications to support the MIMIC-IV/MIMIC-IV-ED profiles and terminology
mimic-code/ (submodule) contains the MIMIC-IV build scripts for building the MIMIC-IV/MIMIC-IV-ED databases in PostgreSQL
fhir-packages contains the FHIR packages for the MIMIC-IV/MIMIC-IV-ED profiles and terminology (currently an empty folder)
This is a version of MIMIC-IV-on-FHIR (original repo here). The scripts and packages in this repository generate the MIMIC-IV FHIR tables in PostgreSQL, validate them in HAPI FHIR, and export them to ndjson.
MIMIC-IV and MIMIC-IV-ED each have their own instructions for loading the data into a local Postgres database: the MIMIC-IV guide and the MIMIC-IV-ED guide. You can follow those instructions, but everything needed is also included here; the guides are mentioned to give credit where it is due. (Note: when following the other instructions, use the same database name across both guides, i.e. mimiciv.)
This repository is provided for those who wish to explore the build process and regenerate the data in FHIR themselves. For those who are simply interested in the data, there are two PhysioNet projects where the data has already been published:
Briefly, the steps to convert MIMIC-IV/MIMIC-IV-ED to FHIR are as follows:
First, install git, wget, and PostgreSQL:
# update
sudo apt update
sudo apt install git wget postgresql postgresql-contrib
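Before continuing, it can help to confirm that the commands used in the later steps are actually on your PATH. The following is a minimal sketch, not part of this repository; the list of command names is an assumption based on the commands that appear in this guide.

```python
import shutil

# Command names assumed to come from the packages installed above.
REQUIRED_TOOLS = ("git", "wget", "psql", "createdb")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```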
Clone the repository and its submodules.
# use recurse submodules to also clone the mimic-code/mimic-profiles repo
git clone --recurse-submodules https://github.com/fhir-fli/mimic-fhir.git
Configure a user for the database. For convenient access, pick a username identical to your operating system username. That way you won't have to specify the username when connecting to the database, and authentication is simplified.
#get into postgres
sudo -i -u postgres
postgres@desktop:~$ psql
# The database user should have the same name as your username on this computer (replace grey below with your own)
# Replace '${PASSWORD}' with your actual password, but leave the single quotes around it
postgres=# CREATE USER grey CREATEDB password '${PASSWORD}';
postgres=# exit
postgres@desktop:~$ exit
<USERNAME> is your PhysioNet username. The following commands should be run in the mimic-fhir directory.
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/mimiciv/2.2/
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/mimic-iv-ed/2.2/
# move the actual data files
mv physionet.org/files/mimiciv mimiciv
mv physionet.org/files/mimic-iv-ed mimicived
# delete the rest
rm -r physionet.org/
# creates the database itself
createdb mimiciv
psql -d mimiciv -f mimic-code/mimic-iv/buildmimic/postgres/create.sql
# take note of the mimiciv version you're on and change the directory accordingly, this one takes a while
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimiciv/2.2 -f mimic-code/mimic-iv/buildmimic/postgres/load_gz.sql
# The first time you do this, the scripts delete ("drop" in sql parlance) things before you create them to remove old versions.
# This produces many warnings, you can safely ignore them.
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimiciv/2.2 -f mimic-code/mimic-iv/buildmimic/postgres/constraint.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimiciv/2.2 -f mimic-code/mimic-iv/buildmimic/postgres/index.sql
# We're basically just going to repeat with the mimic ED data
psql -d mimiciv -f mimic-code/mimic-iv-ed/buildmimic/postgres/create.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimicived/2.2/ed -f mimic-code/mimic-iv-ed/buildmimic/postgres/load_gz.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimicived/2.2/ed -f mimic-code/mimic-iv-ed/buildmimic/postgres/constraint.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimicived/2.2/ed -f mimic-code/mimic-iv-ed/buildmimic/postgres/index.sql
# validate that the setup is correct
psql -d mimiciv -f mimic-code/mimic-iv-ed/buildmimic/postgres/validate.sql
psql -d mimiciv -f mimic-code/mimic-iv/buildmimic/postgres/validate.sql
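The create, load, constraint, and index commands above follow the same pattern for both datasets, so they can also be generated programmatically. The sketch below only builds the argument lists; the function name `build_commands` and its parameters are hypothetical and not part of this repository. The script paths and psql variables are the ones used in the commands above.

```python
import shlex
from pathlib import PurePosixPath

# Build steps in the order used above; only "create" runs without the -v variables.
STEPS = ("create", "load_gz", "constraint", "index")


def build_commands(db, module, data_dir, code_root="mimic-code"):
    """Return one psql argument list per build step for a mimic-code module."""
    script_dir = PurePosixPath(code_root) / module / "buildmimic" / "postgres"
    commands = []
    for step in STEPS:
        cmd = ["psql", "-d", db]
        if step != "create":
            cmd += ["-v", "ON_ERROR_STOP=1", "-v", f"mimic_data_dir={data_dir}"]
        cmd += ["-f", str(script_dir / f"{step}.sql")]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for module, data_dir in (("mimic-iv", "mimiciv/2.2"), ("mimic-iv-ed", "mimicived/2.2/ed")):
        for cmd in build_commands("mimiciv", module, data_dir):
            print(shlex.join(cmd))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` from the mimic-fhir directory.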
Generate and validate the FHIR tables by running the scripts in mimic-fhir/sql:
cd sql
psql -d mimiciv -f create_fhir_tables.sql
psql -d mimiciv -f validate_fhir_tables.sql
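Export the FHIR resources to <output-dir> by running create_fhir_jsons.sql, found in the folder mimic-fhir/sql (replace <output-dir> with the desired existing and empty output directory).
# the path below assumes you are in the mimic-fhir directory; if you are still inside sql/, use -f create_fhir_jsons.sql
psql -d mimiciv -v "outputdir=<output-dir>" -f sql/create_fhir_jsons.sql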
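Once the export has finished, a quick count of the resources in the output directory is a useful sanity check. The sketch below assumes the files are newline-delimited JSON (one FHIR resource per line) with an .ndjson extension; both are assumptions, so adjust `pattern` if the files are named differently. The function name `count_resources` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_resources(output_dir, pattern="*.ndjson"):
    """Count resources per resourceType across the matching files in output_dir."""
    counts = Counter()
    for path in sorted(Path(output_dir).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    resource = json.loads(line)
                    counts[resource.get("resourceType", "unknown")] += 1
    return counts
```

For example, `count_resources("<output-dir>")` returns a `Counter` keyed by resource type, which can be compared against the row counts of the FHIR tables.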
# leave the mimic-fhir/sql directory and clone the hapi-fhir repo
cd ../.. && git clone https://github.com/kind-lab/hapi-fhir-jpaserver-starter.git
createdb hapi_r4
cd hapi-fhir-jpaserver-starter
mvn jetty:run
Make sure MIMIC_TERMINOLOGY_PATH is set and points to the latest terminology files in mimic-profiles/input/resources.
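A missing or mistyped MIMIC_TERMINOLOGY_PATH is easy to overlook, so it may be worth checking before posting terminology. This is a minimal sketch; the function name `check_terminology_path` is hypothetical, and it only verifies that the variable is set and names an existing directory, not that the directory contains the expected files.

```python
import os
from pathlib import Path


def check_terminology_path(env=os.environ):
    """Return an error message, or None if MIMIC_TERMINOLOGY_PATH looks usable."""
    value = env.get("MIMIC_TERMINOLOGY_PATH")
    if not value:
        return "MIMIC_TERMINOLOGY_PATH is not set"
    if not Path(value).is_dir():
        return f"MIMIC_TERMINOLOGY_PATH does not point to a directory: {value}"
    return None


if __name__ == "__main__":
    print(check_terminology_path() or "MIMIC_TERMINOLOGY_PATH looks fine")
```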
cd ../mimic-fhir
export $(grep -v '^#' .env | xargs)
pip install -e .
pip install google-cloud
pip install google-cloud-pubsub
pip install google-api-python-client
pip install psycopg2-binary
pip install pandas-gbq
pip install fhir
pip install fhir-resources
python py_mimic_fhir terminology --post
python3 py_mimic_fhir validate --init
python3 py_mimic_fhir validate --num_patients 5
Any failed bundles will be written to the log folder specified in .env.
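The settings in .env, loaded into the shell above with `export $(grep -v '^#' .env | xargs)`, can also be inspected from Python, for example to confirm which log folder is configured. The parser below is a simplified sketch: it skips blank lines and comment lines and splits on the first `=`, but it does not reproduce every shell quoting rule. The function name `parse_env` is hypothetical.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and lines starting with '#'."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Drop any quote characters surrounding the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env
```

Usage: `parse_env(open(".env", encoding="utf-8").read())` returns a plain dictionary of the configured values.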
Export mimic-fhir to ndjson
python3 py_mimic_fhir export --export_limit 100
export_limit reduces how much is written out to file by limiting how many binaries are written. Each binary holds about 1000 resources, so a limit of 1 outputs about 1000 resources to ndjson, and the limit of 100 used above outputs about 100,000.
The bin/psql-export-trm.py script can be used to generate terminology resources, such as code systems and value sets, from the fhir_trm schema of the MIMIC database. These resources can be used to update the MIMIC code system and value set definitions in the MIMIC-IV IG (mimic-profiles/input/resources).
To update the resources, first generate the terminology tables in PostgreSQL with sql/create_fhir_terminology.sql, and then run the script with the following command (replace the placeholders with the actual values):
python bin/psql-export-trm.py \
--db-name "${DATABASE}" \
--db-user "${USER}" \
--db-pass "${PGPASSWORD}" \
--date "2022-09-21T13:59:43-04:00" \
mimic-profiles/input/resources
The script requires the click Python package (in addition to the packages listed in the section above).
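The `--date` value in the command above is an ISO 8601 timestamp with a UTC offset. If you want to stamp the resources with the current time instead of a fixed date, a value in the same format can be generated as sketched below. The helper name `iso_timestamp` is hypothetical, and whether the script accepts other date formats is not stated here, so the sketch simply reproduces the format shown in the example.

```python
from datetime import datetime, timedelta, timezone


def iso_timestamp(moment=None):
    """Format a timezone-aware datetime like 2022-09-21T13:59:43-04:00."""
    if moment is None:
        # Current local time, with the local UTC offset attached.
        moment = datetime.now().astimezone()
    return moment.isoformat(timespec="seconds")


if __name__ == "__main__":
    print(iso_timestamp())
```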