kind-lab / mimic-fhir

A version of MIMIC-IV in FHIR
MIT License

mimic-fhir

This repository provides code for converting the MIMIC-IV and MIMIC-IV-ED databases into FHIR.

Code in this repository is organized as follows:

Accessing the data

This repository is provided for those who wish to explore the build process and regenerate the data in FHIR themselves. For those who are simply interested in the data, there are two PhysioNet projects where the data has already been published:

Building MIMIC-IV on FHIR

Briefly, the steps to convert MIMIC-IV/MIMIC-IV-ED to FHIR are as follows:

  1. Clone this repository and its submodules
  2. Install PostgreSQL and create a database
  3. Download the MIMIC-IV/MIMIC-IV-ED data and load it into PostgreSQL
  4. Generate the FHIR tables by running create_fhir_tables.sql

Detailed instructions (Ubuntu)

Install packages

First, install git, wget, and PostgreSQL:

# update
sudo apt update
sudo apt install git wget postgresql postgresql-contrib

Clone the repository and its submodules.

# use --recurse-submodules to also clone the mimic-code and mimic-profiles repos
git clone --recurse-submodules https://github.com/fhir-fli/mimic-fhir.git

Create a postgres user

Configure a user for the database. For convenience, pick a username identical to your operating system username: you then won't have to specify the username when connecting to the database, and authentication is simplified.

# switch to the postgres system user and open psql
sudo -i -u postgres
postgres@desktop:~$ psql
# the database username should match your operating system username ('grey' in this example)
# replace '${PASSWORD}' with your actual password, but leave the single quotes around it
postgres=# CREATE USER grey CREATEDB password '${PASSWORD}';

postgres=# exit
postgres@desktop:~$ exit
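
To avoid mistyping the name, the statement can be generated from the current operating system username. The following is a minimal Python sketch, not part of this repository: `create_user_sql` is a hypothetical helper, and the quoting assumes PostgreSQL's default `standard_conforming_strings = on`.

```python
import getpass


def create_user_sql(username, password):
    """Build a CREATE USER statement with a quoted identifier and a quoted literal."""
    # Double any embedded quotes so the values cannot break out of the statement.
    ident = '"' + username.replace('"', '""') + '"'
    literal = "'" + password.replace("'", "''") + "'"
    return f"CREATE USER {ident} CREATEDB PASSWORD {literal};"


if __name__ == "__main__":
    # getpass.getuser() returns the login name of the current OS user.
    # "change-me" is a placeholder password, not a recommended value.
    print(create_user_sql(getpass.getuser(), "change-me"))
```

The printed statement can be pasted at the `postgres=#` prompt in place of the hand-written one.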

Download the data and load it into PostgreSQL

The following commands should be run in the mimic-fhir directory.

wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/mimiciv/2.2/
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/mimic-iv-ed/2.2/

# move the actual data files
mv physionet.org/files/mimiciv mimiciv 
mv physionet.org/files/mimic-iv-ed mimicived

# delete the rest
rm -r physionet.org/

# creates the database itself
createdb mimiciv
psql -d mimiciv -f mimic-code/mimic-iv/buildmimic/postgres/create.sql

# note which MIMIC-IV version you downloaded and change the directory accordingly; this step takes a while
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimiciv/2.2 -f mimic-code/mimic-iv/buildmimic/postgres/load_gz.sql

# The scripts delete ("drop" in SQL parlance) existing objects before creating them, to remove old versions.
# On a first run this produces many warnings, which you can safely ignore.
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimiciv/2.2 -f mimic-code/mimic-iv/buildmimic/postgres/constraint.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimiciv/2.2 -f mimic-code/mimic-iv/buildmimic/postgres/index.sql

# repeat the same steps for the MIMIC-IV-ED data
psql -d mimiciv -f mimic-code/mimic-iv-ed/buildmimic/postgres/create.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimicived/2.2/ed -f mimic-code/mimic-iv-ed/buildmimic/postgres/load_gz.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimicived/2.2/ed -f mimic-code/mimic-iv-ed/buildmimic/postgres/constraint.sql
psql -d mimiciv -v ON_ERROR_STOP=1 -v mimic_data_dir=mimicived/2.2/ed -f mimic-code/mimic-iv-ed/buildmimic/postgres/index.sql

# validate that the setup is correct
psql -d mimiciv -f mimic-code/mimic-iv-ed/buildmimic/postgres/validate.sql
psql -d mimiciv -f mimic-code/mimic-iv/buildmimic/postgres/validate.sql
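
The load sequence above runs the same four scripts (create, load, constraint, index) for each module. As a hedged illustration, the commands can be generated from Python instead of being typed by hand. `build_load_commands` and `MODULES` are hypothetical names, not part of this repository, and the data directories assume version 2.2 laid out as in the commands above.

```python
import shlex
from pathlib import PurePosixPath

# (mimic-code module, data directory) pairs, in the order used above.
# The directories are assumptions matching the 2.2 layout shown in this README.
MODULES = (
    ("mimic-iv", "mimiciv/2.2"),
    ("mimic-iv-ed", "mimicived/2.2/ed"),
)


def build_load_commands(db="mimiciv"):
    """Return the psql invocations that create, load, constrain and index each module."""
    commands = []
    for module, data_dir in MODULES:
        scripts = PurePosixPath("mimic-code") / module / "buildmimic" / "postgres"
        # create.sql takes no variables in the commands above
        commands.append(["psql", "-d", db, "-f", str(scripts / "create.sql")])
        for name in ("load_gz.sql", "constraint.sql", "index.sql"):
            commands.append([
                "psql", "-d", db,
                "-v", "ON_ERROR_STOP=1",
                "-v", f"mimic_data_dir={data_dir}",
                "-f", str(scripts / name),
            ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in build_load_commands():
        print(shlex.join(command))
```

To execute the commands rather than print them, each list can be passed to `subprocess.run(command, check=True)` from the mimic-fhir directory.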

Conversion

cd sql
psql -d mimiciv -f create_fhir_tables.sql
psql -d mimiciv -f validate_fhir_tables.sql

Export to ndjson files

# run from the sql directory entered in the previous step
psql -d mimiciv -v "outputdir=<output-dir>" -f create_fhir_jsons.sql
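
A quick way to sanity-check the export is to count resources by type. The sketch below is an illustration only: it assumes the export writes uncompressed `*.ndjson` files directly into the output directory, with one FHIR resource (a JSON object with a `resourceType` field) per line. `count_resource_types` is a hypothetical helper, not part of this repository.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def count_resource_types(output_dir):
    """Count FHIR resources by resourceType across all .ndjson files in a directory."""
    counts = Counter()
    for path in sorted(Path(output_dir).glob("*.ndjson")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                resource = json.loads(line)
                counts[resource.get("resourceType", "unknown")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_resources.py <output-dir>
    for resource_type, count in sorted(count_resource_types(sys.argv[1]).items()):
        print(f"{resource_type}\t{count}")
```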

HAPI FHIR for use in validation/export

# leave the mimic-fhir/sql directory and clone the hapi-fhir repo
cd ../.. && git clone https://github.com/kind-lab/hapi-fhir-jpaserver-starter.git

createdb hapi_r4
cd hapi-fhir-jpaserver-starter
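# jetty:run stays in the foreground; keep it running and continue in a new terminal opened in this directory
mvn jetty:run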

PY_MIMIC_FHIR

cd ../mimic-fhir
export $(grep -v '^#' .env | xargs)
pip install -e .
pip install google-cloud
pip install google-cloud-pubsub
pip install google-api-python-client
pip install psycopg2-binary
pip install pandas-gbq
pip install fhir
pip install fhir-resources
python3 py_mimic_fhir terminology --post
python3 py_mimic_fhir validate --init
python3 py_mimic_fhir validate --num_patients 5
python3 py_mimic_fhir export --export_limit 100
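
The `export $(grep -v '^#' .env | xargs)` line above drops comment lines from `.env` and exports each remaining `KEY=VALUE` pair into the shell. A rough Python equivalent can help check which variables would be set. `parse_env` is a hypothetical helper, not part of this repository, and it handles only plain `KEY=VALUE` lines without any quoting rules.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and lines that start with '#'."""
    variables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            variables[key.strip()] = value.strip()
    return variables


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        # Print names only; values may contain credentials.
        print(sorted(parse_env(env_file.read_text(encoding="utf-8"))))
```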

Generating terminology resources

The bin/psql-export-trm.py script generates terminology resources, such as code systems and value sets, from the fhir_trm schema of the MIMIC database. These resources can be used to update the MIMIC code system and value set definitions in the MIMIC-IV IG (mimic-profiles/input/resources).

To update the resources, first generate the terminology tables in PostgreSQL with sql/create_fhir_terminology.sql, then run the script with the following command (replace the placeholders with the actual values):

python bin/psql-export-trm.py \
  --db-name "${DATABASE}" \
  --db-user "${USER}" \
  --db-pass "${PGPASSWORD}" \
  --date "2022-09-21T13:59:43-04:00" \
  mimic-profiles/input/resources

The script requires the click Python package (in addition to the packages listed in the section above).
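
The `--date` value in the command above is an ISO 8601 timestamp with a UTC offset. The following small Python sketch produces a timestamp in that shape. It assumes the script accepts any timestamp formatted like the example, which this README does not confirm. `export_timestamp` is a hypothetical helper, not part of this repository.

```python
from datetime import datetime, timedelta, timezone


def export_timestamp(now=None):
    """Return an ISO 8601 timestamp with seconds precision and a UTC offset."""
    if now is None:
        # astimezone() with no argument attaches the local timezone.
        now = datetime.now().astimezone()
    return now.replace(microsecond=0).isoformat()


if __name__ == "__main__":
    # Reproduce the example value used in the command above.
    example = datetime(2022, 9, 21, 13, 59, 43, tzinfo=timezone(timedelta(hours=-4)))
    print(export_timestamp(example))  # 2022-09-21T13:59:43-04:00
    # Current local time in the same format.
    print(export_timestamp())
```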

Useful wiki links