uf-hobi-informatics-lab / SODA_Docker

repo for pipeline development

Social Determinants of Health pipeline execution:

Setting up this repository

After cloning the repository, please run the following commands:

git checkout SDoH_pipeline
git submodule init
git submodule update

Needed files and folder structure:

  1. encoded_text folder:
    • This folder will contain the relevant text files to be run through the pipeline in plain text format. The location of this folder must be under the root directory specified in the config.yml file.
  2. config.yml file:
    • gpu_node: Specify the GPUs to be used during the NER and relation extraction steps. NER supports multi-GPU processing. This parameter can also be overridden when using the bash script.
    • root_dir: Base directory where the pipeline places its output. This should be the directory containing your encoded_text folder.
    • raw_data_dir: Base location of all relevant raw data, to be used if encoded_text folder is not provided.
    • generate_bio: Defines whether or not the NER part of the pipeline generates .bio format output.
    • encoded_text: Signals whether encoded text already exists.
    • ner_model: Contains the specific information pointing to the model to be used.
      • type: Specify the type of model to train/use.
      • path: The location of the pretrained model to be used as a base.

Example config.yml:

  gpu_node: 4
  root_dir: /home/dparedespardo/project/SDoH_pipeline_demo
  raw_data_dir: /data/datasets/Tianchen/data_from_old_server/2021/ADRD_data_from_Xi/clinical_notes_all_0826/
  generate_bio: False
  encoded_text: True
  ner_model:
    type: bert
    path: /data/datasets/zehao/sdoh/model/SDOH_bert_final
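Before launching a run, it can help to confirm that the config and folder layout match the description above. The following is a minimal sketch, not part of the pipeline itself: the function name check_config is hypothetical, and treating every listed key as required is an assumption. It takes an already-parsed config (a Python dict) and reports missing keys and a missing encoded_text folder under root_dir.

```python
from pathlib import Path

# Key names come from the config.yml description above.
# Treating all of them as required is an assumption of this sketch.
TOP_LEVEL_KEYS = ("gpu_node", "root_dir", "raw_data_dir",
                  "generate_bio", "encoded_text", "ner_model")
NER_MODEL_KEYS = ("type", "path")


def check_config(cfg):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    for key in TOP_LEVEL_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")

    # ner_model is expected to be a nested mapping with 'type' and 'path'.
    ner_model = cfg.get("ner_model")
    if isinstance(ner_model, dict):
        for key in NER_MODEL_KEYS:
            if key not in ner_model:
                problems.append(f"missing key: ner_model.{key}")
    elif "ner_model" in cfg:
        problems.append("ner_model should be a mapping with 'type' and 'path'")

    # When encoded_text is true, the folder must sit directly under root_dir.
    if cfg.get("encoded_text") and "root_dir" in cfg:
        folder = Path(cfg["root_dir"]) / "encoded_text"
        if not folder.is_dir():
            problems.append(f"encoded_text folder not found: {folder}")
    return problems
```

The dict could be obtained with PyYAML, for example yaml.safe_load(open("config.yml")); whether the pipeline reads its config the same way is not stated here. An empty list from check_config means none of these checks failed.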

Running the pipeline:

To run the pipeline, execute the run_demo.sh script, providing the following arguments:


./run_demo.sh -c config.yml -n 0 2 4


The output file, in .csv format, is organized in the following way:

If you have a metadata file that maps individual notes to patients, you can also include further entries in your output file, such as:


Please cite our paper:

Yu, Z., Peng, C., Yang, X., Dang, C., Adekkanattu, P., Gopal Patra, B., Peng, Y., Pathak, J., Wilson, D.L., Chang, C.-Y., Lo-Ciganic, W.-H., George, T.J., Hogan, W.R., Guo, Y., Bian, J., Wu, Y., 2024. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J. Biomed. Inform. 104642.