vedants03 / Knowledge_Graph_Using_BioBERT



Knowledge Graph Using BioBERT



About The Project


This repository contains code for Named Entity Recognition (NER), Relationship Extraction (RE), and knowledge graph generation from Electronic Health Records (EHRs). These methods were applied to the n2c2 2018 challenge dataset, augmented with a sample of the ADE corpus dataset. This project is the capstone project for my undergraduate Bachelor of Technology degree in Computer Science and Engineering.

The purpose of this project is to automatically structure unstructured EHR text into a format that lets doctors and patients quickly find the information they need. Specifically, it builds a Named Entity Recognition (NER) model that recognizes entities such as drug, strength, duration, frequency, adverse drug event (ADE), reason for taking the drug, route, and form. The system also recognizes the relationship between each drug and every other named entity, and generates a knowledge graph from these relationships so that doctors can review a patient's disease and drug history at a glance. Finally, the knowledge graph is used for query answering, i.e. answering user queries about the extracted information.

The main objective of the project is to use the extracted relationships between drugs and every other entity to build a comprehensive knowledge graph that can be used for quick summaries, query answering, and analysis, thus simplifying knowledge discovery in the biomedical field.
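To make the intended structure concrete, here is a minimal in-memory sketch of the drug-centred graph described above: each extracted relation is stored as a (drug, entity type, entity) triple, and a question such as "what is the strength of Lasix?" becomes a lookup. The `Triple` and `DrugGraph` names, the exact type labels, and the example values are hypothetical; the project itself uses Neo4j for the knowledge graph.

```python
from collections import defaultdict
from typing import NamedTuple

# Entity types named in this README; the exact label strings the trained
# models emit are an assumption.
ENTITY_TYPES = {"Drug", "Strength", "Duration", "Frequency", "ADE", "Reason", "Route", "Form"}


class Triple(NamedTuple):
    drug: str         # the head of every relation is a drug
    entity_type: str  # type of the related entity, e.g. "Strength"
    entity: str       # surface text of the related entity


class DrugGraph:
    """Hypothetical in-memory stand-in for the knowledge graph."""

    def __init__(self) -> None:
        # drug (lower-cased) -> entity type -> set of related entity texts
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, triple: Triple) -> None:
        if triple.entity_type not in ENTITY_TYPES - {"Drug"}:
            raise ValueError(f"not a drug attribute type: {triple.entity_type}")
        self._index[triple.drug.lower()][triple.entity_type].add(triple.entity)

    def ask(self, drug: str, entity_type: str) -> list:
        """Answer 'What is the <entity_type> of <drug>?' (empty if unknown)."""
        return sorted(self._index.get(drug.lower(), {}).get(entity_type, set()))

    def summary(self, drug: str) -> dict:
        """Quick per-drug summary: every related entity, grouped by type."""
        return {t: sorted(v) for t, v in self._index.get(drug.lower(), {}).items()}


# Illustrative (made-up) triples, not output from the trained models.
graph = DrugGraph()
graph.add(Triple("Lasix", "Strength", "40 mg"))
graph.add(Triple("Lasix", "Frequency", "daily"))
graph.add(Triple("Lasix", "ADE", "hypokalemia"))
print(graph.ask("lasix", "Strength"))  # ['40 mg']
print(graph.summary("Lasix"))  # {'Strength': ['40 mg'], 'Frequency': ['daily'], 'ADE': ['hypokalemia']}
```

Indexing by drug first mirrors the design above, where every relation starts at a drug; in a graph database the same question would be a pattern match starting from a drug node.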

Built With

Getting Started

To run this project locally, download the datasets listed below and preprocess them to generate the training and test sets for the NER and RE models. You will also need a Neo4j account (and its connection URI) for knowledge graph generation.

Prerequisites

  1. Datasets used for training:
     https://huggingface.co/datasets/ade_corpus_v2
     https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
  2. A Neo4j connection URI:
     https://neo4j.com/
  3. The Python modules listed in requirements.txt:
    pip install -r requirements.txt
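Once a Neo4j instance is available, the extracted triples have to be written to it. The sketch below shows one way to do this with the official `neo4j` Python driver; the environment variable names, the `Drug`/`Entity` node labels, and the `HAS_<TYPE>` relationship naming are assumptions for illustration, not the schema this repository uses.

```python
import os
import re
from typing import Iterable, Mapping, Tuple

try:
    from neo4j import GraphDatabase  # official driver: pip install neo4j
except ImportError:  # the pure helpers below still work without the driver
    GraphDatabase = None

# Hypothetical environment variable names for the connection settings.
SETTINGS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def load_neo4j_config(env: Mapping[str, str] = os.environ):
    """Return (uri, (user, password)), failing early if a setting is missing."""
    missing = [name for name in SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("missing Neo4j settings: " + ", ".join(missing))
    return env["NEO4J_URI"], (env["NEO4J_USER"], env["NEO4J_PASSWORD"])


def merge_statement(entity_type: str) -> str:
    """Build an idempotent Cypher MERGE for one drug -> entity relation.

    Cypher cannot take a relationship type as a query parameter, so the type
    is validated before being interpolated; node properties stay parameters.
    """
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", entity_type):
        raise ValueError(f"unsafe relationship type: {entity_type!r}")
    return (
        "MERGE (d:Drug {name: $drug}) "
        "MERGE (e:Entity {name: $entity, type: $entity_type}) "
        f"MERGE (d)-[:HAS_{entity_type.upper()}]->(e)"
    )


def write_triples(triples: Iterable[Tuple[str, str, str]]) -> None:
    """Write (drug, entity_type, entity) triples to the configured database."""
    if GraphDatabase is None:
        raise RuntimeError("the neo4j package is not installed")
    uri, auth = load_neo4j_config()
    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.verify_connectivity()  # fail fast on a wrong URI or password
        with driver.session() as session:
            for drug, entity_type, entity in triples:
                session.run(merge_statement(entity_type),
                            drug=drug, entity=entity, entity_type=entity_type)
```

Because `MERGE` matches existing nodes and relationships before creating new ones, re-uploading the same EHR in this sketch does not duplicate the graph.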

Installation

  1. Preprocess the datasets
python generate_data.py \
    --task ner \
    --input_dir data \
    --ade_dir ade_corpus \
    --target_dir dataset \
    --max_seq_len 512 \
    --dev_split 0.1 \
    --tokenizer biobert-base \
    --ext txt \
    --sep " "
  2. Train the NER and RE models

* NER Model

export MAX_LENGTH=128
export BATCH_SIZE=16
export NUM_EPOCHS=5
export SAVE_STEPS=1000
export SEED=0
# DATA_DIR and SAVE_DIR must also be set, as in the RE example below.
# Use a different SAVE_DIR for each model: both commands pass --overwrite_output_dir.

python run_ner.py \
    --data_dir ${DATA_DIR} \
    --labels ${DATA_DIR}/labels.txt \
    --model_name_or_path dmis-lab/biobert-large-cased-v1.1 \
    --output_dir ${SAVE_DIR} \
    --max_seq_length ${MAX_LENGTH} \
    --num_train_epochs ${NUM_EPOCHS} \
    --per_device_train_batch_size ${BATCH_SIZE} \
    --save_steps ${SAVE_STEPS} \
    --seed ${SEED} \
    --do_train \
    --do_eval \
    --do_predict \
    --overwrite_output_dir

* RE Model

```sh
export SAVE_DIR=./output
export DATA_DIR=./dataset

export MAX_LENGTH=128
export BATCH_SIZE=8
export NUM_EPOCHS=3
export SAVE_STEPS=1000
export SEED=1
export LEARNING_RATE=5e-5

python run_re.py \
    --task_name ehr-re \
    --config_name bert-base-cased \
    --data_dir ${DATA_DIR} \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --max_seq_length ${MAX_LENGTH} \
    --num_train_epochs ${NUM_EPOCHS} \
    --per_device_train_batch_size ${BATCH_SIZE} \
    --save_steps ${SAVE_STEPS} \
    --seed ${SEED} \
    --do_train \
    --do_eval \
    --do_predict \
    --learning_rate ${LEARNING_RATE} \
    --output_dir ${SAVE_DIR} \
    --overwrite_output_dir
```
  3. To start the app in development mode
uvicorn fast_api:app --reload
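`run_ner.py` is pointed at `${DATA_DIR}` and `${DATA_DIR}/labels.txt`. Assuming `generate_data.py` writes CoNLL-style text files (one token and its tag per line, separated by the `--sep` value, with a blank line between sequences), the sketch below parses such files and derives a label list. The function names and the `train`/`dev`/`test` file names are assumptions, so check them against the generated `dataset` directory.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[str], List[str]]  # (tokens, tags)


def parse_sequences(lines: Iterable[str], sep: str = " ") -> Iterator[Example]:
    """Group 'token<sep>tag' lines into sequences split by blank lines."""
    tokens, tags = [], []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        # Split on the last separator so the tag is always the final field.
        token, found, tag = line.rpartition(sep)
        if not found or not token:
            raise ValueError(f"line {line_no}: expected 'token{sep}tag', got {line!r}")
        tokens.append(token)
        tags.append(tag)
    if tokens:  # the last sequence may not end with a blank line
        yield tokens, tags


def collect_labels(sequences: Iterable[Example]) -> List[str]:
    """Return every tag seen, with the outside tag 'O' first."""
    seen = {tag for _, tags in sequences for tag in tags}
    return ["O"] + sorted(seen - {"O"})


def write_labels(data_dir: str, splits=("train", "dev", "test"), ext="txt", sep=" ") -> List[str]:
    """Write labels.txt from whichever split files exist (hypothetical names)."""
    sequences: List[Example] = []
    for split in splits:
        path = Path(data_dir) / f"{split}.{ext}"
        if path.exists():
            with path.open(encoding="utf-8") as f:
                sequences.extend(parse_sequences(f, sep))
    labels = collect_labels(sequences)
    (Path(data_dir) / "labels.txt").write_text("\n".join(labels) + "\n", encoding="utf-8")
    return labels
```

Collecting labels from every split, not only the training file, avoids an unknown-tag error when a rare tag appears only in the dev or test data.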

Usage

To demonstrate Named Entity Recognition (NER), the relationship table, and the knowledge graph, a web app was developed using HTML, CSS, and JavaScript. Its graphical user interface (GUI) lets the user upload an EHR; the app then identifies the entities and the relationships between them and builds a knowledge graph from those relationships. The extracted entities, the relationship table, and the generated knowledge graph are each displayed as results.
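Between the two models, the app has to turn token-level NER tags into entity spans and decide which entity pairs the RE model should classify. A minimal sketch, assuming BIO tags and, following the drug-centred design described in this README, pairing every drug with every other entity in the same text; `Entity`, `bio_to_entities`, and `candidate_pairs` are hypothetical names, not this repository's API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "Drug" or "ADE"
    start: int  # index of the first token
    end: int    # index one past the last token
    text: str   # tokens joined with single spaces


def bio_to_entities(tokens: Sequence[str], tags: Sequence[str]) -> List[Entity]:
    """Merge B-/I- tagged tokens into entity spans; 'O' tokens are skipped."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities: List[Entity] = []
    current, start = None, 0
    # A trailing "O" sentinel closes a span that runs to the end of the text.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current
        if current is not None and not continues:
            entities.append(Entity(current, start, i, " ".join(tokens[start:i])))
            current = None
        if prefix == "B" or (prefix == "I" and not continues):
            # A stray I- tag (no matching open span) is treated like B-.
            current, start = label, i
    return entities


def candidate_pairs(entities: Sequence[Entity], drug_label: str = "Drug") -> List[Tuple[Entity, Entity]]:
    """Pair every drug with every non-drug entity as relation candidates."""
    drugs = [e for e in entities if e.label == drug_label]
    others = [e for e in entities if e.label != drug_label]
    return list(product(drugs, others))


# Made-up example sentence, already tokenized and tagged.
tokens = ["Lasix", "40", "mg", "caused", "hypokalemia", "."]
tags = ["B-Drug", "B-Strength", "I-Strength", "O", "B-ADE", "O"]
for drug, other in candidate_pairs(bio_to_entities(tokens, tags)):
    print(drug.text, "->", other.label, other.text)
# Lasix -> Strength 40 mg
# Lasix -> ADE hypokalemia
```

Pairing every drug with every other entity grows as drugs × entities, so a real pipeline may restrict candidates to nearby tokens; that refinement is omitted here.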


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Creating A Pull Request

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Authors

Acknowledgements