ycq091044 / SafeDrug

IJCAI2021: Code for SafeDrug, MIMIC data processing, Medical code mapping

Data and Code for IJCAI'21 paper - SafeDrug

YOU NEED TO KNOW FIRST!

[Implementation difference] The two branches differ in two main ways:

  1. As described below, the main difference between the two branches is how the drug SMILES strings are obtained: the crawling method used for the paper misses many molecules, while the current branch uses the DrugBank method, which yields a more comprehensive set.
  2. The data processing scripts also differ slightly, so the output data statistics differ from the ones reported in the paper.

[Which branch to use?] General guidance:

  1. The master branch contains more documentation on how to use our code, and its folder structure is very similar to that of the archived branch.
  2. Use the archived branch to reproduce the results in the paper.

Folder Specification

Note that we previously used ./data/get_SMILES.py to get SMILES strings from DrugBank. However, because the structure of the DrugBank website changed, this crawler is no longer used in the current pipeline. We now use drugbank_drugs_info.csv to obtain the SMILES strings for each ATC3 code, so the data statistics differ slightly from the paper. The current statistics are shown below:

#patients  6350
#clinical events  15032
#diagnosis  1958
#med  112
#procedure 1430
#avg of diagnoses  10.5089143161256
#avg of medicines  11.647751463544438
#avg of procedures  3.8436668440659925
#avg of visits  2.367244094488189
#max of diagnoses  128
#max of medicines  64
#max of procedures  50
#max of visits  29
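As a rough illustration of how statistics like these can be derived, the sketch below assumes the processed records are a list of patients, each patient a list of visits, and each visit a triple of (diagnosis codes, procedure codes, medication codes). That layout, the function name `summarize`, and the toy data are assumptions for illustration, not the repository's actual processing script. Note that the reported average number of visits is consistent with #clinical events divided by #patients (15032 / 6350).

```python
def summarize(records):
    """Compute dataset statistics from nested patient -> visit -> codes records.

    Assumed layout (hypothetical): each visit is (diagnoses, procedures, medicines).
    """
    visits = [visit for patient in records for visit in patient]
    n_visits = len(visits)

    def unique(idx):
        # Number of distinct codes of one type across all visits.
        return len({code for visit in visits for code in visit[idx]})

    def average(idx):
        # Mean number of codes of one type per visit.
        return sum(len(visit[idx]) for visit in visits) / n_visits

    def maximum(idx):
        # Largest number of codes of one type in a single visit.
        return max(len(visit[idx]) for visit in visits)

    return {
        "#patients": len(records),
        "#clinical events": n_visits,
        "#diagnosis": unique(0),
        "#med": unique(2),
        "#procedure": unique(1),
        "#avg of diagnoses": average(0),
        "#avg of medicines": average(2),
        "#avg of procedures": average(1),
        "#avg of visits": n_visits / len(records),
        "#max of diagnoses": maximum(0),
        "#max of medicines": maximum(2),
        "#max of procedures": maximum(1),
        "#max of visits": max(len(patient) for patient in records),
    }


# Toy data: two patients, three visits in total (codes are made up).
toy_records = [
    [(["d1", "d2"], ["p1"], ["m1", "m2", "m3"]), (["d2"], ["p1", "p2"], ["m1"])],
    [(["d3"], ["p3"], ["m2"])],
]
toy_stats = summarize(toy_records)
for key, value in toy_stats.items():
    print(key, value)
```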

High-level Clarifications on How to Map ATC Code to SMILES
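
The lookup from ATC3 codes to SMILES strings can be pictured roughly as follows. This is a minimal sketch, not the repository's processing code: the column names `name` and `smiles`, the function `atc3_to_smiles`, and the ATC3-to-drug-name dictionary are assumptions, and the actual layout of drugbank_drugs_info.csv may differ.

```python
import csv
import io


def atc3_to_smiles(drug_info_file, atc3_to_names, name_col="name", smiles_col="smiles"):
    """Map each ATC3 code to the SMILES strings of its member drugs.

    drug_info_file: open text file with one drug per row (column names are assumed).
    atc3_to_names:  dict from ATC3 code to a list of drug names (assumed input).
    """
    # Index drugs by lower-cased name, skipping rows without a SMILES string.
    name_to_smiles = {}
    for row in csv.DictReader(drug_info_file):
        smiles = (row.get(smiles_col) or "").strip()
        if smiles:
            name_to_smiles[row[name_col].strip().lower()] = smiles

    mapping = {}
    for atc3, names in atc3_to_names.items():
        for name in names:
            smiles = name_to_smiles.get(name.strip().lower())
            if smiles is None:
                continue  # drug has no known structure; ignore it
            bucket = mapping.setdefault(atc3, [])
            if smiles not in bucket:
                bucket.append(smiles)
    return mapping


# Toy stand-in for the CSV file; the third drug has no SMILES entry.
toy_csv = io.StringIO(
    "name,smiles\n"
    "Aspirin,CC(=O)OC1=CC=CC=C1C(=O)O\n"
    "Ibuprofen,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O\n"
    "NoStructure,\n"
)
toy_groups = {"N02B": ["aspirin"], "M01A": ["Ibuprofen", "NoStructure"], "Z99Z": ["NoStructure"]}
toy_mapping = atc3_to_smiles(toy_csv, toy_groups)
print(toy_mapping)
```

ATC3 codes whose drugs all lack a SMILES string (the made-up `Z99Z` here) are left out of the result, which is one plausible reason the number of medications can change when the SMILES source changes.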

Step 1: Package Dependency

RDKit can also be installed in your current environment with the following:

pip install rdkit-pypi


Then, in the SafeDrug environment, install the following packages:
```bash
pip install scikit-learn dill dnc
```

Note that the torch setup may vary with your GPU hardware. In general, run the following:

pip install torch

If you are using an RTX 3090, please use the following instead, which installs a CUDA 11.1 build of torch:

python3 -m pip install --user torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

Here is a list of reference versions for all packages:

pandas: 1.3.0
dill: 0.3.4
torch: 1.8.0+cu111
rdkit: 2021.03.4
scikit-learn: 0.24.2
numpy: 1.21.1

Let us know about any package dependency issues. Please pay special attention to pandas: some users report that a newer version of pandas raises an error when loading files with dill.
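
To spot version mismatches before running the pipeline, a small check along these lines may help. It is only a sketch: the helper `compare_versions` is hypothetical, and the installed distribution name can differ from the name in the list above (for example, RDKit installed through `pip install rdkit-pypi` may not be registered as `rdkit`), in which case the package is reported as not found.

```python
from importlib.metadata import PackageNotFoundError, version

# Reference versions copied from the list above.
REFERENCE_VERSIONS = {
    "pandas": "1.3.0",
    "dill": "0.3.4",
    "torch": "1.8.0+cu111",
    "rdkit": "2021.03.4",
    "scikit-learn": "0.24.2",
    "numpy": "1.21.1",
}


def compare_versions(reference, lookup=version):
    """Return {package: (reference_version, installed_version_or_None)} for mismatches."""
    mismatches = {}
    for name, wanted in reference.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # not installed, or installed under another distribution name
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions(REFERENCE_VERSIONS).items():
        print(f"{name}: reference {wanted}, installed {found or 'not found'}")
```

A mismatch is not necessarily a problem; the list gives reference versions, not strict requirements.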

Step 2: Data Processing

Step 3: Run the Code

python SafeDrug.py

The available arguments are:

usage: SafeDrug.py [-h] [--Test] [--model_name MODEL_NAME]
               [--resume_path RESUME_PATH] [--lr LR]
               [--target_ddi TARGET_DDI] [--kp KP] [--dim DIM]

optional arguments:
  -h, --help            show this help message and exit
  --Test                test mode
  --model_name MODEL_NAME
                        model name
  --resume_path RESUME_PATH
                        resume path
  --lr LR               learning rate
  --target_ddi TARGET_DDI
                        target ddi
  --kp KP               coefficient of P signal
  --dim DIM             dimension
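
For readers who want to see how such a command line is typically declared, the following sketch rebuilds the usage text above with `argparse`. The flag names and help strings come from the usage output; the argument types (float for `--lr`, `--target_ddi`, and `--kp`, int for `--dim`) and the example checkpoint path are assumptions, and no default values are given because the usage text does not show them.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the usage text of SafeDrug.py (types are assumed)."""
    parser = argparse.ArgumentParser(prog="SafeDrug.py")
    parser.add_argument("--Test", action="store_true", help="test mode")
    parser.add_argument("--model_name", type=str, help="model name")
    parser.add_argument("--resume_path", type=str, help="resume path")
    parser.add_argument("--lr", type=float, help="learning rate")
    parser.add_argument("--target_ddi", type=float, help="target ddi")
    parser.add_argument("--kp", type=float, help="coefficient of P signal")
    parser.add_argument("--dim", type=int, help="dimension")
    return parser


# Hypothetical invocation: test mode with a made-up checkpoint path.
example_args = build_parser().parse_args(
    ["--Test", "--resume_path", "saved/model.pt", "--dim", "64"]
)
print(example_args)
```

Judging from the flag names, evaluating a saved model presumably combines `--Test` with `--resume_path`, while training runs omit `--Test`.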

Citation

@inproceedings{yang2021safedrug,
    title = {SafeDrug: Dual Molecular Graph Encoders for Safe Drug Recommendations},
    author = {Yang, Chaoqi and Xiao, Cao and Ma, Fenglong and Glass, Lucas and Sun, Jimeng},
    booktitle = {Proceedings of the Thirtieth International Joint Conference on
               Artificial Intelligence, {IJCAI} 2021},
    year = {2021}
}

Feel free to contact me at chaoqiy2@illinois.edu with any questions. Partial credit to https://github.com/sjy1203/GAMENet.