idrugLab / FP-GNN

53 stars 12 forks source link

Environment

The most important python packages are:

For using our model more conveniently, we provide the environment file to install environment directly.


Command

1. Train

Use train.py

Args:

E.g.

python train.py --data_path data/test.csv --dataset_type classification --save_path model_save --log_path log

2. Predict

Use predict.py

Args:

E.g.

python predict.py --predict_path data/test.csv --model_path model_save/test.pt --result_path result.csv

3. Hyperparameters Optimization

Use hyper_opti.py

Args:

E.g.

python hyper_opti.py --data_path data/test.csv --dataset_type classification --save_path model_save --log_path log

4. Interpretation of Fingerprints

Use interpretation_fp.py

Args:

E.g.

python interpretation_fp.py --predict_path test.csv --model_path model_save/test.pt --result_path result.txt

5. Interpretation of Graph

Use interpretation_graph.py

Args:

E.g.

python interpretation_graph.py --predict_path test.csv --model_path model_save/test.pt --figure_path figure


Data

We provide the three public benchmark datasets used in our study:

Or you can use your own dataset:

1. For training

The dataset file should be a CSV file with a header line and label columns. E.g.

SMILES,BT-20
O(C(=O)C(=O)NCC(OC)=O)C,0
FC1=CNC(=O)NC1=O,0
...

2. For predicting

The dataset file should be a CSV file with a header line and without label columns. E.g.

SMILES
O(C(=O)C(=O)NCC(OC)=O)C
FC1=CNC(=O)NC1=O
...

3. For interpreting fingerprints

The dataset file should be a CSV file with a header line and label columns. E.g.

SMILES,BT-20
O(C(=O)C(=O)NCC(OC)=O)C,0
FC1=CNC(=O)NC1=O,0
...

4. For interpreting molecular graphs

The dataset file should be a CSV file with a header line and without label columns. E.g.

SMILES
O(C(=O)C(=O)NCC(OC)=O)C
FC1=CNC(=O)NC1=O
...

Example

1. Training a model on BACE dataset

Decompress the Data.rar and find BACE dataset file in Data/MoleculeNet/bace.csv.

Use command:

python train.py --data_path Data/MoleculeNet/bace.csv --dataset_type classification --save_path model_save/bace --log_path log/bace

2. Using trained model to predict new molecules (e.g. in test.csv)

The trained model is in model_save/bace/Seed_0/model.pt

Use command:

python predict.py --predict_path test.csv --model_path model_save/bace/Seed_0/model.pt --result_path result.csv

3. Interpreting fingerprints

Interpreting fingerprints should use the training data and the trained model

Use command:

python interpretation_fp.py --predict_path Data/MoleculeNet/bace.csv --model_path model_save/bace/Seed_0/model.pt --result_path result.txt

4. Interpreting molecular graphs

Interpreting molecular graphs with the specific molecules (e.g. in test.csv) and the trained model

Use command:

python interpretation_graph.py --predict_path test.csv --model_path model_save/bace/Seed_0/model.pt --figure_path figure/bace