Closed — qhykwsw closed this issue 2 years ago
Hi, sorry for the late reply. Could you provide a testing script? thanks!
OK, the testing script is as follows:
```python
import os
import argparse
import numpy as np
import pandas as pd
import random
from tqdm import tqdm
import time

from DeepPurpose import PPI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *

if __name__ == '__main__':
    parser = argparse.ArgumentParser("Train lightGBM classifier based on dataset...")
    parser.add_argument("--seed", type=int, default=2021)
    parser.add_argument("--target_encoding", type=str, default="Transformer")
    parser.add_argument("--ycol", type=str, default='delta_g')
    parser.add_argument("--train_epoch", type=int, default=100)
    parser.add_argument("--learning_rate", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--num_workers", type=int, default=46)
    args = parser.parse_args()

    seed = args.seed
    target_encoding = args.target_encoding
    ycol = args.ycol
    train_epoch = args.train_epoch
    learning_rate = args.learning_rate
    batch_size = args.batch_size
    num_workers = args.num_workers

    nrows = int(1e9)  # pandas requires an integer here; effectively "read all rows"
    train = pd.read_csv("./data/train.tsv", sep="\t", nrows=nrows)
    test = pd.read_csv("./data/test.tsv", sep="\t", nrows=nrows)
    train[ycol] = train[ycol].rank()
    test[ycol] = np.random.rand(len(test))  # placeholder labels; the test set is unlabeled

    train, val, _ = data_process(
        X_target=np.array(train['antigen_seq']),
        X_target_=np.array(train['antibody_seq_b']),
        y=np.array(train[ycol]),
        target_encoding=target_encoding,
        split_method='random',
        random_seed=seed,
        frac=[0.8, 0.2, 0.0]
    )
    test = data_process(
        X_target=np.array(test['antigen_seq']),
        X_target_=np.array(test['antibody_seq_b']),
        y=np.array(test[ycol]),
        target_encoding=target_encoding,
        split_method='no_split',
    )

    # use the parameter settings provided in the paper: https://arxiv.org/abs/1801.10193
    config = generate_config(
        target_encoding=target_encoding,
        result_folder='./results/',
        input_dim_protein=8420,
        hidden_dim_protein=256,
        cls_hidden_dims=[1024, 1024, 512],
        train_epoch=train_epoch,
        LR=learning_rate,
        batch_size=batch_size,
        cnn_target_filters=[32, 64, 96],
        cnn_target_kernels=[4, 8, 12],
        transformer_emb_size_target=64,
        transformer_intermediate_size_target=256,
        transformer_num_attention_heads_target=4,
        transformer_n_layer_target=2,
        transformer_dropout_rate=0.1,
        transformer_attention_probs_dropout=0.1,
        transformer_hidden_dropout_rate=0.1,
        num_workers=num_workers,
    )

    model = models.model_initialize(**config)
    model.train(train=train, val=val)
    test[ycol] = np.array(model.predict(test))
```
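As a side note on the script: `Series.rank()` replaces each value with its ascending ordinal position (ties get the average rank by default), so `delta_g` becomes a rank-based regression target. A minimal pandas illustration with made-up numbers:

```python
import pandas as pd

# Toy delta_g-like values (invented for illustration); note the tie at 3.2.
s = pd.Series([3.2, -1.0, 7.5, 3.2])

# Default method='average': tied values share the mean of their ranks.
print(s.rank().tolist())  # → [2.5, 1.0, 4.0, 2.5]
```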
Thanks.
Thanks! It looks perfectly correct. Could you send me the data files to test? Also, an error traceback log would be great.
OK, I have sent the data file to your mailbox (kexinh@stanford.edu). And the error traceback log is as follows:
```
WARNING:root:No normalization for BCUT2D_MWHI
WARNING:root:No normalization for BCUT2D_MWLOW
WARNING:root:No normalization for BCUT2D_CHGHI
WARNING:root:No normalization for BCUT2D_CHGLO
WARNING:root:No normalization for BCUT2D_LOGPHI
WARNING:root:No normalization for BCUT2D_LOGPLOW
WARNING:root:No normalization for BCUT2D_MRHI
WARNING:root:No normalization for BCUT2D_MRLOW
Protein Protein Interaction Prediction Mode...
in total: 1706 protein-protein pairs
encoding protein...
unique target sequence: 638
encoding protein...
unique target sequence: 916
Done.
Protein Protein Interaction Prediction Mode...
in total: 178 protein-protein pairs
encoding protein...
unique target sequence: 55
encoding protein...
unique target sequence: 104
Index(['Target Sequence 1', 'Target Sequence 2', 'Label', 'target_encoding_1',
       'target_encoding2'],
      dtype='object')
Let's use 2 GPUs!
--- Data Preparation ---
--- Go for Training ---
Training at Epoch 1 iteration 0 with loss 916698.. Total time 0.00222 hours
Validation at Epoch 1 , MSE: 934237. , Pearson Correlation: 0.12775 with p-value: 1.83E-02 , Concordance Index: 0.53829
--- Training Finished ---
predicting...
Traceback (most recent call last):
  File "nn.py", line 95, in
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/root/anaconda3/envs/DeepPurpose/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/anaconda3/envs/DeepPurpose/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/DeepPurpose/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
```
Hi, thanks! I have identified the error based on your log. Could you try the git source version to see if it works?
I tested it and found it worked. Thanks.
Sounds good, just updated in 0.1.5
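For anyone landing here later, a quick way to confirm which release is installed (this assumes the distribution name is `DeepPurpose`, matching PyPI, and requires Python 3.8+ for `importlib.metadata`):

```python
from importlib.metadata import version, PackageNotFoundError

# Query the installed distribution's version; the fix above shipped in 0.1.5.
try:
    installed = version("DeepPurpose")
except PackageNotFoundError:
    installed = None

print(installed or "DeepPurpose is not installed")
```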
Dear Dr. Huang, thank you for providing such a great package. I want to make a Protein-Protein Interaction prediction. I have a dataset containing training and testing data, where the test set has no labels.
I called the data_process method twice: the first time with the "random" split method to divide the training set into training and validation data, and the second time with the "no_split" method to prepare the testing data. The generated data has 5 columns: ['Target Sequence 1', 'Target Sequence 2', 'Label', 'target_encoding_1', 'target_encoding_2'].
When model.train(train=train, val=val) is called, everything is fine, but when model.predict(test) is used, KeyError: 'drug_encoding' is raised. However, I am using the PPI model, which involves no drug_encoding. Could you please help me find out where the problem is?
Thanks a lot — qhykwsw
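For context, the reported KeyError is Python's generic error for indexing a dict with an absent key; presumably the pre-0.1.5 predict path looked up 'drug_encoding' unconditionally, even for PPI configs that never set it. A minimal sketch of that failure mode and the usual defensive pattern (toy config, not the real DeepPurpose internals):

```python
# Toy PPI-style config: no drug-related keys, mirroring the report above.
config = {"target_encoding": "Transformer"}

# Unconditional lookup -- the failure mode behind the traceback.
try:
    config["drug_encoding"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # → KeyError: 'drug_encoding'

# Defensive lookup: .get() returns None when the key is absent.
print(config.get("drug_encoding"))  # → None
```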