Closed spadavec closed 2 years ago
I don't think it is automatically downloaded. Can you please provide the command you used? Thanks!
Hi @futianfan, I didn't automatically download a model. I generated them all via:
from DeepPurpose import utils, CompoundPred
from tdc.single_pred import ADME
from tdc.utils import retrieve_dataset_names
from tqdm import tqdm

# Names of all ADME datasets available in TDC
adme_datasets = retrieve_dataset_names('ADME')

# Train and save one model per dataset
for dataset_name in tqdm(adme_datasets):
    X, y = ADME(name = dataset_name).get_data(format = 'DeepPurpose')
    drug_encoding = 'Morgan'
    train, val, test = utils.data_process(X_drug = X,
                                          y = y,
                                          drug_encoding = drug_encoding,
                                          random_seed = 2)
    config = utils.generate_config(drug_encoding = drug_encoding,
                                   train_epoch = 20,
                                   LR = 0.001,
                                   batch_size = 128,
                                   mpnn_hidden_size = 32
                                   )
    model = CompoundPred.model_initialize(**config)
    model.train(train, val, test)
    # Saved to e.g. adme_models/<dataset_name>_model
    model.save_model('adme_models/' + dataset_name + '_model')
The models I listed above were all generated using the above script. I'd like to now (for example) load the caco2 model I generated, load some new SMILES patterns, and make predictions. How would I go about doing that?
Also, is there a way to modify the above code so that the model is generated using all of the data? I don't want to automatically lose ~20-30% of the data to get a validation set that I don't need.
Hi, this seems to be an issue for DeepPurpose rather than TDC. TDC has no support for loading a pretrained model. For DeepPurpose (https://github.com/kexinhuang12345/DeepPurpose), you can load a pretrained model via
net = CompoundPred.model_pretrained('./cyp1a2_veith_model')
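To go from a loaded model to predictions on new SMILES, something along these lines may work. This is a rough sketch, not a confirmed recipe: `model_dir` and `predict_smiles` are hypothetical helper names, the directory layout is assumed to match the save script earlier in this thread, the labels passed to `data_process` are placeholders that are never used for training, and predictions are assumed to come back in input order.

```python
def model_dir(dataset_name, root="adme_models"):
    """Hypothetical helper: rebuild the path used by the save script in this thread."""
    return root + "/" + dataset_name + "_model"


def predict_smiles(smiles, dataset_name="caco2_wang", drug_encoding="Morgan"):
    """Load a saved DeepPurpose model and predict for a list of SMILES strings."""
    # Deferred import so DeepPurpose is only required when predictions are requested.
    from DeepPurpose import utils, CompoundPred

    # data_process expects labels, so pass dummy zeros; 'no_split' keeps every row.
    X_pred = utils.data_process(X_drug=smiles,
                                y=[0] * len(smiles),
                                drug_encoding=drug_encoding,
                                split_method="no_split")

    # The encoding must match the one the model was trained with (Morgan here).
    model = CompoundPred.model_pretrained(path_dir=model_dir(dataset_name))
    return model.predict(X_pred)
```

For example, `predict_smiles(["CCO", "c1ccccc1O"])` would load `adme_models/caco2_wang_model` and return one prediction per input molecule.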
You can also set the train/valid/test fractions by passing frac=[0.9,0.0,0.1] to the data_process function. If you want no test set at all, pass split_method = 'no_split' to the data_process function.
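As an illustration of the frac suggestion, here is a minimal sketch. `check_frac` and `split_without_validation` are hypothetical helpers, not part of DeepPurpose; the sum-to-one check is just a sanity check added here. Whether training runs cleanly with an empty validation set has not been confirmed in this thread, so treat that as an open assumption.

```python
def check_frac(frac):
    """Hypothetical sanity check for a [train, valid, test] fraction list."""
    if len(frac) != 3 or any(f < 0 for f in frac):
        raise ValueError("frac must hold three non-negative numbers")
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")
    return list(frac)


def split_without_validation(X, y, drug_encoding="Morgan",
                             frac=(0.9, 0.0, 0.1), seed=2):
    """Split with 90% train, 0% validation, 10% test via data_process."""
    # Deferred import so the helper above can be used without DeepPurpose installed.
    from DeepPurpose import utils

    return utils.data_process(X_drug=X,
                              y=y,
                              drug_encoding=drug_encoding,
                              split_method="random",
                              frac=check_frac(frac),
                              random_seed=seed)
```

With X and y taken from `ADME(name = dataset_name).get_data(format = 'DeepPurpose')`, the call `train, val, test = split_without_validation(X, y)` would slot into the training loop in place of the original `data_process` call.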
Sorry for being so dumb, but I see a lot of documentation on how to create and save models, but not on how to load them and use them for new predictions. Is that documented somewhere?
For example, I have the following ADME models:
I'd like to load the caco2_wang_model and then load new SMILES compounds for predictions. Any pointers would be appreciated!