mindrank-ai / PharmHGT

Bug linked to data split size #3

Open thegodone opened 1 year ago

thegodone commented 1 year ago

With the ESOL example, model training works with the default 0.7/0.1/0.2 split. But with the 0.8/0.1/0.1 split described on page 4 of the publication, I get the following error:

root@18d68cb2076c:/fwd/PharmHGT# python train.py configs/esol.json
{'data': {'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}, 'train': {'loss_fn': 'rmse', 'metric_fn': 'rmse', 'warmup': 2, 'init_lr': 0.0001, 'max_lr': 0.001, 'final_lr': 1e-05, 'epochs': 50, 'num_fold': 5, 'save_path': './ckpt/esolcpu', 'device': 'cuda:0', 'data_name': 'esol'}, 'model': {'atom_dim': 42, 'bond_dim': 14, 'pharm_dim': 194, 'reac_dim': 34, 'hid_dim': 300, 'depth': 3, 'act': 'ReLU', 'num_task': 1}, 'seed': 2022}
{'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}
dataset size, train: 789,                     val: 113,                     test: 226
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [04:29<00:00,  5.40s/it]
best epoch 26 for fold 0, val rmse:0.694085955619812, test: 0.5895809531211853
{'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}
dataset size, train: 789,                     val: 113,                     test: 226
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [04:26<00:00,  5.34s/it]
best epoch 47 for fold 1, val rmse:0.618224024772644, test: 0.6661747097969055
{'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}
dataset size, train: 789,                     val: 113,                     test: 226
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [04:32<00:00,  5.44s/it]
best epoch 27 for fold 2, val rmse:0.5945879817008972, test: 0.6353021264076233
{'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}
dataset size, train: 789,                     val: 113,                     test: 226
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [04:30<00:00,  5.41s/it]
best epoch 39 for fold 3, val rmse:0.5803282260894775, test: 0.6274499893188477
{'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}
dataset size, train: 789,                     val: 113,                     test: 226
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [04:22<00:00,  5.26s/it]
best epoch 21 for fold 4, val rmse:0.6722652912139893, test: 0.6289862394332886
average performance: 0.6294988036155701+/-0.024399572946288837
root@18d68cb2076c:/fwd/PharmHGT# python data.py
root@18d68cb2076c:/fwd/PharmHGT# python train.py configs/esol.json
{'data': {'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}, 'train': {'loss_fn': 'rmse', 'metric_fn': 'rmse', 'warmup': 2, 'init_lr': 0.0001, 'max_lr': 0.001, 'final_lr': 1e-05, 'epochs': 50, 'num_fold': 5, 'save_path': './ckpt/esolcpu', 'device': 'cuda:0', 'data_name': 'esol'}, 'model': {'atom_dim': 42, 'bond_dim': 14, 'pharm_dim': 194, 'reac_dim': 34, 'hid_dim': 300, 'depth': 3, 'act': 'ReLU', 'num_task': 1}, 'seed': 2022}
{'path': 'data_index/esol', 'task': 'regression', 'target_names': ['logSolubility'], 'batch_size': 64}
dataset size, train: 902,                     val: 113,                     test: 113
  4%|█████▎                                                                                                                               | 2/50 [00:18<07:34,  9.47s/it]
Traceback (most recent call last):
  File "train.py", line 146, in <module>
    results = train(data_args,train_args,model_args,seed)
  File "train.py", line 97, in train
    pred = model(bg)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fwd/PharmHGT/model.py", line 268, in forward
    self.init_feature(bg)
  File "/fwd/PharmHGT/model.py", line 259, in init_feature
    bg.edges[('p','r','p')].data['x'] = self.act(self.w_reac(bg.edges[('p','r','p')].data['x']))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x0 and 34x300)
root@18d68cb2076c:/fwd/PharmHGT# 
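
For reference, the two `dataset size` lines in the log above can be reproduced with simple arithmetic. This is only a sketch: it assumes the split boundaries are cumulative fractions of the dataset floored to integers, and the helper name `split_sizes` is hypothetical. The repository's `data.py` may compute the split differently.

```python
# Sketch only: assumes split boundaries are cumulative fractions of the
# dataset, floored to integers. The repository's data.py may differ.

def split_sizes(n_total, fractions):
    """Return (train, val, test) sizes for cumulative, floored boundaries."""
    train_frac, val_frac, _ = fractions
    train_end = int(n_total * train_frac)
    val_end = int(n_total * (train_frac + val_frac))
    return train_end, val_end - train_end, n_total - val_end

# Total number of molecules, summed from the first "dataset size" log line.
N_TOTAL = 789 + 113 + 226

default_split = split_sizes(N_TOTAL, (0.7, 0.1, 0.2))
paper_split = split_sizes(N_TOTAL, (0.8, 0.1, 0.1))
print(default_split)
print(paper_split)
```

Under that assumption, both results match the log: the validation set stays at 113 molecules, while the test set shrinks from 226 to 113 and the training set grows from 789 to 902. The `1x0` in the error message means the tensor passed to `w_reac` had 0 feature columns where the layer expects 34 (`reac_dim`). Which batch triggers this cannot be determined from the log alone.
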

13787374060 commented 2 weeks ago

How is this data processed, and can you provide the code?