nyu-dl / dl4chem-mgm

BSD 3-Clause "New" or "Revised" License

Validation losses keep increasing #4

Closed hwidong-na closed 1 year ago

hwidong-na commented 2 years ago

Hi,

Thank you for sharing this great work. I tried to replicate training from scratch, but it does not seem to work properly on either the QM9 or ChEMBL dataset. Training losses decrease as training iterations go on, while validation losses keep increasing from the beginning. For example, running the following script:

python train.py --data_path data/QM9/QM9_processed.p --graph_type QM9 --exp_name QM9_experiment --num_node_types 5 --num_edge_types 5 --max_nodes 9 --layer_norm --spatial_msg_res_conn --batch_size 1024 --val_batch_size 2500 --val_after 105 --num_epochs 200 --shuffle --mask_independently --force_mask_predict --optimizer adam,lr=0.0001 --tensorboard

the losses at the beginning:

INFO - 11/09/21 14:52:37 - 0:07:59 - total_iter = 105, loss = 0.21, is_aromatic = 0.00, is_in_ring = 0.01, chirality = 0.11, charge = 0.01, node_type = 0.04, hydrogens = 0.04, edge_type = 0.01
INFO - 11/09/21 14:52:37 - 0:07:59 - Validating
INFO - 11/09/21 14:53:54 - 0:09:17 - Validation_loss: 6.92
INFO - 11/09/21 14:53:54 - 0:09:17 - node_type_loss: 0.30
INFO - 11/09/21 14:53:54 - 0:09:17 - hydrogens_loss: 4.70
INFO - 11/09/21 14:53:54 - 0:09:17 - charge_loss: 0.12
INFO - 11/09/21 14:53:54 - 0:09:17 - is_in_ring_loss: 0.41
INFO - 11/09/21 14:53:54 - 0:09:17 - is_aromatic_loss: 0.04
INFO - 11/09/21 14:53:54 - 0:09:17 - chirality_loss: 0.50
INFO - 11/09/21 14:53:54 - 0:09:17 - edge_type_loss: 0.85

the losses at the middle of the training:

INFO - 11/10/21 05:17:22 - 14:32:45 - total_iter = 9975, loss = 0.00, is_aromatic = 0.00, is_in_ring = 0.00, chirality = 0.00, charge = 0.00, node_type = 0.00, hydrogens = 0.00, edge_type = 0.00
INFO - 11/10/21 05:17:22 - 14:32:45 - Validating
INFO - 11/10/21 05:18:43 - 14:34:06 - Validation_loss: 12.83
INFO - 11/10/21 05:18:43 - 14:34:06 - node_type_loss: 1.11
INFO - 11/10/21 05:18:43 - 14:34:06 - hydrogens_loss: 8.00
INFO - 11/10/21 05:18:43 - 14:34:06 - charge_loss: 0.28
INFO - 11/10/21 05:18:43 - 14:34:06 - is_in_ring_loss: 0.49
INFO - 11/10/21 05:18:43 - 14:34:06 - is_aromatic_loss: 0.02
INFO - 11/10/21 05:18:43 - 14:34:06 - chirality_loss: 1.79
INFO - 11/10/21 05:18:43 - 14:34:06 - edge_type_loss: 1.14

the losses at the end:

INFO - 11/10/21 21:42:53 - 1 day, 6:58:16 - total_iter = 20790, loss = 0.00, is_aromatic = 0.00, is_in_ring = 0.00, chirality = 0.00, charge = 0.00, node_type = 0.00, hydrogens = 0.00, edge_type = 0.00
INFO - 11/10/21 21:42:53 - 1 day, 6:58:16 - Validating
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - Validation_loss: 15.39
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - node_type_loss: 1.19
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - hydrogens_loss: 10.07
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - charge_loss: 0.32
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - is_in_ring_loss: 0.38
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - is_aromatic_loss: 0.02
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - chirality_loss: 1.91
INFO - 11/10/21 21:44:17 - 1 day, 6:59:40 - edge_type_loss: 1.51

I would appreciate it if you could share your experience with this problem.

omarnmahmood commented 2 years ago

It appears there may be an issue with the way the random seed is used when it is set manually, as was the default configuration earlier. I have changed the default value of the random seed so that a fixed seed is not used; this should solve the problem. Please confirm.
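For anyone else hitting this, the suspected failure mode can be illustrated with a minimal sketch (hypothetical code, not the project's actual masking routine): if the RNG is re-seeded with a fixed value before each masking step, every epoch produces the identical mask pattern, so the model can memorize the masked positions (training loss → 0) while validation loss diverges.

```python
import random

def sample_mask(num_items, num_masked, seed=None):
    # Hypothetical masking helper for illustration only.
    # Re-seeding with a fixed value before every call makes the
    # "random" mask deterministic: the model sees the same masked
    # positions each epoch and can simply memorize them.
    if seed is not None:
        random.seed(seed)
    return sorted(random.sample(range(num_items), num_masked))

# With a fixed seed, the mask never changes across epochs.
fixed = [sample_mask(9, 3, seed=0) for _ in range(3)]
assert fixed[0] == fixed[1] == fixed[2]

# Without re-seeding, masks vary between epochs, which is what
# masked-modelling training relies on for generalization.
varied = {tuple(sample_mask(9, 3)) for _ in range(100)}
assert len(varied) > 1
```

Removing the fixed default seed restores fresh masks each iteration, which is consistent with the fix described above.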