yvquanli / GLAM

Code for "An adaptive graph learning method for automated molecular interactions and properties predictions".
https://www.nature.com/articles/s42256-022-00501-8
MIT License

Cannot reproduce your results in paper #5

Closed wmhcqw closed 2 years ago

wmhcqw commented 2 years ago

Hi, I'm trying to reproduce your results on the ESOL dataset, but with the default params the test RMSE seems to be much higher than the result reported in your paper.

Could you provide your best parameter settings? Below are the output logs I received.

Model saved at epoch 132
Testing...
{'dataset_root': 'chemrl_downstream_datasets/esol/', 'dataset': 'esol', 'split': 'scaffold', 'seed': 1234, 'split_seed': 1234, 'gpu': 0, 'note': 'None2', 'hid_dim_alpha': 4, 'mol_block': '_NNConv', 'e_dim': 1024, 'out_dim': 1, 'message_steps': 3, 'mol_readout': 'GlobalPool5', 'pre_norm': '_None', 'graph_norm': '_PairNorm', 'flat_norm': '_None', 'end_norm': '_None', 'pre_do': '_None()', 'graph_do': '_None()', 'flat_do': 'Dropout(0.2)', 'end_do': 'Dropout(0.2)', 'pre_act': 'RReLU', 'graph_act': 'RReLU', 'flat_act': 'RReLU', 'graph_res': 1, 'batch_size': 32, 'epochs': 999, 'loss': 'mse', 'optim': 'Adam', 'k': 6, 'lr': 0.001, 'lr_reduce_rate': 0.7, 'lr_reduce_patience': 20, 'early_stop_patience': 50, 'verbose_patience': 500}
{'testloss': 0.9559108018875122, 'valloss': 0.6927680969238281}|{'ci': 0.8770122343850612, 'mse': 0.97501713, 'rmse': 0.9874295571709956, 'r2': 0.779877562758926}|{'valci': 0.8853528843055108, 'valmse': 0.6884, 'valrmse': 0.8296987227490854, 'valr2': 0.8644862290531075}

Thanks for your time and contribution!

yvquanli commented 2 years ago

We think you may have some misunderstandings about our article and scores.

First, GLAM is an automated graph learning method, so you should run glam.py, not run.py.
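
For example, something along these lines (a minimal sketch only; the actual arguments of glam.py are not documented in this thread, so the --dataset flag below is an assumption, please check the README for the real interface):

```python
# Hypothetical sketch: launch the automated search pipeline instead of a
# single fixed-configuration run. The --dataset flag is assumed by analogy
# with the 'dataset' key printed by run.py; it may differ in the actual CLI.
import subprocess

subprocess.run(["python", "glam.py", "--dataset", "esol"], check=True)
```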

Second, there is no single "best" parameter setting; the best configurations are decided by the search strategy.

Third, the purpose of the benchmarks in the paper is to show that, in a fair setting where all methods use the same dataset split, our method outperforms previous methods. We never claimed that it achieves the best performance in all situations.

Last but not least, we only selected splits in which the distributions of the train, validation and test sets are close. Different dataset splits lead to different benchmark scores, so if you cannot reach the score in our paper in one run, please try a few more split groups; you will find that some of them reach or exceed the benchmark scores in our paper. A rough sketch of this procedure is shown below.
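
A minimal sketch of this idea, assuming run.py exposes command-line flags matching the config keys it prints (dataset, split, split_seed); the seed values below are arbitrary examples, not the ones used in the paper:

```python
# Repeat the single-run pipeline over several scaffold splits and compare the
# spread of test RMSE with the value reported in the paper. The flag names are
# assumptions based on the config dict printed by run.py; adjust to the real CLI.
import subprocess

split_seeds = [1234, 2345, 3456, 4567, 5678]  # hypothetical seed choices
for seed in split_seeds:
    subprocess.run(
        ["python", "run.py",
         "--dataset", "esol",
         "--split", "scaffold",
         "--split_seed", str(seed)],
        check=True,
    )
# Then collect the test RMSE printed by each run (or from the log files) and
# check whether the best/average scores reach the benchmark in the paper.
```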

Thanks for your attention to our paper and code.