microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

train from scratch on molecule datasets #39

Open hehuanma opened 2 years ago

hehuanma commented 2 years ago

Hello, I am trying to use Graphormer on other commonly used datasets from MoleculeNet (https://moleculenet.org/datasets-1), such as BACE and BBBP, to check its performance. I used the default hparams from the molhiv script, but the results are quite poor.

  1. Have you tried your model on these datasets without the pretrained model? If so, do you have any suggestions on hparams for training from scratch on these datasets? I am trying to figure out why the results are so bad.
  2. For molhiv without the pretrained model, I ran the provided script from the examples folder, omitted the "checkpoint_path" argument, and trained for 100 epochs. The best validation score is only around 0.763 and the corresponding test score is only 0.636, so something seems to be going wrong. Have you tried Graphormer directly on molhiv without the pretrained model, and if so, how did it perform?

Thank you.
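For context on dataset size (which matters for the overfitting discussion below), here is a minimal sketch, assuming PyTorch Geometric and RDKit are installed, that loads BACE and BBBP through PyG's `MoleculeNet` wrapper purely to inspect how small they are. This is not part of the Graphormer example scripts, and the scaffold splits usually reported for these benchmarks would still have to be applied separately.

```python
# A minimal sketch (not part of the Graphormer examples), assuming PyTorch
# Geometric and RDKit are installed: load the two MoleculeNet tasks discussed
# above through PyG's MoleculeNet wrapper just to inspect their size and format.
from torch_geometric.datasets import MoleculeNet

for name in ["BACE", "BBBP"]:
    dataset = MoleculeNet(root="data/moleculenet", name=name)  # downloads on first use
    print(name, len(dataset), "molecules")  # both are only a few thousand graphs
    graph = dataset[0]
    # Each entry is a molecular graph: atom features, bond index, label(s), SMILES.
    print(graph.x.shape, graph.edge_index.shape, graph.y, graph.smiles)
```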

zhengsx commented 2 years ago

Good question. While we have not tested Graphormer on MoleculeNet by training from scratch, the unsatisfactory performance is expected. Graphormer is built upon a standard Transformer model, which is very expressive. That expressiveness is valuable on more challenging large-scale datasets, but it hurts performance on small benchmarks because of severe overfitting. Imagine training ViT or Swin on MNIST or CIFAR-10 (even though Transformer-based models have already become the de facto standard in image processing).

If you insist on getting good performance on extremely small datasets such as those in MoleculeNet (e.g., fewer than 100K molecules), here are some tips that may help:

  1. Reduce the parameter size of Graphormer, as we do for ZINC (see the sketch after this list).
  2. Add strong regularization, as we do for HIV and PCBA.
  3. Use a pretrained model, which is a very effective way to overcome overfitting.
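To make tips 1 and 2 concrete, here is a generic PyTorch sketch of a shrunken, heavily regularized Transformer encoder. The layer count, dimensions, dropout rate, and weight decay below are illustrative placeholders, not Graphormer's actual ZINC/HIV settings; with the released scripts you would instead reduce the corresponding architecture and dropout/weight-decay arguments.

```python
# Illustrative only: the spirit of tips 1 and 2 (smaller model, stronger
# regularization) shown with vanilla PyTorch modules. All numbers here are
# placeholders, not the hyperparameters used in the Graphormer paper.
import torch
import torch.nn as nn

small_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=80,           # small hidden size (tip 1: fewer parameters)
        nhead=8,
        dim_feedforward=80,   # small feed-forward width (tip 1)
        dropout=0.3,          # aggressive dropout (tip 2: strong regularization)
        batch_first=True,
    ),
    num_layers=6,             # fewer layers than the base model (tip 1)
)

# Tip 2 continued: weight decay on the optimizer; early stopping on the
# validation metric also helps when training on only a few thousand molecules.
optimizer = torch.optim.AdamW(small_encoder.parameters(), lr=2e-4, weight_decay=0.01)
```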
hehuanma commented 2 years ago

Thank you for the information! That makes sense; we did observe severe overfitting on some datasets, and on others training was quite unstable. By the way, do you plan to upload the pretrained model used in the paper? That would let us apply it and save some computational cost. Thanks!

zhengsx commented 2 years ago

Per our latest plan, all pre-trained checkpoints will be released together with the new, more efficient Graphormer framework in the next release. Please stay tuned.

hehuanma commented 2 years ago

Sounds great! Thank you!