yuyangw / MolCLR

Implementation of MolCLR: "Molecular Contrastive Learning of Representations via Graph Neural Networks" in PyG.
MIT License

Questions about reproducing #10

Closed · Jiaran closed this issue 2 years ago

Jiaran commented 2 years ago

Nice work, thanks for sharing! But I have a question about reproducing the results in the paper. When I tried to reproduce BACE, the fine-tuned model gets results similar to training from scratch. To double-check whether I loaded the pre-trained model correctly, I also froze it and trained only the head on BACE, and that does not converge at all. Given meaningful representations, shouldn't freezing the pre-trained model perform only slightly worse than fine-tuning it? Can you please provide more details on reproducing the paper's results?

Here's the config I used:

    batch_size: 32                  # batch size
    epochs: 100                     # total number of epochs
    eval_every_n_epochs: 1          # validation frequency
    fine_tune_from: ./ckpt/         # sub-directory of pre-trained model in ./ckpt
    log_every_n_steps: 50           # print training log frequency
    fp16_precision: False           # float precision 16 (i.e. True/False)
    init_lr: 0.0005                 # initial learning rate for the prediction head
    init_base_lr: 0.00001           # initial learning rate for the base GNN encoder
    weight_decay: 1e-6              # weight decay of Adam
    gpu: cuda:0                     # training GPU
    task_name: BACE                 # name of fine-tuning benchmark, including
                                    # classifications: BBBP/BACE/ClinTox/Tox21/HIV/SIDER/MUV
                                    # regressions: FreeSolv/ESOL/Lipo/qm7/qm8/qm9

    model_type: gin                 # GNN backbone (i.e., gin/gcn)
    model:
      num_layer: 5                  # number of graph conv layers
      emb_dim: 300                  # embedding dimension in graph conv layers
      feat_dim: 512                 # output feature dimension
      drop_ratio: 0.5               # dropout ratio
      pool: mean                    # readout pooling (i.e., mean/max/add)

    dataset:
      num_workers: 4                # dataloader number of workers
      valid_size: 0.1               # ratio of validation data
      test_size: 0.1                # ratio of test data
      splitting: scaffold           # data splitting (i.e., random/scaffold)
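
For context on the two learning rates above, here is a minimal sketch of how they would typically be applied as separate Adam parameter groups; the attribute names `model.gnn` and `model.pred_head` are illustrative assumptions, not necessarily the repo's exact names:

```python
# Minimal sketch (not the repo's exact fine-tuning code) of how init_lr and
# init_base_lr are typically used: two Adam parameter groups, a small learning
# rate for the pre-trained encoder and a larger one for the new head.
# `model.gnn` and `model.pred_head` are assumed attribute names.
import torch


def build_optimizer(model, config):
    return torch.optim.Adam(
        [
            {"params": model.gnn.parameters(), "lr": config["init_base_lr"]},   # encoder
            {"params": model.pred_head.parameters(), "lr": config["init_lr"]},  # prediction head
        ],
        weight_decay=float(config["weight_decay"]),  # "1e-6" may load from YAML as a string
    )
```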

yuyangw commented 2 years ago

Hi, thanks for your interest in our work.

Can you check the "fine_tune_from" parameter in the config file to make sure the pre-trained model is correctly loaded? If you are using the exact code in the repo, you will need to set "fine_tune_from: pretrained_gin" to load the GIN model.
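
In case it helps to verify that step, the loading would look roughly like the sketch below; the checkpoint path layout under ./ckpt/ and the non-strict key matching are assumptions here, so compare with finetune.py in the repo:

```python
# Minimal sketch of the loading step, assuming checkpoints live under
# ./ckpt/<fine_tune_from>/checkpoints/model.pth (verify against finetune.py).
# strict=False lets the randomly initialized prediction head keep its weights.
import os
import torch


def load_pretrained_weights(model, config):
    ckpt_path = os.path.join("ckpt", config["fine_tune_from"], "checkpoints", "model.pth")
    state_dict = torch.load(ckpt_path, map_location="cpu")
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("Loaded pre-trained weights. Missing keys:", missing)
    print("Unexpected keys:", unexpected)
    return model
```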

Hope this helps.

Best, Yuyang

Jiaran commented 2 years ago

Hi Yuyang, thanks for your fast reply! Here is what I did: I cloned the code and fine-tuned the model using the config above (and fine_tune_from: pretrained_gin IS set; sorry about the confusion in the config I pasted). The log says "load pretrained model with success". I used the BACE dataset because I saw MolCLR performs great on it, but I only got an AUC of about 0.73, whereas the paper reports 0.89.

Then I froze the GNN part and fine-tuned only the prediction head (roughly as in the sketch below), but it does not seem to converge. I'm wondering what I did wrong.
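
For reference, my freezing step is roughly equivalent to this sketch; `model.gnn` and `model.pred_head` are placeholders for whatever the fine-tuning model actually calls those modules:

```python
# Minimal linear-probing sketch: freeze the pre-trained GNN encoder and train
# only the prediction head. `model.gnn` and `model.pred_head` are assumed
# attribute names, not necessarily what the repo uses.
import torch


def setup_linear_probe(model, config):
    for p in model.gnn.parameters():
        p.requires_grad = False      # freeze the pre-trained encoder
    model.gnn.eval()                 # keep dropout/batch-norm in eval mode for the frozen part
    return torch.optim.Adam(
        model.pred_head.parameters(),            # optimize only the head
        lr=config["init_lr"],
        weight_decay=float(config["weight_decay"]),
    )
```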

Thanks!

yuyangw commented 2 years ago

The init_base_lr seems too small; I would suggest trying values in the range 0.00005~0.0002, for example with a small sweep like the one sketched below. Other hyperparameters can also be tuned to reach better performance.
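
As a concrete starting point, such a sweep could look like this; `run_finetune` is a hypothetical wrapper around the fine-tuning script that returns the test ROC-AUC, not a function that exists in the repo:

```python
# Minimal sketch of a sweep over init_base_lr in the suggested range.
# `run_finetune` is a hypothetical callable that runs one fine-tuning job with
# the given config and returns the test ROC-AUC; adapt it to however you
# actually launch finetune.py.
import copy


def sweep_init_base_lr(base_config, run_finetune):
    results = {}
    for base_lr in (5e-5, 1e-4, 2e-4):           # 0.00005 ~ 0.0002
        config = copy.deepcopy(base_config)
        config["init_base_lr"] = base_lr
        results[base_lr] = run_finetune(config)
    best_lr = max(results, key=results.get)
    return best_lr, results
```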

Besides, I haven't tried linear probing, so I'm afraid I can't give more comments on that.

Hope this helps.

Best, Yuyang