microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

About the result in ogbg-molhiv #90

Open dongZheX opened 2 years ago

dongZheX commented 2 years ago

Thanks for the code. Good job!

At present, I am trying to do some work based on Graphormer, and I tried to reproduce the result on ogbg-molhiv but ran into some problems. I train the model on 2 x RTX 3090 (24G), CUDA version 11.1, and the PyTorch version is the same as in the GitHub project.

I train the model with this script:

n_gpu=2
epoch=8
max_epoch=$((epoch + 1))
batch_size=64
tot_updates=$((33000*epoch/batch_size/n_gpu))
warmup_updates=$((tot_updates/10))

CUDA_VISIBLE_DEVICES=1,2 fairseq-train \
--user-dir graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name ogbg-molhiv \
--dataset-source ogb \
--task graph_prediction_with_flag \
--criterion binary_logloss_with_flag \
--arch graphormer_base \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.0 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates $warmup_updates --total-num-update $tot_updates \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size $batch_size \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 768 \
--encoder-ffn-embed-dim 768 \
--encoder-attention-heads 32 \
--max-epoch $max_epoch \
--save-dir $save_dir_root \
--pretrained-model-name pcqm4mv1_graphormer_base \
--seed ${seeds[$i]} \
--flag-m 3 \
--flag-step-size 0.001 \
--flag-mag 0.001 \
--tensorboard-logdir $tensorboard_dir_root \
--log-format simple --log-interval 100 \
--log-file $log_dir 
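
For reference, a quick sanity check of the schedule values that the arithmetic at the top of this script produces (just evaluating the same expressions in bash):

# epoch=8, batch_size=64, n_gpu=2, as set above
echo $((33000*8/64/2))        # tot_updates    = 2062
echo $((33000*8/64/2/10))     # warmup_updates = 206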

And evaluate the model with:

CUDA_VISIBLE_DEVICES=3 python graphormer/evaluate/evaluate.py \
    --user-dir graphormer \
    --num-workers 16 \
    --ddp-backend=legacy_ddp \
    --dataset-name ogbg-molhiv \
    --dataset-source ogb \
    --task graph_prediction \
    --arch graphormer_base \
    --num-classes 1 \
    --batch-size $batch_size \
    --save-dir $save_dir_root  \
    --metric auc \
    --seed ${seeds[$i]} \
    --sfilename $result_dir \
    --log-format simple   

I have used seeds 1-5 so far. The results are:

{'epoch-best': {'val': {'auc': 0.7915973390450101}, 'test': {'auc': 0.7689351341951192}}} (seed 1)
{'epoch-best': {'val': {'auc': 0.7967697158563377}, 'test': {'auc': 0.7800533302417252}}} (seed 2)
{'epoch-best': {'val': {'auc': 0.7556909933843831}, 'test': {'auc': 0.7775153131219446}}} (seed 3)
{'epoch-best': {'val': {'auc': 0.7953004299078593}, 'test': {'auc': 0.799790350317856}}} (seed 4)
{'epoch-best': {'val': {'auc': 0.7998829473968052}, 'test': {'auc': 0.7942418796977954}}} (seed 5)

And the results with the pretrained model pcqm4mv2_graphormer_base are also not promising. Emmm, I don't know what is happening. Looking forward to your reply.

shiyu1994 commented 2 years ago

@dongZheX Thanks for using Graphormer. Could you please try setting CUDA_VISIBLE_DEVICES=0,1,2,3 in your script, without changing anything else? I found that there's a mistake in how the hiv_pre.sh script sets the visible GPUs, so 4 GPUs are actually used. I've tried this and can reproduce the result.

With 2 GPUs, we may need to adjust the warmup steps and power of polynomial learning rate scheduler. I'll come back when I find the correct setting for 2 GPUs.
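
To make the effect of the GPU count concrete: plugging the same epoch=8 and batch_size=64 into the tot_updates formula from the script gives roughly twice as many scheduled updates (and warmup steps) on 2 GPUs as on 4. The numbers below are just that formula evaluated in bash, not measured values:

# epoch=8, batch_size=64
echo $((33000*8/64/4))   # n_gpu=4 -> tot_updates = 1031, 10% warmup = 103
echo $((33000*8/64/2))   # n_gpu=2 -> tot_updates = 2062, 10% warmup = 206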

shiyu1994 commented 2 years ago

@dongZheX Could you please try this setting for 2 GPUs (doubling the batch size per GPU to 128 and the number of epochs to 16, without changing anything else)?

#!/usr/bin/env bash
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

n_gpu=2
epoch=16
max_epoch=$((epoch + 1))
batch_size=128
tot_updates=$((33000*epoch/batch_size/n_gpu))
warmup_updates=$((tot_updates/10))

CUDA_VISIBLE_DEVICES=0,1 fairseq-train \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name ogbg-molhiv \
--dataset-source ogb \
--task graph_prediction_with_flag \
--criterion binary_logloss_with_flag \
--arch graphormer_base \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.0 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates $warmup_updates --total-num-update $tot_updates \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size $batch_size \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 768 \
--encoder-ffn-embed-dim 768 \
--encoder-attention-heads 32 \
--max-epoch $max_epoch \
--save-dir ./ckpts \
--pretrained-model-name pcqm4mv1_graphormer_base \
--seed 1 \
--flag-m 3 \
--flag-step-size 0.001 \
--flag-mag 0.001

I got an AUC of 0.818716 on the test set at the epoch that is best on the valid set (valid AUC 0.824427).
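
For intuition (plain arithmetic on the settings above, not something stated explicitly in the thread): this 2-GPU configuration keeps the effective per-update batch the same as the 4-GPU run while doubling the number of epochs.

# Effective per-update batch = n_gpu * batch_size
echo $((4*64))    # 256 for the 4-GPU default (batch_size=64)
echo $((2*128))   # 256 for this 2-GPU setting (batch_size=128)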

shiyu1994 commented 2 years ago

Yet another setting, which follows the warmup ratio from the original Graphormer paper (0.06) without changing anything else, gets an AUC of 0.816581 on the test set at the epoch that is best on the valid set (valid AUC 0.805187).

#!/usr/bin/env bash
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

n_gpu=2
epoch=8
max_epoch=$((epoch + 1))
batch_size=64
tot_updates=$((33000*epoch/batch_size/n_gpu))
warmup_updates=$((tot_updates * 6 / 100))

CUDA_VISIBLE_DEVICES=0,1 fairseq-train \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name ogbg-molhiv \
--dataset-source ogb \
--task graph_prediction_with_flag \
--criterion binary_logloss_with_flag \
--arch graphormer_base \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.0 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates $warmup_updates --total-num-update $tot_updates \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size $batch_size \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 768 \
--encoder-ffn-embed-dim 768 \
--encoder-attention-heads 32 \
--max-epoch $max_epoch \
--save-dir ./ckpts_$1_$2_$3_$4_$5_2gpu_2 \
--pretrained-model-name pcqm4mv1_graphormer_base \
--seed 1 \
--flag-m 3 \
--flag-step-size 0.001 \
--flag-mag 0.001
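
Note that --save-dir in this variant interpolates the positional parameters $1..$5 into the checkpoint directory name, so the script expects five command-line tags. A hypothetical invocation (the file name and tag values below are made up for illustration):

# Assuming the script above is saved as hiv_2gpu_warmup006.sh (hypothetical name),
# the five arguments only label the checkpoint directory.
bash hiv_2gpu_warmup006.sh hiv base seed1 warmup006 flag
# -> checkpoints are written to ./ckpts_hiv_base_seed1_warmup006_flag_2gpu_2
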
dongZheX commented 2 years ago

Hi, without changing anything else, I changed CUDA_VISIBLE_DEVICES to 0,1,2,3.

With seed 1 (using pcqm4mv1_graphormer_base), the result is:

{'epoch-best': {'val': {'auc': 0.8268111745401395}, 'test': {'auc': 0.8071068343864125}}}

With seed 2, the result is: {'epoch-best': {'val': {'auc': 0.8010994364928343}, 'test': {'auc': 0.7638590999555581}}}

The result is still not right; the full log is:

{'epoch-1': {'val': {'auc': 0.6367309230302529}, 'test': {'auc': 0.7048402991130949}}}
{'epoch-2': {'val': {'auc': 0.6810684267456005}, 'test': {'auc': 0.724410179120051}}}
{'epoch-3': {'val': {'auc': 0.7854260316409731}, 'test': {'auc': 0.7728344250574847}}}
{'epoch-4': {'val': {'auc': 0.7775173204146482}, 'test': {'auc': 0.789851796031148}}}
{'epoch-5': {'val': {'auc': 0.7927617366684133}, 'test': {'auc': 0.7691225629432109}}}
{'epoch-6': {'val': {'auc': 0.7645725894671042}, 'test': {'auc': 0.7639537804571715}}}
{'epoch-7': {'val': {'auc': 0.8010994364928343}, 'test': {'auc': 0.7638590999555581}}}
{'epoch-8': {'val': {'auc': 0.7837805539468484}, 'test': {'auc': 0.7722566807721292}}}
{'epoch-9': {'val': {'auc': 0.7525578445161468}, 'test': {'auc': 0.7359650648271598}}}
{'epoch-best': {'val': {'auc': 0.8010994364928343}, 'test': {'auc': 0.7638590999555581}}}
{'epoch-last': {'val': {'auc': 0.7525578445161468}, 'test': {'auc': 0.7359650648271598}}}

I'll try to change the epoch and batch_size.


dongZheX commented 2 years ago

With this script:

n_gpu=2
epoch=8
max_epoch=$((epoch + 1))
batch_size=64
tot_updates=$((33000*epoch/batch_size/n_gpu))
warmup_updates=$((tot_updates * 6 / 100))

CUDA_VISIBLE_DEVICES=6,7 fairseq-train \
--user-dir graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name ogbg-molhiv \
--dataset-source ogb \
--task graph_prediction_with_flag \
--criterion binary_logloss_with_flag \
--arch graphormer_base \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.0 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates $warmup_updates --total-num-update $tot_updates \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size $batch_size \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 768 \
--encoder-ffn-embed-dim 768 \
--encoder-attention-heads 32 \
--max-epoch $max_epoch \
--save-dir $save_dir_root \
--pretrained-model-name pcqm4mv1_graphormer_base \
--seed 1 \
--flag-m 3 \
--flag-step-size 0.001 \
--flag-mag 0.001 \
--tensorboard-logdir $tensorboard_dir_root \
--log-format simple --log-interval 100 \
--log-file $log_dir 

I got the result:

{'epoch-1': {'val': {'auc': 0.6695133124354602}, 'test': {'auc': 0.7005593878615732}}}
{'epoch-2': {'val': {'auc': 0.6913963272447594}, 'test': {'auc': 0.7289896237899252}}}
{'epoch-3': {'val': {'auc': 0.7223340656781544}, 'test': {'auc': 0.7466407744478581}}}
{'epoch-4': {'val': {'auc': 0.49609926796159937}, 'test': {'auc': 0.49780109365640635}}}
{'epoch-5': {'val': {'auc': 0.535256734354939}, 'test': {'auc': 0.4508347342183062}}}
{'epoch-6': {'val': {'auc': 0.4679928542756374}, 'test': {'auc': 0.4747376963654281}}}
{'epoch-7': {'val': {'auc': 0.5192539275438257}, 'test': {'auc': 0.49205360075744403}}}
{'epoch-8': {'val': {'auc': 0.4971748036611112}, 'test': {'auc': 0.5593279616640581}}}
{'epoch-9': {'val': {'auc': 0.5475518539967947}, 'test': {'auc': 0.5567580623345506}}}
{'epoch-best': {'val': {'auc': 0.535256734354939}, 'test': {'auc': 0.4508347342183062}}}
{'epoch-last': {'val': {'auc': 0.5475518539967947}, 'test': {'auc': 0.5567580623345506}}}

It is strange.

shiyu1994 commented 2 years ago

@dongZheX Could you please provide your log when fine-tuning the model? We can check the loss of valid and training set when fine-tuning. Thanks.

dongZheX commented 2 years ago

My logs are in the attached file. "hiv_base_warmup006" means we set warmup_updates=$((tot_updates * 6 / 100)). "hiv_base_v4" means we set n_gpu=4 and use 4 x 3090 to train the model. The log files are in the "logs" directory and the results are in the "result" directory. exp.zip

shiyu1994 commented 2 years ago

@dongZheX Thanks. It seems that the attached files cannot be downloaded. Could you please double check?

dongZheX commented 2 years ago

@dongZheX Thanks. It seems that the attached files cannot be downloaded. Could you please double check?

I put the files in my repository: https://github.com/dongZheX/myexp

shiyu1994 commented 2 years ago

@dongZheX Thanks for reporting the issue. After checking, we found that we used the wrong checkpoint for fine-tuning on MolHIV. #96 has been opened to fix this. The following lists the valid and test AUCs on MolHIV with 10 seeds, with an average test AUC of 0.805.

seed 0, valid best 0.82628413, test 0.81082449
seed 1, valid best 0.81708539, test 0.80874539
seed 2, valid best 0.81538782, test 0.81652658
seed 3, valid best 0.8399321, test 0.80314764
seed 4, valid best 0.79637443, test 0.79757309 
seed 5, valid best 0.8203549,  test 0.82088575
seed 6, valid best 0.79118367, test 0.798203  
seed 7, valid best 0.82911852, test 0.80339304
seed 8, valid best 0.81123889, test 0.80003865
seed 9, valid best 0.79634073, test 0.7965915
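
For anyone reproducing this multi-seed run, a minimal sketch of the sweep (assuming the fine-tuning command above is wrapped in a script, here given the hypothetical name hiv_finetune.sh, that reads the seed from its first argument and uses a per-seed save directory):

#!/usr/bin/env bash
# Minimal sketch, not from the thread: fine-tune once per seed 0-9.
for seed in $(seq 0 9); do
    bash hiv_finetune.sh "$seed"   # hypothetical wrapper around the fairseq-train command above
done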
dongZheX commented 2 years ago


Thanks, let me try again.

dongZheX commented 2 years ago


Thanks. It works now. By the way, if I want to train HIV with more GPUs, how should I change the settings of tot_updates, warmup_updates, batch_size, and epoch? If I want to reproduce the result on PCBA, can I use the pretrained model pcqm4mv1_graphormer_base directly (is it possible to provide the script for training on PCBA)? And if I want to re-pretrain pcqm4mv1 for HIV, is it OK to just add --pre-layernorm to pcqv1.sh? Thanks again.
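
(The multi-GPU question is left open in this thread. As a rough sketch only, under the assumption that the effective batch size n_gpu*batch_size is kept at 256 as in the settings above, the existing formulas then give essentially the same schedule as the 4-GPU default:)

# Assumption, not an official recommendation: hold n_gpu * batch_size = 256.
n_gpu=8
batch_size=$((256 / n_gpu))                     # 32 per GPU when using 8 GPUs
epoch=8
max_epoch=$((epoch + 1))
tot_updates=$((33000*epoch/batch_size/n_gpu))   # = 1031, same as the 4-GPU default
warmup_updates=$((tot_updates/10))              # or tot_updates*6/100, as tried earlier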