microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

How to evaluate for dataset zinc? #85

Open dongZheX opened 2 years ago

dongZheX commented 2 years ago

Thanks for the code, good job. I trained graphormer_slim on the ZINC dataset with bash examples/property_prediction/zinc.sh, then tried to evaluate it with:

python graphormer/evaluate/evaluate.py \
--user-dir graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name zinc \
--dataset-source pyg \
--task graph_prediction \
--criterion l1_loss \
--arch graphormer_slim \
--num-classes 1 \
--batch-size 64 \
--save-dir exp/checkpoints_dir/ckpts_zinc \
--metric mae \
--split test

An error occurred:

TypeError: mean() received an invalid combination of arguments - got (out=NoneType, dtype=NoneType, axis=NoneType, ), but expected one of:
 * (*, torch.dtype dtype)
 * (tuple of ints dim, bool keepdim, *, torch.dtype dtype)
 * (tuple of names dim, bool keepdim, *, torch.dtype dtype)

I changed the line mae = np.mean(np.abs(y_true-y_pred)) to mae = torch.nn.functional.l1_loss(y_true, y_pred).
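To cross-check the number produced by the edited line, the metric can also be computed without NumPy or PyTorch. The following is only a minimal sketch: the helper names `_flatten` and `mean_absolute_error` are hypothetical and not part of Graphormer. It flattens both inputs first, which guards against a (N, 1) prediction tensor being broadcast against a (N,) target tensor.

```python
from typing import List


def _flatten(values) -> List[float]:
    """Flatten a tensor, array, or nested list into a flat list of floats."""
    # torch.Tensor and numpy.ndarray both provide tolist(); plain lists do not.
    if hasattr(values, "tolist"):
        values = values.tolist()
    if isinstance(values, (int, float)):
        return [float(values)]
    flat: List[float] = []
    for item in values:
        flat.extend(_flatten(item))
    return flat


def mean_absolute_error(y_true, y_pred) -> float:
    """Reference MAE: the mean of |y_true - y_pred| over all samples."""
    true_flat = _flatten(y_true)
    pred_flat = _flatten(y_pred)
    if len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred must hold the same number of values")
    if not true_flat:
        raise ValueError("cannot compute the MAE of empty inputs")
    total = sum(abs(t - p) for t, p in zip(true_flat, pred_flat))
    return total / len(true_flat)
```

If this value and the one from torch.nn.functional.l1_loss disagree on the same tensors, a shape mismatch between y_true and y_pred is one thing to check.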

But the MAE I got on the ZINC dataset is:

2022-01-10 14:40:42 | INFO | graphormer.tasks.graph_prediction | Loaded test with #samples: 5000
2022-01-10 14:40:46 | INFO | __main__ | mae: 0.06235151365399361 

Is this result normal? I suspect I made a mistake somewhere.

shiyu1994 commented 2 years ago

@dongZheX Thanks for using Graphormer. I'll fix the problem in the script.

Do you use the full ZINC dataset, or the subset as mentioned in Graphormer paper?

dongZheX commented 2 years ago

Thanks very much for replying. I think I used the ZINC subset. The result for Graphormer in the paper is 0.122±0.006, but the result I got is 0.0623, so I think I made a mistake somewhere.

By the way, benchmarking_gnns (https://github.com/graphdeeplearning/benchmarking-gnns) requires training the model with seeds {41,12,35,92}. Do I have to train Graphormer four times and evaluate it four times? Is there a more efficient way to get the final result (mean ± std)? (Same question for ogbg-molhiv.)
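Each seed is an independent training run, so the per-seed test MAEs have to be produced separately, but combining them afterwards is cheap. The sketch below assumes the four MAE values have already been collected by hand from the evaluation logs. The function name `summarize_runs` is hypothetical, and the numbers in the usage example are placeholders, not measured results. Whether a benchmark reports the sample or the population standard deviation is also an assumption to check, so both are offered.

```python
import statistics
from typing import Dict, Tuple


def summarize_runs(mae_by_seed: Dict[int, float],
                   population: bool = False) -> Tuple[float, float]:
    """Return (mean, std) of the test MAE across seeds.

    population=False gives the sample standard deviation (n - 1 denominator),
    population=True gives the population standard deviation (n denominator).
    """
    values = list(mae_by_seed.values())
    if len(values) < 2:
        raise ValueError("need results from at least two seeds")
    mean = statistics.mean(values)
    if population:
        spread = statistics.pstdev(values)
    else:
        spread = statistics.stdev(values)
    return mean, spread


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with the MAE
    # reported by the evaluation script for each seed.
    placeholder_results = {41: 0.10, 12: 0.12, 35: 0.14, 92: 0.12}
    mean, std = summarize_runs(placeholder_results)
    print(f"test MAE: {mean:.3f} ± {std:.3f}")
```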

skye95git commented 2 years ago

Hi, how did you get bash examples/property_prediction/zinc.sh to run successfully? I ran this command for half an hour and got no response.

skye95git commented 2 years ago

@dongZheX Did you download the dataset in advance?

ZhuYun97 commented 2 years ago

@dongZheX How many epochs did you run? I used the zinc.sh script and ran 10000 epochs (on the ZINC subset), but got an undesirable result: the MAE is around 0.6, which is far from the reported result.

czczup commented 2 years ago

I also reproduced a result of 0.06+. What is the reason for the inconsistency? I used zinc.sh to train the model and the same test command as dongZheX above.

2022-06-20 17:56:39 | INFO | __main__ | evaluating checkpoint file examples/property_prediction/ckpts/zinc_graphormer_slim/checkpoint50.pt
2022-06-20 17:56:40 | INFO | graphormer.models.graphormer | Namespace(no_progress_bar=False, log_interval=100, log_format=None, log_file=None, tens
orboard_logdir=None, wandb_project=None, azureml_logging=False, seed=1, cpu=False, tpu=False, bf16=False, memory_efficient_bf16=False, fp16=False,
memory_efficient_fp16=False, fp16_no_flatten_grads=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, on_cpu_convert_pre
cision=False, min_loss_scale=0.0001, threshold_loss_scale=None, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, user_dir
='graphormer', empty_cache_freq=0, all_gather_list_size=16384, model_parallel_size=1, quantization_config_path=None, profile=False, reset_logging=F
alse, suppress_crashes=False, use_plasma_view=False, plasma_path='/tmp/plasma', criterion='l1_loss', tokenizer=None, bpe=None, optimizer=None, lr_s
cheduler='fixed', scoring='bleu', task='graph_prediction', num_workers=16, skip_invalid_size_inputs_valid_test=False, max_tokens=None, batch_size=6
4, required_batch_size_multiple=8, required_seq_len_multiple=1, dataset_impl=None, data_buffer_size=10, train_subset='train', valid_subset='valid',
 combine_valid_subsets=None, ignore_unused_valid_subsets=False, validate_interval=1, validate_interval_updates=0, validate_after_updates=0, fixed_v
alidation_seed=None, disable_validation=False, max_tokens_valid=None, batch_size_valid=64, max_valid_steps=None, curriculum=0, gen_subset='test', n
um_shards=1, shard_id=0, grouped_shuffling=False, update_epoch_batch_itr=False, update_ordered_indices_seed=False, distributed_world_size=1, distri
buted_num_procs=1, distributed_rank=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, device_id=0, distributed_no_s
pawn=False, ddp_backend='legacy_ddp', ddp_comm_hook='none', bucket_cap_mb=25, fix_batches_to_gpus=False, find_unused_parameters=False, gradient_as_
bucket_view=False, fast_stat_sync=False, heartbeat_timeout=-1, broadcast_buffers=False, slowmo_momentum=None, slowmo_base_algorithm='localsgd', loc
alsgd_frequency=3, nprocs_per_node=1, pipeline_model_parallel=False, pipeline_balance=None, pipeline_devices=None, pipeline_chunks=0, pipeline_enco
der_balance=None, pipeline_encoder_devices=None, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_checkpoint='never', zero_sh
arding='none', no_reshard_after_forward=False, fp32_reduce_scatter=False, cpu_offload=False, use_sharded_state=False, not_fsdp_flatten_parameters=F
alse, arch='graphormer_slim', max_epoch=0, max_update=0, stop_time_hours=0, clip_norm=0.0, sentence_avg=False, update_freq=[1], lr=[0.25], stop_min
_lr=-1.0, use_bmuf=False, skip_remainder_batch=False, save_dir='examples/property_prediction/ckpts/zinc_graphormer_slim', restore_file='checkpoint_
last.pt', finetune_from_model=None, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, optimizer_override
s='{}', save_interval=1, save_interval_updates=0, keep_interval_updates=-1, keep_interval_updates_pattern=-1, keep_last_epochs=-1, keep_best_checkp
oints=-1, no_save=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_save_optimizer_state=False, best_checkpoint_metric='loss', maxim
ize_best_checkpoint_metric=False, patience=-1, checkpoint_suffix='', checkpoint_shard_count=1, load_checkpoint_on_all_dp_ranks=False, write_checkpo
ints_asynchronously=False, store_ema=False, ema_decay=0.9999, ema_start_update=0, ema_seed_model=None, ema_update_freq=1, ema_fp32=False, split='te
st', metric='mae', dataset_name='zinc', num_classes=1, max_nodes=128, dataset_source='pyg', num_atoms=4608, num_edges=1536, num_in_degree=512, num_
out_degree=512, num_spatial=512, num_edge_dis=128, multi_hop_max_dist=5, spatial_pos_max=1024, edge_type='multi_hop', pretrained_model_name='none',
 load_pretrained_model_output_layer=False, train_epoch_shuffle=False, user_data_dir='', force_anneal=None, lr_shrink=0.1, warmup_updates=0, pad=1,
eos=2, unk=3, no_seed_provided=False, encoder_embed_dim=80, encoder_layers=12, encoder_attention_heads=8, encoder_ffn_embed_dim=80, activation_fn='
gelu', encoder_normalize_before=True, apply_graphormer_init=True, share_encoder_input_output_embed=False, no_token_positional_embeddings=False, pre
_layernorm=False, dropout=0.1, attention_dropout=0.1, act_dropout=0.0, _name='graphormer_slim')
2022-06-20 17:56:41 | INFO | graphormer.tasks.graph_prediction | Loaded test with #samples: 5000
2022-06-20 17:56:44 | INFO | __main__ | mae: 0.0699552521109581

lsj2408 commented 2 years ago

https://github.com/pyg-team/pytorch_geometric/blob/97c50a03db9f5e9fbb0ab42d38681cac0d2a020a/torch_geometric/datasets/zinc.py#L64

Note that the ZINC dataset comes in both a full set and a subset. The current version of our code uses the full set, which is why your reproduced MAE of 0.069 does not match the result in our paper. You can select the subset via the argument linked above.
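A quick way to tell which variant was evaluated is the test-set size printed in the log. The sketch below assumes the split sizes given in the PyTorch Geometric documentation (1,000 test graphs in the subset, 5,000 in the full set); the helper names are hypothetical. `load_zinc_test_split` shows the `subset` argument of PyG's `ZINC` class on its own. Where Graphormer constructs the dataset is not shown in this thread, so wiring the flag into zinc.sh is left out.

```python
# Test-split sizes of the two ZINC variants, as documented by PyTorch Geometric
# (an assumption worth re-checking against the installed version).
ZINC_TEST_SIZES = {"subset": 1000, "full": 5000}


def guess_zinc_variant(num_test_samples: int) -> str:
    """Map a logged '#samples' value for the test split to a ZINC variant."""
    for name, size in ZINC_TEST_SIZES.items():
        if size == num_test_samples:
            return name
    return "unknown"


def load_zinc_test_split(root: str, subset: bool):
    """Load the ZINC test split directly from PyTorch Geometric.

    The import is done lazily so that guess_zinc_variant() can be used
    without torch_geometric installed.
    """
    from torch_geometric.datasets import ZINC

    return ZINC(root=root, subset=subset, split="test")
```

By this check, the "Loaded test with #samples: 5000" lines in the logs above are consistent with the full set.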

czczup commented 2 years ago

Thanks for your reply~

JiaYuanChng commented 2 years ago

Hello, just a quick question for clarification: what modification is required in the zinc.sh file to train on the subset instead of the full set? Or do I actually go to that zinc.py file and change the subset argument from False to True?