qitianwu / DIFFormer

The official implementation for the ICLR 2023 spotlight paper "DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion"

How to reproduce the results in Fig.2(b), increasing model depth K? #6

Closed shwangtangjun closed 1 year ago

shwangtangjun commented 1 year ago

Could you provide the specific parameter settings for reproducing the results in Fig. 2(b) when the model depth is large? I run into problems even when the model depth is only 16, i.e. --num_layers 16.

python main.py --dataset cora --method difformer --rand_split_class --lr 0.001 --weight_decay 0.01 --dropout 0.2 --num_layers 16 --hidden_channels 64 --num_heads 1 --kernel simple --use_graph --use_bn --use_residual --alpha 0.5 --runs 1 --epochs 500 --seed 123 --device 0

The output accuracy is 29.40%, achieved at the 8th epoch. I have tried tuning weight_decay and dropout, but nothing helps.

qitianwu commented 1 year ago

Hi, thank you for carefully checking our reported results and raising this issue. I will check the history records and re-run the code to double-check.

shwangtangjun commented 1 year ago

Hi. Any progress?

qitianwu commented 1 year ago

The results in Fig. 2(b) are wrong; they were actually produced by another model implementation. That version needs to stack deep layers to perform well, which is slow and redundant, although it is insensitive to model depth. It is not the version we finally used for evaluation and comparison.

The correct version (i.e., the code provided in this repo) performs well with shallow layers (e.g., 8 on Cora, 4 on Citeseer/Pubmed), as shown by our experiments. So one does not need deep layers, since the shallow model already performs strongly. If one still wants to use deep layers, a small step size \alpha (e.g., \alpha=0.1 or smaller) can alleviate the sensitivity to model depth, as in the example command below.
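For example, taking the command from your original question and only shrinking the step size, something like the following should be a reasonable starting point (I have not re-verified this exact command):

python main.py --dataset cora --method difformer --rand_split_class --lr 0.001 --weight_decay 0.01 --dropout 0.2 --num_layers 16 --hidden_channels 64 --num_heads 1 --kernel simple --use_graph --use_bn --use_residual --alpha 0.1 --runs 1 --epochs 500 --seed 123 --device 0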

We will run more experiments and update the figure in the paper soon. Sorry for causing the confusion.

shwangtangjun commented 1 year ago

Ok. Looking forward to seeing the revised paper.

qitianwu commented 1 year ago

I have updated the arXiv paper with the new Fig. 2. In this experiment, we did not tune other hyper-parameters; we only changed the step size (--alpha) and the model depth (--num_layers) to obtain the results in the figure.
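For instance, a sweep over just these two flags, reusing the other settings from the command earlier in this thread (a rough sketch, not exactly the script we ran), could look like:

for alpha in 0.1 0.3 0.5; do
  for layers in 2 4 8 16 32; do
    # only --alpha and --num_layers vary; all other flags follow the command above
    python main.py --dataset cora --method difformer --rand_split_class --lr 0.001 --weight_decay 0.01 --dropout 0.2 --num_layers $layers --hidden_channels 64 --num_heads 1 --kernel simple --use_graph --use_bn --use_residual --alpha $alpha --runs 1 --epochs 500 --seed 123 --device 0
  done
done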

Thanks again for pointing out this issue.

shwangtangjun commented 1 year ago

Thanks! I've checked the updated results. Nice work.