Closed shwangtangjun closed 1 year ago
Hi, thank you for carefully checking our discussion results and raising this issue. I will check the history records and re-run the code to double-check.
Hi. Any progress?
The results in Fig. 2(b) are wrong; they were actually produced by another model implementation. That version requires stacking deep layers to perform well, which is slow and redundant, although it is insensitive to model depth. It is not what we finally used for the evaluation and comparison.
The correct version (i.e., the code provided in this repo) performs well with shallow layers (e.g., 8 on Cora, 4 on Citeseer/Pubmed), as shown by our experiments. One therefore does not need deep layers, since the shallow model already performs well. If deep layers are still desired, using a small step size \alpha can alleviate the sensitivity to model depth (e.g., setting \alpha=0.1 or smaller).
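For concreteness, a deep-model run with a reduced step size might look like the following. This is only a sketch based on the flag names that appear elsewhere in this thread (--alpha, --num_layers, etc.); the exact values are illustrative, not a verified configuration.

```shell
# Hypothetical deep-layer run: same flags as the command reported later in
# this thread, but with a small step size (--alpha 0.1) as suggested above
# to reduce sensitivity to model depth.
python main.py --dataset cora --method difformer --rand_split_class \
    --lr 0.001 --weight_decay 0.01 --dropout 0.2 \
    --num_layers 16 --hidden_channels 64 --num_heads 1 \
    --kernel simple --use_graph --use_bn --use_residual \
    --alpha 0.1 --runs 1 --epochs 500 --seed 123 --device 0
```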
We will run more experiments and update the figure in the paper soon. Sorry for the confusion.
Ok. Looking forward to seeing the revised paper.
I have updated the arXiv paper with the new Fig. 2. In this experiment, we do not tune other hyper-parameters and only change the step size (--alpha) and the model depth (--num_layers) to obtain the results in the figure.
Thanks again for pointing out this issue.
Thanks! I've checked the updated results. Nice work.
Could you provide the specific parameter settings for reproducing the results in Fig. 2(b) when the model depth is large? I had problems even when the model depth was 16, i.e., --num_layers 16:
python main.py --dataset cora --method difformer --rand_split_class --lr 0.001 --weight_decay 0.01 --dropout 0.2 --num_layers 16 --hidden_channels 64 --num_heads 1 --kernel simple --use_graph --use_bn --use_residual --alpha 0.5 --runs 1 --epochs 500 --seed 123 --device 0
The output accuracy is 29.40%, achieved at the 8th epoch. I have tried tuning weight_decay and dropout, but nothing helps.