Open yueliu1999 opened 11 months ago
Thanks for your attention. In our paper, we trained for 400 epochs and observed that the learnable trade-off alpha decreased to around 0.4, as shown in Figure 4 of the Appendix.
In your experiment, you trained for 1000 epochs and, interestingly, the learnable trade-off alpha decreased to -1.4. This could be attributed to overfitting. It is worth exploring the reasons behind it and finding potential solutions. If you plan to train the networks for 1000 epochs, consider tuning the initial value of the trade-off alpha or adjusting the learning rate.
Here is one strategy for controlling the trade-off parameter: keep it trainable at first, then make it untrainable after a certain number of epochs. This can be implemented as a gradual freezing mechanism, where the trade-off parameter starts as a trainable variable and gradually transitions to an untrainable state. The model can then learn a suitable trade-off during the initial training phase, after which the value is fixed to keep training stable and limit overfitting. It would be worth experimenting with different freezing schedules and monitoring their impact on the model's performance to find the most effective one.
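To make the gradual freezing idea concrete, here is a minimal sketch in plain Python. It is not code from the HSAN repository: `freeze_scale`, `train_alpha`, `toy_grad`, and all the epoch and learning-rate values are hypothetical, and the constant gradient only imitates an alpha that keeps drifting downward.

```python
def freeze_scale(epoch, start, end):
    """Multiplier for alpha's update: 1.0 before `start`, 0.0 from `end` on,
    and a linear ramp in between (hypothetical schedule)."""
    if epoch < start:
        return 1.0
    if epoch >= end:
        return 0.0
    return 1.0 - (epoch - start) / (end - start)


def train_alpha(grad_fn, alpha_init, lr, epochs, start, end):
    """Plain gradient descent on a scalar alpha whose step size is scaled
    by freeze_scale, so alpha stops changing once the scale reaches 0."""
    alpha = alpha_init
    history = []
    for epoch in range(epochs):
        scale = freeze_scale(epoch, start, end)
        if scale > 0.0:
            alpha -= lr * scale * grad_fn(alpha)
        history.append(alpha)
    return history


def toy_grad(alpha):
    # Toy assumption: a constant positive gradient, so alpha never converges.
    return 1.0


# Alpha is trainable for 300 epochs, fades out over the next 100, then stays fixed.
frozen = train_alpha(toy_grad, alpha_init=1.0, lr=0.002, epochs=1000, start=300, end=400)
# Baseline without freezing: the ramp starts only after the last epoch.
unfrozen = train_alpha(toy_grad, alpha_init=1.0, lr=0.002, epochs=1000, start=1000, end=1000)

print("final alpha with freezing:   ", frozen[-1])
print("final alpha without freezing:", unfrozen[-1])
```

In a PyTorch training loop, the hard version of this (no ramp) would amount to calling `requires_grad_(False)` on the alpha parameter at the chosen epoch; the ramp could be approximated by giving alpha its own optimizer parameter group and decaying that group's learning rate to zero.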
Thanks for your reply. This dramatic behavior may be due to overfitting, but the trade-off alpha keeps decreasing from the first epoch to the last, so we don't know where to stop. (Judging by the loss, metrics, alpha, etc., training doesn't seem to converge.)
Can alpha be regarded as a weight between attribute and structure? If so, it would be interesting that the dense attribute information keeps being down-weighted while the sparse structure information keeps being up-weighted.
Yes. Alpha is the weight between attribute and structure. But I think zero should be the minimum weight; a negative weight might denote a new kind of linear combination.
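A small worked example may help show why a negative alpha is no longer an ordinary weighting. It assumes the two terms are mixed as `alpha * attribute + (1 - alpha) * structure`; that form, the function names, and the scores 0.8 and 0.2 are illustrative assumptions, not taken from the paper or the repository.

```python
def weights(alpha):
    """Return the (attribute, structure) weights under the assumed mixing form."""
    return alpha, 1.0 - alpha


def combine(attr_score, struct_score, alpha):
    """Assumed mixing form: alpha * attribute + (1 - alpha) * structure."""
    w_attr, w_struct = weights(alpha)
    return w_attr * attr_score + w_struct * struct_score


def is_convex(alpha):
    """True when both weights are non-negative, i.e. alpha lies in [0, 1]."""
    return 0.0 <= alpha <= 1.0


def clamp_alpha(alpha):
    """One hypothetical way to enforce zero as the minimum weight."""
    return min(max(alpha, 0.0), 1.0)


for alpha in (0.4, -1.4):
    w_attr, w_struct = weights(alpha)
    print(f"alpha={alpha}: attribute weight={w_attr:.1f}, "
          f"structure weight={w_struct:.1f}, convex={is_convex(alpha)}, "
          f"combined={combine(0.8, 0.2, alpha):.2f}")
```

Under this assumption, alpha = 0.4 blends the two scores, while alpha = -1.4 gives the attribute term a weight of -1.4 and the structure term a weight of 2.4, so the result is an extrapolation beyond the structure score rather than a blend. Clamping with something like `clamp_alpha` would rule that out, at the cost of pinning alpha at 0 if it keeps decreasing.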
But in my experiments on the Cora dataset with default parameters, it decreases to -1.4 within 1000 epochs, and the same happens with other datasets and parameters. Is that reasonable?
Originally posted by @DrunkMe in https://github.com/yueliu1999/HSAN/issues/1#issuecomment-1837414116