VivianYWY closed this issue 11 months ago.
Hello, thank you for reaching out and reporting the issue you encountered while running the code. After receiving your report, I reran the code without any changes and can confirm that it works fine on my end. Could you please download the code from the GitHub repository again and rerun it on your side? That will help us compare results and identify any potential environment-specific issues.
As for your report that the output values of each layer are close to 0, I thoroughly examined the outputs of the first and last layers in the first epoch of my run, as shown below:
[screenshot: output of the first layer]
[screenshot: output of the last layer]
I agree that the output of the first layer appears to be close to 0, but the relative differences are significant. On the other hand, the output of the last layer, just before the log_softmax operation, is not close to 0. These observations may hold crucial clues for resolving the issue.
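If you want to reproduce this check, below is a minimal sketch of how per-layer output statistics could be inspected with forward hooks. The names `model`, `features`, and `adj` are placeholders for this repository's GAT model and the Cora inputs, not its exact API:

```python
import torch

# Minimal sketch: print min/mean/max of each submodule's output during a
# forward pass. Works for any torch.nn.Module; `model`, `features`, and
# `adj` below are hypothetical placeholders.
def report_output_stats(name):
    def hook(module, inputs, output):
        out = output if torch.is_tensor(output) else output[0]
        print(f"{name}: min={out.min().item():.4e} "
              f"mean={out.mean().item():.4e} max={out.max().item():.4e}")
    return hook

def attach_stat_hooks(model):
    handles = []
    for name, module in model.named_children():
        handles.append(module.register_forward_hook(report_output_stats(name)))
    return handles

# Usage (hypothetical objects):
# handles = attach_stat_hooks(model)
# _ = model(features, adj)
# for h in handles:
#     h.remove()
```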
If none of the above steps resolve the issue, it is possible that the problem is related to the Python environment in which the code was executed. I recommend configuring your Python environment as described in the README file to ensure a consistent computing environment.
If you encounter any further problems or have additional questions, feel free to let me know. I'm here to assist you in getting the code to work as expected.
Best regards, James
Hello, thank you for such a patient reply. Actually, I reran the code with a couple of small changes: (1) The first change is in the `GraphAttentionLayer` forward of layers.py (I compared the original code from this GitHub repository with the paper and found that they differ; for reference, a generic sketch of the paper-style attention is included after this list). This is what I changed:
[screenshot: my modified code]
This is the corresponding part of the original code:
[screenshot: original code]
(2) The second change is in the `GAT` forward of models.py (I uncommented the hidden layers and used them when running the code). This is what I changed:
[screenshot: my modified code]
This is the corresponding part of the original code:
[screenshot: original code]
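For reference only, here is a minimal sketch of a single attention head as described in the standard GAT paper (Veličković et al., 2018). It is not this repository's layer and not my exact edit, just a generic illustration of the paper-style formulation I was comparing against:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PaperStyleGATLayer(nn.Module):
    """Sketch of one attention head as in Velickovic et al. (2018):
    e_ij = LeakyReLU(a^T [W h_i || W h_j]), softmax over each node's
    neighbours. Illustration only, not this repository's layer."""
    def __init__(self, in_features, out_features, alpha=0.2):
        super().__init__()
        self.W = nn.Linear(in_features, out_features, bias=False)
        self.a = nn.Linear(2 * out_features, 1, bias=False)
        self.leakyrelu = nn.LeakyReLU(alpha)

    def forward(self, h, adj):
        # h: (N, in_features) node features, adj: (N, N) adjacency
        # (self-loops assumed so every node has at least one neighbour)
        Wh = self.W(h)                                  # (N, out_features)
        N = Wh.size(0)
        Wh_i = Wh.unsqueeze(1).expand(N, N, -1)         # rows index h_i
        Wh_j = Wh.unsqueeze(0).expand(N, N, -1)         # cols index h_j
        e = self.leakyrelu(self.a(torch.cat([Wh_i, Wh_j], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float('-inf'))      # keep only real edges
        attention = F.softmax(e, dim=1)                 # normalise over neighbours
        return attention @ Wh                           # (N, out_features)
```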
The outputs of the last layer are very close to 0 (screenshots below):
[screenshot: last-layer outputs]
After the softmax, they all become the same value:
[screenshot: outputs after softmax]
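As a side note, a tiny toy example (not this repository's code) shows why this makes the loss get stuck: log_softmax over nearly identical logits gives an essentially uniform distribution, so the NLL loss sits near log(num_classes) regardless of the targets:

```python
import torch
import torch.nn.functional as F

# Toy example, not the repository's code: near-zero (i.e. nearly identical)
# logits produce a uniform log_softmax, so the NLL loss is pinned at
# roughly log(num_classes) and barely changes between epochs.
num_classes = 7  # Cora has 7 classes
near_zero_logits = torch.full((1, num_classes), 1e-6)
log_probs = F.log_softmax(near_zero_logits, dim=1)
print(log_probs)                                  # every entry ~ -log(7) ~ -1.9459
print(F.nll_loss(log_probs, torch.tensor([0])))   # ~ 1.9459, whatever the target is
```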
Hello, I'm glad we're making progress in understanding the root cause of the problem. Regarding the changes you made, the second one appears to be trivial, so no need to discuss that further. However, the first change you made in the code is the key factor leading to the issue you encountered.
I'm glad you noticed that this code differs from the original paper. Actually, during my implementation I ran into the same problem as you: the gradient did not descend. I therefore modified the model from the original paper to make the learning process work while still exploiting the edge features.
Because I haven't continued working on GNNs, I haven't further investigated this question. If you're interested in exploring this issue in more depth, I'd recommend reaching out to the author of the original paper for insights into their intended implementation.
I apologize for not mentioning this modification in the README file. Thank you for your reminder. If you need any further assistance or if there's anything else you'd like to discuss or clarify, please don't hesitate to reach out.
Best regards, James
Hello, it's been almost six months since I last heard from you regarding this issue. I appreciate your question and the input you provided. However, since there hasn't been any further activity or response, I assume that the issue may have been resolved or is no longer relevant.
As a result, I am closing this issue for now. If you encounter a similar problem in the future or have any new questions or concerns, please don't hesitate to open a new issue, and I'll be happy to assist you.
Best regards, James
Hello, I reran this code using the Cora dataset without changing anything. The result is that loss_train and loss_val do not change with the epochs; they always keep the same value. I checked the output values of each layer and found that they are all very close to 0, which leads to the same value after the softmax when calculating the loss. Have you ever encountered this problem? I could not solve it. Thanks a lot.
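For what it's worth, here is a rough sketch of how one could also check whether any gradient is flowing at all after `loss.backward()`; `model` is a placeholder for this repository's GAT, not its exact API:

```python
import torch

# Rough diagnostic sketch for a loss that never changes: after calling
# loss.backward(), print the gradient norm of every parameter. All-zero or
# vanishing norms would confirm that nothing is being learned.
def print_grad_norms(model):
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: no gradient")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.4e}")

# Inside the training loop (hypothetical names):
# loss.backward()
# print_grad_norms(model)
# optimizer.step()
```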