vgsatorras / egnn

About training egnn on qm9 dataset #4

Closed. ChenhLiwnl closed this issue 2 years ago.

ChenhLiwnl commented 2 years ago

Hello! I'm really interested in your work, and I tested it on the QM9 dataset following your settings. However, I ran into a strange situation: after about 80 epochs of training, the loss suddenly bursts from about 0.05 or lower to 1 and never drops again. This usually happens when I'm training on the gap/lumo/homo properties. Have you ever encountered this situation? If so, do you know how to deal with it? Thank you a lot!

Edit: I think it's because of the CosineAnnealing LR scheduler, but I'm not sure. Update: I tried other kinds of schedulers but it still failed.
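
For reference, here is a minimal sketch of how a cosine-annealing schedule is typically attached in PyTorch; the model and optimizer below are placeholders for illustration, not the repo's actual ones:

import torch
from torch import nn

# Placeholder model and optimizer for illustration only;
# main_qm9.py builds its own EGNN model and optimizer.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# CosineAnnealingLR decays the lr from its initial value down to
# eta_min over T_max steps. If step() keeps being called past T_max,
# the lr climbs back up again, which can destabilize a run that had
# almost converged.
epochs = 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs, eta_min=1e-5)

for epoch in range(epochs):
    # ... forward pass, loss.backward(), optimizer.step() go here ...
    scheduler.step()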

vgsatorras commented 2 years ago

Hi,

We have never experienced this issue with the provided commands in the Readme.md.

python -u main_qm9.py --num_workers 2 --lr 1e-3 --property homo --exp_name exp_1_homo

However, what you describe may happen with larger learning rates. Note that the specific properties you mention (gap, lumo, and homo) use lower learning rates than the other properties. Is it possible that you are training with larger learning rates instead of the ones provided in the readme?
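
For example, a lower learning rate can be passed directly on the command line; the 5e-4 value and experiment name here are illustrative, not tuned:

python -u main_qm9.py --num_workers 2 --lr 5e-4 --property homo --exp_name exp_homo_low_lr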

Best, Victor

ChenhLiwnl commented 2 years ago

Well, actually the aforementioned three properties use a larger learning rate (not a lower one, as you mentioned)? I will try reducing the learning rate and report the results soon. Thanks for your advice!

vgsatorras commented 2 years ago

I have just re-run the QM9 experiment for the homo property multiple times using the commands in the readme and I did not have any divergence issues. Did you have this problem when running the code as it is in the repository with the provided commands (hyperparameters) or did you modify the code before running into this problem?

In case you didn't modify anything, may I ask which PyTorch version you are using?

Thank you, Victor

ChenhLiwnl commented 2 years ago

I used a lower learning rate and that situation didn't happen again. I had also increased the number of layers (9 instead of 7), so I think that may have caused the problem. The PyTorch version is 1.9.0.
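
For the record, a run combining the deeper model with the reduced learning rate would look something like the following, assuming the script exposes the layer count through a flag such as --n_layers (the flag name and the 5e-4 value are illustrative):

python -u main_qm9.py --num_workers 2 --lr 5e-4 --n_layers 9 --property homo --exp_name exp_homo_9layers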

vgsatorras commented 2 years ago

Thank you for the answers. Closing issue.