xieyizheng / hybridfmaps

CVPR 2024, Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
https://hybridfmaps.github.io
MIT License

train on dt4d #1

Open Native537 opened 3 months ago

Native537 commented 3 months ago

Hi,

The loss always becomes NaN when I train on the dt4d dataset. I have adjusted the learning rate, but it doesn't help. What should I do to fix it?

Best wishes.

xieyizheng commented 3 months ago

Hi Native537, Thank you for raising this issue. We have also encountered this phenomenon in our experiments. It likely indicates a bug in the current release of the code. We will investigate this and release a patch as soon as possible. Best regards

xieyizheng commented 3 months ago

Hi Native537, as we're revisiting the code, we found another source of training instability and will try to improve that, but we haven't been able to reproduce the NaN issue yet. In the meantime, could you post your reproduction steps and training logs in as much detail as possible? That information would be greatly helpful and appreciated!

Native537 commented 3 months ago

Hi Xie,

Firstly, my environment is as follows: torch=1.12.1+cu113, torchvision=0.13.1, torchaudio=0.12.1. While training on dt4d, the loss becomes NaN in the first epoch, around iteration 100, and no matter what adjustments I make, the same thing happens every time. I'm very sorry, I cleaned up the training logs; I'll reproduce the run and make them available as soon as I can. Lastly, I would like to ask whether this could be related to my environment. Best regards
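To pin down exactly where the NaN first appears, one option is to guard the loss inside the training loop (PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally trace the offending backward op). The helper below is a hypothetical sketch, not part of the repository's code:

```python
import math

def check_finite_loss(loss_value: float, step: int) -> float:
    """Fail fast with context instead of silently propagating NaN/Inf.

    Hypothetical debugging helper: call with loss.item() and the current
    iteration number before loss.backward().
    """
    if math.isnan(loss_value) or math.isinf(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value!r} at step {step}"
        )
    return loss_value
```

Raising at the first bad iteration preserves the model and batch state at the moment things went wrong, which makes logs like the ones requested above much more informative.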

xieyizheng commented 3 months ago

Hi, thanks for the provided info! That's very helpful. Yes, environment incompatibilities could be an issue; you could try the environment we've provided. In general, it's also recommended to try a few different random seeds with the same config to "outrun" randomness: each run gives slightly different results, and sometimes you can get past small issues just by luck.

To give some extra context on the NaN situation: one source we've encountered comes from the numerical instability of the elastic basis. We should already have some handling in place for that; we simply replace those entries with the nearest non-NaN values. But there could also be other sources we haven't uncovered yet.
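The nearest-non-NaN replacement described above can be sketched as follows. This is an illustrative NumPy version for a 1-D array, not the repository's actual implementation:

```python
import numpy as np

def fill_nan_nearest(values):
    """Replace NaN entries in a 1-D array with the nearest non-NaN value.

    Hypothetical helper illustrating the idea described in the comment;
    the actual elastic-basis code in the repository may differ.
    """
    values = np.asarray(values, dtype=float).copy()
    nan_mask = np.isnan(values)
    if not nan_mask.any():
        return values
    valid_idx = np.flatnonzero(~nan_mask)
    if valid_idx.size == 0:
        raise ValueError("all entries are NaN; nothing to fill from")
    nan_idx = np.flatnonzero(nan_mask)
    # For each NaN position, pick the index of the closest valid entry.
    nearest = valid_idx[np.abs(nan_idx[:, None] - valid_idx[None, :]).argmin(axis=1)]
    values[nan_idx] = values[nearest]
    return values
```

For example, `fill_nan_nearest([1.0, nan, nan, 4.0])` fills the first NaN from index 0 and the second from index 3, since those are the closest valid neighbors.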