xieyizheng / hybridfmaps

CVPR 2024, Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

https://hybridfmaps.github.io

MIT License

7 stars 3 forks

train on smal #2

Open zzifer opened 2 weeks ago

zzifer commented 2 weeks ago

Hello, I am very interested in your paper. However, I found that the l_elas_orth loss is always 0 when I train on SMAL. Is this normal? Also, when I evaluate on SMAL I get 5.03, which differs from the originally reported 3.3. Is there a setting I have not configured correctly?

xieyizheng commented 2 weeks ago

Hi, thanks for your interest in our work. l_elas_orth being 0 is expected: it is set to 0 as one of our hyperparameter choices.

Have you tested with one of our provided configs? Use options/hybrid_ulrssm/test/smal.yaml, with the checkpoint replaced by your own ckpt.

In line with previous works, the test config uses test-time adaptation refinement.

In Fig. 15 and Fig. 17 of the supplementary material (https://arxiv.org/pdf/2312.03678), we included results from 5 runs initialized with different random seeds. In line with previous works, the best run was reported.

Hope this helps!

zzifer commented 2 weeks ago

Yes, I have set my ckpt in options/hybrid_ulrssm/test/smal.yaml. I will test it again, thanks for your reply.

zzifer commented 2 weeks ago

When I run training on dt4d, I get the following error: torch._C._LinAlgError: linalg.inv: (Batch element 0): The diagonal element 2 is zero, the inversion could not be completed because the input matrix is singular. The error occurs in the compute_functional_map method, at the line C = torch.bmm(torch.inverse(A_A_t + self.lmbda * D_i), B_A_t[:, [i], :].transpose(1, 2)). Have you ever encountered this problem?
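For anyone debugging the same line, here is a minimal sketch of a guard around that solve. It is not the repository's code: `solve_row_system` is a hypothetical helper, and the tensor shapes (batch x K x K for `A_A_t`, `D_i` and `B_A_t`) are assumed from the quoted expression. It raises a readable error when NaN/Inf values reach the solver, and it uses `torch.linalg.solve` in place of an explicit inverse, which is usually the numerically safer option.

```python
import torch


def solve_row_system(A_A_t, B_A_t, D_i, lmbda, i):
    """Hypothetical stand-in for one row solve in compute_functional_map.

    Assumed shapes: A_A_t, D_i, B_A_t are all (batch, K, K).
    Returns C with shape (batch, K, 1).
    """
    lhs = A_A_t + lmbda * D_i                 # (batch, K, K)
    rhs = B_A_t[:, [i], :].transpose(1, 2)    # (batch, K, 1)

    # Fail early with a clear message if NaN/Inf reached the solver,
    # instead of a generic "matrix is singular" error.
    if not torch.isfinite(lhs).all() or not torch.isfinite(rhs).all():
        raise FloatingPointError(
            f"non-finite input to functional map solve (row {i})"
        )

    # Solve lhs @ C = rhs directly rather than forming inverse(lhs).
    return torch.linalg.solve(lhs, rhs)
```

If the guard fires, the singular matrix is likely a symptom of NaN features upstream rather than a problem in the solve itself.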

xieyizheng commented 2 weeks ago

Thanks for reporting this issue. We haven't encountered this bug before, but we'd appreciate it if you could share your findings here. It might be environment-related, though we'll need to reproduce the error to confirm. This could also be related to a similar issue: https://github.com/xieyizheng/hybridfmaps/issues/1. We'll look into it when possible, and any additional details would be helpful for investigating further.

zzifer commented 2 weeks ago

This is the complete error screenshot. I can only locate the error at C = torch.bmm(torch.inverse(A_A_t + self.lmbda * D_i), B_A_t[:, [i], :].transpose(1, 2)). I am training on a 4090, and training stops at iteration 4260. I installed the environment exactly as described in README.md. I will debug further and share my findings if I make any progress.

zzifer commented 1 week ago

I found that at iter=4265, when the features are fed into diffusionnet, the x returned by the first linear layer is all NaN. I haven't found the specific cause yet, but I hope this is helpful to you!
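One way to narrow this down further is to attach forward hooks that report whether a module received NaN/Inf inputs or produced them from finite inputs. The sketch below is generic PyTorch, not part of the repository; `add_nan_hooks` is a hypothetical helper name. A non-finite input to the first linear layer points at the input features, while a non-finite output from finite inputs points at the layer's weights (for example after a diverged optimizer step).

```python
import torch
import torch.nn as nn


def add_nan_hooks(model):
    """Register forward hooks that raise at the first module seeing NaN/Inf.

    Returns the hook handles so they can be removed after debugging.
    """
    handles = []
    for name, module in model.named_modules():
        # Bind `name` as a default argument so each hook keeps its own name.
        def hook(mod, inputs, output, name=name):
            for t in inputs:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    raise FloatingPointError(f"non-finite INPUT to '{name}'")
            if torch.is_tensor(output) and not torch.isfinite(output).all():
                raise FloatingPointError(f"non-finite OUTPUT from '{name}'")

        handles.append(module.register_forward_hook(hook))
    return handles


# Usage on a toy model (stand-in for the real network):
toy = nn.Sequential(nn.Linear(3, 4), nn.ReLU())
toy_handles = add_nan_hooks(toy)
toy(torch.ones(2, 3))  # finite inputs and weights: no error
for h in toy_handles:
    h.remove()
```

For NaNs that first appear in the backward pass, `torch.autograd.set_detect_anomaly(True)` is a complementary built-in check, at the cost of slower training.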