ozmig77 / dcnet


How can I solve this issue? #6

Open yan9qu opened 2 years ago

yan9qu commented 2 years ago

Nice work! After I changed some of your code, I got the error "raise ValueError("Found nans in similarity matrix!")". How can I fix it? Thank you!

ozmig77 commented 2 years ago

Hello, could you share more of the error log? My guess is that the error was raised because some of your model's outputs contain NaN. Try reducing the learning rate to make training more stable.
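A minimal sketch of that advice in a generic PyTorch training step. The learning rate is lowered, and, as extra safeguards that were not suggested above, the gradient norm is clipped with `torch.nn.utils.clip_grad_norm_` and any step whose loss is already non-finite is skipped. `train_step`, the toy model, the data, and `max_norm=1.0` are hypothetical placeholders, not part of dcnet.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, loss_fn, x, y, max_norm=1.0):
    """Run one update; return the loss, or None if the step was skipped."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # Skip the update if the loss is already NaN/Inf, so non-finite
    # gradients never reach the weights.
    if not torch.isfinite(loss):
        return None
    loss.backward()
    # Rescale gradients so their global L2 norm is at most max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
model = nn.Linear(4, 1)
# A small learning rate is the first thing to try (placeholder value).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = train_step(model, optimizer, nn.MSELoss(), x, y)
```

Skipping a bad step keeps one NaN batch from poisoning every later iteration, but it hides the symptom rather than fixing the cause, so it is worth logging how often it happens.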

yan9qu commented 2 years ago

Have a nice day! I used a multi-task learning method that calls backward twice, and got this error. I tried lowering the LR to 0.0000002, but got the same result. Any further suggestions? Thank you again! This is the method I added: https://github.com/wgchang/PCGrad-pytorch-example/blob/master/pcgrad-example.py
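For reference, the core of PCGrad is a pairwise gradient projection: when two task gradients conflict (negative dot product), the conflicting component is removed from one of them. The sketch below is a minimal, hypothetical reimplementation on flattened gradient vectors; it is not the code in the linked example, and `pcgrad_project` and `eps` are illustrative names. Because the projection divides by the squared norm of the other task's gradient, an `eps` is added to guard against that norm underflowing toward zero.

```python
import random
import torch

def pcgrad_project(grads, eps=1e-12, seed=None):
    """Combine per-task gradients with PCGrad-style projection.

    grads: list of 1-D tensors, one flattened gradient per task.
    Returns the sum of the projected gradients.
    """
    rng = random.Random(seed)
    projected = []
    for i, g_i in enumerate(grads):
        g = g_i.clone()
        others = [g_j for j, g_j in enumerate(grads) if j != i]
        rng.shuffle(others)  # PCGrad visits the other tasks in random order
        for g_j in others:
            dot = torch.dot(g, g_j)
            if dot < 0:
                # Remove the component of g that conflicts with g_j.
                # eps keeps the division finite if ||g_j||^2 underflows.
                g = g - dot / (torch.dot(g_j, g_j) + eps) * g_j
        projected.append(g)
    return torch.stack(projected).sum(dim=0)

# Two conflicting task gradients.
out = pcgrad_project([torch.tensor([1.0, 0.0]), torch.tensor([-1.0, 1.0])], seed=0)
```

To use this with a real model, one would flatten each task's `p.grad` tensors with `torch.cat`, combine them, and copy the result back into the parameters' `.grad` before `optimizer.step()`. Checking each per-task gradient with `torch.isfinite` before combining shows whether one of the two backward passes is already producing NaN.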

ozmig77 commented 2 years ago

Thanks, I think the error is raised in https://github.com/ozmig77/dcnet/blob/main/model/metric.py#L65

Another possible cause of NaN is division by zero, but I don't think that's the case here.
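To illustrate the division-by-zero case: if a similarity matrix is built from L2-normalized embeddings, an all-zero embedding row gives 0/0 = NaN. The sketch below assumes cosine similarity between row-wise embeddings; it is not the code in metric.py, and `naive_cosine_matrix`, `safe_cosine_matrix`, `src`, and `trg` are hypothetical names. `torch.nn.functional.normalize` divides by `max(norm, eps)`, so zero rows stay zero instead of turning into NaN.

```python
import torch
import torch.nn.functional as F

def naive_cosine_matrix(src, trg):
    # Dividing by the raw norm gives 0/0 = NaN for an all-zero row.
    src = src / src.norm(dim=1, keepdim=True)
    trg = trg / trg.norm(dim=1, keepdim=True)
    return src @ trg.t()

def safe_cosine_matrix(src, trg, eps=1e-8):
    # F.normalize clamps the norm to at least eps, so zero rows stay zero.
    src = F.normalize(src, dim=1, eps=eps)
    trg = F.normalize(trg, dim=1, eps=eps)
    return src @ trg.t()

src = torch.tensor([[1.0, 0.0], [0.0, 0.0]])  # second row is all zeros
trg = torch.tensor([[1.0, 1.0]])
print(torch.isnan(naive_cosine_matrix(src, trg)).any())  # tensor(True)
print(torch.isnan(safe_cosine_matrix(src, trg)).any())   # tensor(False)
```

An `eps` only helps with zero norms. If the embeddings themselves already contain NaN, normalization cannot remove it, and the source has to be found upstream.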

yan9qu commented 2 years ago

You're right. I also found that everything runs fine before idx6, but at idx6 all of the outputs, including trgembd and srcembd, become NaN.
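When every output turns NaN at the same iteration, one plausible explanation (an assumption, not confirmed in this thread) is that the previous update wrote NaN or Inf into the weights. A hypothetical helper like `non_finite_tensors` below can be called after `backward()` and again after `optimizer.step()` on each iteration to see which parameter or gradient goes bad first. PyTorch's `torch.autograd.set_detect_anomaly(True)` can also help: it raises an error at the backward operation that produced a NaN, at the cost of slower training.

```python
import torch
import torch.nn as nn

def non_finite_tensors(model):
    """Return names of parameters (or their gradients) containing NaN/Inf."""
    bad = []
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            bad.append(name)
        elif p.grad is not None and not torch.isfinite(p.grad).all():
            bad.append(name + ".grad")
    return bad

# Toy demonstration: corrupt one weight and one gradient by hand.
model = nn.Linear(3, 2)
model.bias.grad = torch.tensor([float("inf"), 0.0])
with torch.no_grad():
    model.weight[0, 0] = float("nan")
print(non_finite_tensors(model))  # ['weight', 'bias.grad']
```

If a gradient shows up as non-finite before the step at idx5, the problem is in the backward pass (for example, in how the two backward calls are combined); if the parameters are only corrupted after the step, the update itself is the place to look.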

ozmig77 commented 2 years ago

I'm not sure what idx6 is, but I think you can start debugging from there.
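One way to start from idx6 is to register forward hooks that record which submodule first produces a non-finite output on that batch. A minimal sketch: `register_nan_hooks` and the toy `Div` module are hypothetical and stand in for the real model, and only modules that return a single tensor are checked.

```python
import torch
import torch.nn as nn

def register_nan_hooks(model):
    """Attach forward hooks that record modules whose output has NaN/Inf."""
    hits = []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                hits.append(name)
        return hook

    # Skip the root module (its name is the empty string).
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n]
    return hits, handles

class Div(nn.Module):
    def forward(self, x):
        return x / x.sum(dim=1, keepdim=True)  # 0/0 -> NaN on an all-zero row

model = nn.Sequential(nn.ReLU(), Div(), nn.Linear(2, 2))
hits, handles = register_nan_hooks(model)
model(torch.tensor([[-1.0, -2.0]]))  # ReLU maps this row to zeros
print(hits)  # ['1', '2']
for h in handles:
    h.remove()
```

Hooks fire as each module finishes its forward pass, so `hits[0]` is the earliest module whose output was non-finite. Removing the handles afterwards keeps the hooks from slowing down normal training.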