Open yan9qu opened 2 years ago
Hello, can you elaborate on the error log? I'm guessing the error was raised because some of your model's outputs contain NaN. Reducing the learning rate may make training more stable. Have a nice day!
I used a multi-task learning method that calls backward twice and got this error. I tried lowering the LR to 0.0000002, but got the same result. Any further suggestions? Thank you again! The method I added is https://github.com/wgchang/PCGrad-pytorch-example/blob/master/pcgrad-example.py
Thanks, I think the error is raised in metric.py (https://github.com/ozmig77/dcnet/blob/main/model/metric.py#L65). Another possible cause of NaN is division by zero, but I don't think that's the case here.
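To illustrate the division-by-zero possibility mentioned above, here is a minimal sketch. It assumes the similarity matrix is a cosine similarity built from L2-normalized embeddings, which may not match what metric.py actually does. The helper name `cosine_sim_matrix` and the `eps` value are hypothetical.

```python
import torch


def cosine_sim_matrix(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Cosine similarity between rows of a and rows of b.

    Norms are clamped to at least eps so an all-zero row yields 0 instead of NaN.
    (Hypothetical helper, not taken from the repository.)
    """
    a_norm = a.norm(dim=1, keepdim=True).clamp(min=eps)
    b_norm = b.norm(dim=1, keepdim=True).clamp(min=eps)
    return (a / a_norm) @ (b / b_norm).t()


# Toy embeddings: the second row is all zeros, so its norm is 0.
emb = torch.tensor([[1.0, 0.0], [0.0, 0.0]])

# Naive normalization divides 0 by 0 for the zero row, which produces NaN.
naive = (emb / emb.norm(dim=1, keepdim=True)) @ emb.t()

# The guarded version stays finite.
safe = cosine_sim_matrix(emb, emb)

print(torch.isnan(naive).any().item(), torch.isnan(safe).any().item())
```

If the guarded version removes the NaN in your run, a zero-norm embedding is a likely cause. If the embeddings are already NaN before this point, the problem is upstream of the similarity computation.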
You're right. I also found that everything runs fine before idx6, but at idx6 all outputs, including trgembd and srcembd, become NaN.
I'm not sure what idx6 is, but I think you can start debugging from there.
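One way to start debugging from the first failing step is to check tensors for non-finite values at every iteration and stop at the first hit. The sketch below uses a toy list of tensors in place of real model outputs. `check_finite` and the variable names are hypothetical, not part of the repository.

```python
import torch


def check_finite(step: int, **tensors: torch.Tensor) -> None:
    """Raise ValueError naming the first tensor that contains NaN or Inf."""
    for name, t in tensors.items():
        finite = torch.isfinite(t)
        if not finite.all():
            bad = (~finite).sum().item()
            raise ValueError(f"step {step}: {name} has {bad} non-finite values")


# Toy stand-in for per-step model outputs; the third one contains a NaN.
outputs = [
    torch.ones(2, 3),
    torch.ones(2, 3),
    torch.tensor([[1.0, float("nan"), 0.0]]),
]

first_bad = None
message = ""
for idx, out in enumerate(outputs):
    try:
        check_finite(idx, output=out)
    except ValueError as err:
        first_bad = idx
        message = str(err)
        break

print(first_bad, message)
```

In a real training loop, the same check could be applied to the inputs, the embeddings, the loss, and the parameter gradients at each step. That helps tell a bad batch apart from weights that were already corrupted by an earlier update. `torch.autograd.set_detect_anomaly(True)` can additionally report the backward operation that produced a NaN, at the cost of slower training.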
Nice work! After modifying some of your code, I hit raise ValueError("Found nans in similarity matrix!"). How can I solve it? Thank you!