ghost opened this issue 3 years ago
During training, I ran into the following error. The source features produced by the model come out as NaN:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', grad_fn=)
The SVD in the RSD loss then fails:
Traceback (most recent call last):
File "train_rsd.py", line 212, in
rsd_loss = RSD(feature_s,feature_t)
File "train_rsd.py", line 133, in RSD
u_s, s_s, v_s = torch.svd(Feature_s.t())
RuntimeError: svd_cuda: For batch 0: U(37,37) is zero, singular U.
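Because the printed features are already NaN when they reach `torch.svd`, the SVD error looks like a symptom rather than the root cause. The following is a minimal debugging sketch, assuming a standard PyTorch setup, that fails early with a readable message and can be placed just before the SVD call. `count_nonfinite` and `check_features` are hypothetical helper names, not part of train_rsd.py.

```python
import torch

def count_nonfinite(t: torch.Tensor) -> int:
    """Number of NaN or Inf entries in a tensor."""
    return int((~torch.isfinite(t)).sum().item())

def check_features(name: str, t: torch.Tensor) -> None:
    """Raise before a non-finite tensor reaches torch.svd."""
    bad = count_nonfinite(t)
    if bad:
        raise ValueError(f"{name}: {bad} of {t.numel()} entries are NaN/Inf")

# Optional while debugging: backward() then raises an error when a backward
# computation produces NaN and reports the forward op involved (slower).
# torch.autograd.set_detect_anomaly(True)

# Hypothetical placement inside RSD(), just before the call at line 133:
# check_features("Feature_s", Feature_s)
# u_s, s_s, v_s = torch.svd(Feature_s.t())

if __name__ == "__main__":
    x = torch.randn(36, 256)      # arbitrary stand-in for a feature batch
    check_features("x", x)        # all finite: no error
    x[0, 0] = float("nan")
    print(count_nonfinite(x))     # 1
```

If the check fires on the very first iteration, the problem lies in the model or data; if it fires only after some steps, the NaNs are likely coming back through the backward pass.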
Did you solve this problem?
I think it is caused by unstable gradients through the SVD, but I'm not sure how to avoid it. https://pytorch.org/docs/stable/generated/torch.linalg.svd.html#torch.linalg.svd
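The linked torch.linalg.svd documentation warns that gradients computed through U and V are only finite when the input has no repeated singular values, and become numerically unstable as two singular values get close. PyTorch also marks `torch.svd` as deprecated in favor of `torch.linalg.svd`. The following minimal sketch, assuming real-valued features, swaps in `torch.linalg.svd` and reports the smallest gap between singular values as a warning sign. `svd_with_gap_check` and its `tol` threshold are hypothetical. This only diagnoses the instability and does not fix it.

```python
import torch

def svd_with_gap_check(x: torch.Tensor, tol: float = 1e-6):
    """Reduced SVD via torch.linalg.svd plus the smallest singular-value gap.

    Returns (U, S, V, min_gap) with V (not V^T), matching what torch.svd returned.
    `tol` is an illustrative threshold, not a value taken from train_rsd.py.
    """
    # full_matrices=False gives the reduced SVD, the counterpart of torch.svd(some=True).
    u, s, vh = torch.linalg.svd(x, full_matrices=False)
    # Singular values come back in descending order, so adjacent differences are the gaps.
    gaps = s[..., :-1] - s[..., 1:]
    min_gap = gaps.min().item() if gaps.numel() > 0 else float("inf")
    if min_gap < tol:
        # Per the torch.linalg.svd docs, gradients through U/V are unstable here.
        print(f"warning: two singular values differ by only {min_gap:.2e}")
    # linalg.svd returns V^H; for real-valued input a transpose recovers V.
    v = vh.transpose(-2, -1)
    return u, s, v, min_gap

if __name__ == "__main__":
    feats = torch.randn(256, 32)  # stand-in for Feature_s.t(); shape is arbitrary
    u, s, v, gap = svd_with_gap_check(feats)
    print(u.shape, s.shape, v.shape, gap)
```

Returning V rather than V^H keeps the call a drop-in replacement for `u_s, s_s, v_s = torch.svd(...)`, so the downstream RSD code would not need to change.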