thuml / Domain-Adaptation-Regression

Code release for Representation Subspace Distance for Domain Adaptation Regression (ICML 2021)

error #2

Open ghost opened 3 years ago

ghost commented 3 years ago

During training, I got the following error.

The source features produced by the model are all NaN:

tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', grad_fn=)

Traceback (most recent call last):
  File "train_rsd.py", line 212, in
    rsd_loss = RSD(feature_s,feature_t)
  File "train_rsd.py", line 133, in RSD
    u_s, s_s, v_s = torch.svd(Feature_s.t())
RuntimeError: svd_cuda: For batch 0: U(37,37) is zero, singular U.
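The SVD failure looks like a downstream symptom, since the features are already NaN before `torch.svd` is called. A minimal sketch of an early check follows; `assert_finite` is a hypothetical helper and not part of the repository:

```python
import torch


def assert_finite(name: str, tensor: torch.Tensor) -> None:
    """Raise a descriptive error if the tensor contains NaN or Inf values."""
    bad = ~torch.isfinite(tensor)
    if bad.any():
        raise FloatingPointError(
            f"{name} has {int(bad.sum())} non-finite entries"
        )
```

Calling `assert_finite("feature_s", feature_s)` just before the `RSD(...)` call would stop training at the first bad batch instead of inside the SVD. Enabling `torch.autograd.set_detect_anomaly(True)` may also help locate the backward operation that first produced a NaN, at the cost of slower training.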

ZhaoZhibin commented 2 years ago

Did you solve this problem?

xuxu116 commented 2 years ago

I think it is caused by unstable gradients, but I am not sure how to avoid it. See the warnings about gradients in the torch.linalg.svd documentation: https://pytorch.org/docs/stable/generated/torch.linalg.svd.html#torch.linalg.svd
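If unstable gradients are indeed the cause, one generic safeguard is to clip the gradient norm and skip any update whose loss or gradients are non-finite. The sketch below is an assumption on my part, not the authors' fix: `guarded_step` and the `max_norm=1.0` default are hypothetical, and this only keeps NaNs out of the weights without removing the underlying instability.

```python
import torch


def guarded_step(model, optimizer, loss, max_norm=1.0):
    """Run one optimizer step; skip it if the loss or gradients are non-finite.

    Returns True if the parameters were updated, False if the step was skipped.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        return False
    loss.backward()
    # clip_grad_norm_ rescales gradients in place and returns the norm
    # measured before clipping.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    if not torch.isfinite(torch.as_tensor(total_norm)):
        optimizer.zero_grad()  # discard the bad gradients
        return False
    optimizer.step()
    return True
```

In a training loop this would replace the usual `loss.backward(); optimizer.step()` pair. Lowering the learning rate or the weight on the RSD term may also be worth trying, though I have not verified either on this code base.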