microsoft / ProDA

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
https://arxiv.org/abs/2101.10979
MIT License
286 stars 44 forks source link

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #49

Open yuheyuan opened 2 years ago

yuheyuan commented 2 years ago

when I run tran.py

 python train.py --name gta2citylabv2_stage1Denoise --used_save_pseudo --ema --proto_rectify --moving_prototype --path_soft Pseudo/gta2citylabv2_warmup_soft --resume_path ./pretrained/gta2citylabv2_warmup/from_gta5_to_cityscapes_on_deeplabv2_best_model.pkl --proto_consistW 10 --rce --regular_w 0.1
Traceback (most recent call last):
  File "train2.py", line 217, in <module>
    train(opt, logger)
  File "train2.py", line 88, in train
    target_lpsoft, target_image_full, target_weak_params)
  File "/media/ailab/data/yy/ProDA/models/adaptation_modelv2.py", line 209, in step
    weights = self.get_prototype_weight(ema_out['feat'], target_weak_params=target_weak_params)
  File "/media/ailab/data/yy/ProDA/models/adaptation_modelv2.py", line 350, in get_prototype_weight
    feat_proto_distance = self.feat_prototype_distance(feat)
  File "/media/ailab/data/yy/ProDA/models/adaptation_modelv2.py", line 345, in feat_prototype_distance
    feat_proto_distance[:, i, :, :] = torch.norm(self.objective_vectors[i].reshape(-1,1,1).expand(-1, H, W) - feat, 2, dim=1,)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I only want to use two gpu, So I change the code only in here

 if opt.model_name == 'deeplabv2':
        model = adaptation_modelv2.CustomModel(opt, logger)
        model = torch.nn.DataParallel(model, device_ids=device_ids)
yuheyuan commented 2 years ago

    def feat_prototype_distance(self, feat):
        print("enterehre!")
        N, C, H, W = feat.shape
        feat_proto_distance = -torch.ones((N, self.class_numbers, H, W)).to(feat.device)
        for i in range(self.class_numbers):
            #feat_proto_distance[:, i, :, :] = torch.norm(torch.Tensor(self.objective_vectors[i]).reshape(-1,1,1).expand(-1, H, W).to(feat.device) - feat, 2, dim=1,)
            print(feat)
            print(self.objective_vectors[i].to(feat.device))
            # print(self.objective_vectors[i])
            feat_proto_distance[:, i, :, :] = torch.norm(self.objective_vectors[i].reshape(-1,1,1).expand(-1, H, W) - feat, 2, dim=1,)  # here feat.toGPU, but self.objective_vecotrs belong to cpu

        return feat_proto_distance

I find it may be occur torch.norm here,one belong to GPU, one belong to CPU