Open chenxyyy opened 3 years ago
Hi @chenxyyy, I guess your PyTorch version is too new, so the add_ function throws this error. Your modification should be correct (compare PyTorch 1.1, https://pytorch.org/docs/1.1.0/torch.html#torch.add, with PyTorch 1.5, https://pytorch.org/docs/1.5.0/torch.html#torch.add).
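As a rough illustration of the add_ change discussed in this thread, the sketch below computes the same exponential moving average on plain Python floats. The helper name ema_update and the sample values are hypothetical and are not part of the repository; the comments show how the update can be written for tensors in PyTorch 1.5 and later.

```python
def ema_update(ema_values, values, alpha):
    """Return alpha * ema + (1 - alpha) * value for each pair of floats."""
    return [alpha * e + (1.0 - alpha) * v for e, v in zip(ema_values, values)]


# For tensors in PyTorch 1.5+, the in-place update can be written either as
#   ema_param.data.mul_(alpha).add_(param.data, alpha=1 - alpha)
# or, as in the modification from this thread,
#   ema_param.data.mul_(alpha).add_((1 - alpha) * param.data)

updated = ema_update([1.0, 2.0], [3.0, 4.0], alpha=0.5)
print(updated)
```

Both tensor forms compute the same value; the keyword form avoids allocating a temporary tensor for (1 - alpha) * param.data.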
I wrote in the README that batch_size=2 (labeled + unlabeled per GPU) is currently hardcoded. This is inconvenient, but I'm sorry to say that for now the solution is to change the hardcoded value or to improve the code yourself. I may improve this myself in the future.
Please search for batch_size = 2 in the code and change 2 to your batch size (labeled + unlabeled per GPU).
Hi, I found another error.
# /data/3DIoUMatch-PVRCNN/pcdet/datasets/kitti/kitti_dataset.py line:46
if self.training:
    all_train = len(self.kitti_infos)
    self.unlabeled_index_list = list(set(list(range(all_train))) - set(self.sample_index_list))  # float()!!!
    # print(self.unlabeled_index_list)
    self.unlabeled_kitti_infos = []
The elements of set(list(range(all_train))) are int, but the elements of set(self.sample_index_list) are str, so the line self.unlabeled_index_list = list(set(list(range(all_train))) - set(self.sample_index_list)) removes nothing and has no effect at all.
I changed it to self.unlabeled_index_list = list(set(list(range(all_train))) - set([int(i) for i in self.sample_index_list]))
I don't know if I understand this correctly.
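A minimal sketch of why the original set difference removes nothing, using made-up values (all_train = 6 and the three string indices are hypothetical, not taken from the dataset):

```python
all_train = 6
# Hypothetical labeled indices, read from a text file and therefore strings.
sample_index_list = ["0", "2", "4"]

# Mixed types: an int never compares equal to a str, so nothing is removed.
broken = sorted(set(range(all_train)) - set(sample_index_list))

# Converting to int first makes the difference behave as intended.
fixed = sorted(set(range(all_train)) - {int(i) for i in sample_index_list})

print(broken)
print(fixed)
```

With the string version, every index stays in the unlabeled list; with the int conversion, the labeled indices are excluded.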
Oh yes, your understanding is correct. In fact, this reminded me that I noticed this before but forgot to add a comment about it. Adding the labeled set to the unlabeled set is OK, and it makes no difference to the performance. You can leave this alone or apply your modification.
Hi @THU17cyz, I made the following changes and finally ran the training code successfully.
pcdet/models/detectors/pv_rcnn_ssl.py line:38
Modify batch_dict['mask'] by adding batch_dict['mask'] = batch_dict['mask'][:, :1] under if self.training:
def forward(self, batch_dict):
    if self.training:
        batch_dict['mask'] = batch_dict['mask'][:, :1]  # modify by chenxyyyy
        mask = batch_dict['mask'].view(-1)
pcdet/models/detectors/pv_rcnn_ssl.py line:155
Here pseudo_sem_score can be empty while unzero_inds is not, so the index goes out of bounds. I added a guard condition after line:155
for i, ind in enumerate(unlabeled_mask):
    # statistics
    anchor_by_gt_overlap = iou3d_nms_utils.boxes_iou3d_gpu(
        batch_dict['gt_boxes'][ind, ...][:, 0:7],
        ori_unlabeled_boxes[i, :, 0:7])
    cls_pseudo = batch_dict['gt_boxes'][ind, ...][:, 7]
    unzero_inds = torch.nonzero(cls_pseudo).squeeze(1).long()
    cls_pseudo = cls_pseudo[unzero_inds]
    if len(unzero_inds) > 0 and len(pseudo_sem_score) > len(unzero_inds):  # modify by chenxyyyy
        iou_max, asgn = anchor_by_gt_overlap[unzero_inds, :].max(dim=1)
Batch_size
You set the batch size to 2 in:
pcdet/models/dense_heads/anchor_head_template.py line:104
pcdet/models/dense_heads/anchor_head_template.py line:178
pcdet/models/dense_heads/point_head_template.py line:143
pcdet/models/roi_heads/roi_head_template.py line:234 and line:246
When I debugged the project, I found that the batch size defined in train.py loads double the data (labeled and unlabeled): if the batch size defined in train.py is 2, the batch size used when calculating the loss should be 4.
So I doubled the batch size in anchor_head_template.py, point_head_template.py and roi_head_template.py.
After that, the training program runs normally.
I want to know if my change is correct, and how to solve the error when the batch size is large?
Looking forward to your reply
Why add this?
batch_dict['mask'] = batch_dict['mask'][:, :1]
Also, I forgot to mention: if you want labeled_data_batch_size != unlabeled_data_batch_size, this line should be modified as well: https://github.com/THU17cyz/3DIoUMatch-PVRCNN/blob/1aa469fb7b0bdc22fc030f660f741e59a666160c/pcdet/datasets/kitti/kitti_dataset_ssl.py#L398 (it is also hardcoded that the labeled and unlabeled batch sizes are the same).
And what happens if you do not make the second modification? I have not run into that situation.
When I use batch size 2 without adding batch_dict['mask'] = batch_dict['mask'][:, :1], batch_dict['mask'] is [[1, 1], [0, 0], [1, 1], [0, 0]]:
def forward(self, batch_dict):
    if self.training:
        ### batch_dict['mask'] is [[1, 1], [0, 0], [1, 1], [0, 0]]
        mask = batch_dict['mask'].view(-1)
        labeled_mask = torch.nonzero(mask).squeeze(1).long()  # the labeled_mask will be [0, 1, 4, 5]
        unlabeled_mask = torch.nonzero(1-mask).squeeze(1).long()  # the unlabeled_mask will be [2, 3, 6, 7]
The unlabeled_mask will be [2, 3, 6, 7], so when execution reaches the following loop:
for ind in unlabeled_mask:
    pseudo_score = pred_dicts[ind]['pred_scores']
    pseudo_box = pred_dicts[ind]['pred_boxes']
    pseudo_label = pred_dicts[ind]['pred_labels']
    pseudo_sem_score = self.new_method(pred_dicts, ind)
    ...
pred_dicts has length 4, so the indices 6 and 7 are out of range for pred_dicts.
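The indexing problem can be reproduced without PyTorch. The sketch below is only an illustration on plain lists: it uses the mask reported above and a hypothetical helper unlabeled_indices that mirrors torch.nonzero(1 - mask).

```python
mask_2d = [[1, 1], [0, 0], [1, 1], [0, 0]]  # mask as reported for batch size 2
num_preds = 4  # reported length of pred_dicts


def unlabeled_indices(flat_mask):
    """Positions where the mask is 0, mirroring torch.nonzero(1 - mask)."""
    return [i for i, m in enumerate(flat_mask) if m == 0]


# Flattening every column (what .view(-1) does on the 4x2 mask) gives 8 entries.
flat_all = [m for row in mask_2d for m in row]
# Keeping only the first column (what [:, :1] does) gives one entry per sample.
flat_first = [row[0] for row in mask_2d]

idx_all = unlabeled_indices(flat_all)
idx_first = unlabeled_indices(flat_first)
out_of_range = [i for i in idx_all if i >= num_preds]

print(idx_all, idx_first, out_of_range)
```

With the full flattened mask, some indices exceed the number of predictions; with only the first column, every index refers to a valid sample.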
About the second modification, I have found something else.
I noticed that after the following loop finishes, the variable pseudo_sem_score only holds the value from the last unlabeled sample in pred_dicts:
for ind in unlabeled_mask:
    pseudo_score = pred_dicts[ind]['pred_scores']
    pseudo_box = pred_dicts[ind]['pred_boxes']
    pseudo_label = pred_dicts[ind]['pred_labels']
    pseudo_sem_score = pred_dicts[ind]['pred_sem_scores']
    if len(pseudo_label) == 0:
        pseudo_boxes.append(pseudo_label.new_zeros((0, 8)).float())
        continue
    conf_thresh = torch.tensor(self.thresh, device=pseudo_label.device).unsqueeze(
        0).repeat(len(pseudo_label), 1).gather(dim=1, index=(pseudo_label-1).unsqueeze(-1))
    valid_inds = pseudo_score > conf_thresh.squeeze()
    valid_inds = valid_inds * (pseudo_sem_score > self.sem_thresh[0])
    pseudo_sem_score = pseudo_sem_score[valid_inds]
    pseudo_box = pseudo_box[valid_inds]
    pseudo_label = pseudo_label[valid_inds]
    # if len(valid_inds) > max_box_num:
    #     _, inds = torch.sort(pseudo_score, descending=True)
    #     inds = inds[:max_box_num]
    #     pseudo_box = pseudo_box[inds]
    #     pseudo_label = pseudo_label[inds]
    pseudo_boxes.append(torch.cat([pseudo_box, pseudo_label.view(-1, 1).float()], dim=1))
    if pseudo_box.shape[0] > max_pseudo_box_num:
        max_pseudo_box_num = pseudo_box.shape[0]
    # pseudo_scores.append(pseudo_score)
    # pseudo_labels.append(pseudo_label)
So when the following code runs, pseudo_sem_score is always that same last value, whichever sample i is being processed:
for i, ind in enumerate(unlabeled_mask):
    # statistics
    anchor_by_gt_overlap = iou3d_nms_utils.boxes_iou3d_gpu(
        batch_dict['gt_boxes'][ind, ...][:, 0:7],
        ori_unlabeled_boxes[i, :, 0:7])
    cls_pseudo = batch_dict['gt_boxes'][ind, ...][:, 7]
    unzero_inds = torch.nonzero(cls_pseudo).squeeze(1).long()
    cls_pseudo = cls_pseudo[unzero_inds]
    if len(unzero_inds) > 0:
        iou_max, asgn = anchor_by_gt_overlap[unzero_inds, :].max(dim=1)
        pseudo_ious.append(iou_max.unsqueeze(0))
        acc = (ori_unlabeled_boxes[i][:, 7].gather(dim=0, index=asgn) == cls_pseudo).float().mean()
        pseudo_accs.append(acc.unsqueeze(0))
        fg = (iou_max > 0.5).float().sum(dim=0, keepdim=True) / len(unzero_inds)
        sem_score_fg = (pseudo_sem_score[unzero_inds] * (iou_max > 0.5).float()).sum(dim=0, keepdim=True) \
            / torch.clamp((iou_max > 0.5).float().sum(dim=0, keepdim=True), min=1.0)
        sem_score_bg = (pseudo_sem_score[unzero_inds] * (iou_max < 0.5).float()).sum(dim=0, keepdim=True) \
            / torch.clamp((iou_max < 0.5).float().sum(dim=0, keepdim=True), min=1.0)
So I uncommented your commented-out lines,
pseudo_scores.append(pseudo_score)
pseudo_labels.append(pseudo_label)
and added pseudo_sem_score = pseudo_sem_scores[i]
before your line: sem_score_fg = (pseudo_sem_score[unzero_inds] * (iou_max > 0.5).float()).sum(dim=0, keepdim=True) \
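The stale-variable effect described here can be shown with plain Python. This is only a sketch: preds is a hypothetical stand-in for pred_dicts, with lists in place of tensors.

```python
# Hypothetical stand-in for pred_dicts, with lists instead of tensors.
preds = [{"pred_sem_scores": [0.9, 0.8]}, {"pred_sem_scores": [0.1]}]

# First loop: the name is rebound on every iteration, so only the last value survives.
for pred in preds:
    pseudo_sem_score = pred["pred_sem_scores"]

# A later loop that reuses the name sees the same (last) value for every sample.
stale = [pseudo_sem_score for _ in preds]

# Fix: keep one entry per sample and look it up by position in the second loop.
pseudo_sem_scores = [pred["pred_sem_scores"] for pred in preds]
per_sample = [pseudo_sem_scores[i] for i, _ in enumerate(preds)]

print(stale)
print(per_sample)
```

With a single unlabeled sample per batch the two versions coincide, which may be why the issue only shows up for larger unlabeled batch sizes.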
Yes, if unlabeled batch size > 1, your modification is necessary.
I see. I think you're right. You can also modify the collate_batch function here: https://github.com/THU17cyz/3DIoUMatch-PVRCNN/blob/1aa469fb7b0bdc22fc030f660f741e59a666160c/pcdet/datasets/kitti/kitti_dataset_ssl.py#L405.
I'll soon update the codebase to support arbitrary batch_size. Thank you very much for pointing these out!
Hello @chenxyyy, when I try to run the pretraining phase on KITTI, I hit a problem: scripts/slurm_pretrain.sh: line 26: srun: command not found. Could you give me some advice?
Looking forward to your reply
Are you sure you are running this script on a machine with a Slurm environment? For example, if you are using GCP/AWS machines, or a cluster that does not have Slurm installed, this script won't work.
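One quick way to check for Slurm before launching the script is to look for srun on the PATH. The helper below is a hypothetical sketch using only the Python standard library; it is not part of the repository.

```python
import shutil


def has_command(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if not has_command("srun"):
    print("srun not found: this machine does not appear to have Slurm installed.")
```

If the check fails, the Slurm launcher script cannot be used on that machine as-is.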
Hello @THU17cyz! Thank you for open-sourcing your codebase.
I have successfully run your pretraining phase on KITTI, but I had a problem running the training phase.
When I use batch size 1, I hit a problem with the code below.
It reports that add_ cannot take 2 arguments, so I changed the code
ema_param.data.mul_(alpha).add_(1 - alpha, param.data)
to
ema_param.data.mul_(alpha).add_((1 - alpha) * param.data)
Then it ran successfully.
But when I change the batch size to 2, 4 or other values, I get the index error below.
It shows that the indices in unlabeled_mask exceed the range of pred_dicts. For example, unlabeled_mask is [2,3,6,7] but pred_dicts has size 4, so 6 and 7 are illegal.
I want to know if my change is correct, and how to solve the error when the batch size is large?
Looking forward to your reply