thangvubk / SoftGroup

[CVPR 2022 Oral] SoftGroup for Instance Segmentation on 3D Point Clouds
MIT License

Is there any problem when using train_loader of scannetV2? #4

Closed Yustarzzz closed 2 years ago

Yustarzzz commented 2 years ago

Hi! I'm working with your code to train on ScanNetV2 for 3D object detection. I want to test the code on a small subset first (e.g., 2 train scenes, 1 val scene, and 1 test scene from the ScanNetV2 dataset).

So I modified train.txt, val.txt, and test.txt to list these 4 scenes. My dataset structure is as shown below. [image]

Fortunately, train.py starts when I run it.

However, it then fails with the error below. The error says there is no 'loss' key in am_dict. I printed am_dict, and it turns out to be empty.

I also tried iterating over this dataset's train_loader by adding code like "for i, batch in enumerate(train_loader): print(i)", but nothing was printed. (Does that mean train_loader is empty?) Could you help me with this? Thanks 👍


/content/drive/MyDrive/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
[2022-03-21 15:47:04,679 INFO log.py line 39 7139] **** Start Logging ****
[2022-03-21 15:47:04,751 INFO train.py line 22 7139] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-21 15:47:04,757 INFO train.py line 153 7139] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-21 15:47:09,078 INFO train.py line 164 7139] cuda available: True
[2022-03-21 15:47:09,130 INFO train.py line 168 7139] #classifier parameters: 30839600
[2022-03-21 15:47:09,311 INFO scannetv2_inst.py line 50 7139] Training samples: 2
[2022-03-21 15:47:09,375 INFO scannetv2_inst.py line 84 7139] Validation samples: 1
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "train.py", line 98, in train_epoch
    logger.info("epoch: {}/{}, train loss: {:.4f}, time: {}s".format(epoch, cfg.epochs, am_dict['loss'].avg, time.time() - start_epoch))
KeyError: 'loss'

thangvubk commented 2 years ago

It seems the dataloader cannot load the data. Could you please check whether your data is in the correct path? The train, val, and test folders should be in SoftGroup/dataset/scannetv2/

Yustarzzz commented 2 years ago

Hi thangvubk! Thanks for your help. However, they are in the correct path...

thangvubk commented 2 years ago

The problem is that your training set has only 2 scans. The default batch size is 4 and drop_last=True, so the single incomplete batch is dropped and the loader yields nothing. See below.

https://github.com/thangvubk/SoftGroup/blob/5ac64851e8f191df33e8b9d821f6a8995bd9a444/data/scannetv2_inst.py#L53-L54

The solution is (1) set drop_last to False, or (2) reduce batch_size to 2.
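A minimal sketch of why the training loop never runs with 2 scans, written in plain Python rather than with the repository's actual loader. `iter_batches` is a hypothetical helper that only mimics the batching rule `drop_last` applies in PyTorch's `DataLoader`; the scene names are placeholders.

```python
def iter_batches(samples, batch_size, drop_last):
    """Yield consecutive batches; optionally drop a final incomplete batch.

    Hypothetical stand-in for the drop_last behaviour of a PyTorch DataLoader.
    """
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A leftover batch is smaller than batch_size; keep it only if allowed.
    if batch and not drop_last:
        yield batch


scenes = ["scene_a", "scene_b"]  # placeholder names for 2 training scans

# batch_size=4, drop_last=True: the only batch is incomplete, so it is dropped.
dropped = list(iter_batches(scenes, batch_size=4, drop_last=True))

# Option (1): drop_last=False keeps the incomplete batch.
kept = list(iter_batches(scenes, batch_size=4, drop_last=False))

# Option (2): batch_size=2 makes the single batch complete.
smaller = list(iter_batches(scenes, batch_size=2, drop_last=True))

print(len(dropped), len(kept), len(smaller))
```

With zero batches, the body of the training loop is never executed, so no meter is ever written into am_dict, which is consistent with the KeyError: 'loss' raised when the epoch summary is logged.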

Yustarzzz commented 2 years ago

Thanks thangvubk! I think that works. I added 2 more scenes to the training set, so there are now 4 scenes.

However, another error occurs, shown below. Do you know what causes it?


/content/drive/MyDrive/AIA/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
[2022-03-22 12:37:56,288 INFO log.py line 39 7052] **** Start Logging ****
[2022-03-22 12:37:58,464 INFO train.py line 22 7052] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=True, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-22 12:37:58,473 INFO train.py line 153 7052] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-22 12:38:05,721 INFO train.py line 164 7052] cuda available: True
[2022-03-22 12:38:05,788 INFO train.py line 168 7052] #classifier parameters: 30839600
[2022-03-22 12:38:06,724 INFO scannetv2_inst.py line 50 7052] Training samples: 4
[2022-03-22 12:38:06,792 INFO scannetv2_inst.py line 84 7052] Validation samples: 1
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "train.py", line 61, in train_epoch
    loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch, semantic_only=cfg.semantic_only)
  File "/content/drive/MyDrive/AIA/softgroup/SoftGroup/model/softgroup/softgroup.py", line 527, in model_fn
    ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch, 'train', semantic_only=semantic_only)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/AIA/softgroup/SoftGroup/model/softgroup/softgroup.py", line 316, in forward
    output = self.input_conv(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/spconv/modules.py", line 123, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/spconv/conv.py", line 151, in forward
    self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
  File "/usr/local/lib/python3.7/dist-packages/spconv/ops.py", line 89, in get_indice_pairs
    stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /content/drive/MyDrive/AIA/softgroup/SoftGroup/lib/spconv/src/spconv/indice.cu 120 cuda execution failed with error 98

thangvubk commented 2 years ago

Which GPU model are you using? The error is related to spconv. I found a related issue here: https://github.com/open-mmlab/OpenPCDet/issues/442
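One way to collect the GPU model for a report like this is to query `nvidia-smi` from Python. The sketch below assumes `nvidia-smi` is on the PATH (as it usually is on machines with an NVIDIA driver); `parse_gpu_csv` and `query_gpus` are hypothetical helper names, not part of SoftGroup or spconv.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            # Split on the last comma so a comma in the name would not break parsing.
            name, _, driver = line.rpartition(",")
            rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus():
    """Return one (name, driver_version) tuple per GPU, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA tooling found on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(f"GPU: {name} (driver {driver})")
```

Posting the printed GPU name together with the CUDA and PyTorch versions makes it easier to tell whether the spconv build matches the hardware.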