AitorIglesias opened this issue 1 year ago
Some tips:

- Check the `samples_per_gpu` you really used in the printed config in training.
- Check that `PYTHONPATH` is the path of your modified project.

Thanks for the tips @JingweiZhang12. `samples_per_gpu=2` in the printed config, and `PYTHONPATH` is the one of my project. Even so, the error is still occurring.
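For context, in mmdetection3d 1.x configs the per-GPU batch size lives in the `data` dict, so this is roughly what should appear in the printed config at the start of training. The fragment below is illustrative only; the actual values come from the user's modified config files:

```python
# Illustrative mmcv-style config fragment (not taken from the user's files).
data = dict(
    samples_per_gpu=2,   # dataloader batch size per GPU
    workers_per_gpu=2,   # dataloader worker processes per GPU
)
```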
The number of proposals fed into the roi_head is equal to 1. Weird! It's too small. You can check it.
@JingweiZhang12 I have been debugging a bit, and actually the number of proposals fed into the roi_head is equal to 2, since it is the same length as the batch size.
I can increase the batch size to 3, but no more, and even so the error still occurs. There might be another way to fix the error.
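For what it's worth, the traceback shows the failing BatchNorm sits inside the sparse `part_conv` block of `parta2_bbox_head`, so the size it sees is the number of pooled feature rows, not the dataloader batch size directly; a batch of 2 samples can still collapse to a `[1, 64]` input deeper in the network. A toy illustration (hypothetical helper and numbers, not mmdetection3d code):

```python
# Hypothetical illustration: the tensor reaching the inner BatchNorm is
# (num_feature_rows, channels), where num_feature_rows is the total number
# of pooled RoI feature rows across the batch -- not samples_per_gpu.
def bn_input_size(rows_per_sample, channels=64):
    """Shape the inner BatchNorm layer would see for a batch."""
    return (sum(rows_per_sample), channels)

# Dataloader batch size 2, but only one pooled feature row survives:
print(bn_input_size([1, 0]))   # (1, 64) -> would trigger the ValueError
print(bn_input_size([3, 2]))   # (5, 64) -> fine
```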
Prerequisite
Task
I have modified the scripts/configs, or I am working on my own tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmdetection3d
Environment
```
sys.platform: linux
Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.6.0
TorchVision: 0.7.0
OpenCV: 4.6.0
MMCV: 1.6.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.25.2
MMSegmentation: 0.29.0
MMDetection3D: 1.0.0rc5+8ea1752
spconv2.0: False
```
Reproduces the problem - code sample
configs/_base_/models/parta2-nus.py
configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_nus3d.py
Reproduces the problem - command or script
Reproduces the problem - error message
```
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/user/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "tools/train.py", line 263, in <module>
    main()
  File "tools/train.py", line 259, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward
    return self.forward_train(**kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/parta2.py", line 132, in forward_train
    gt_bboxes_3d, gt_labels_3d)
  File "/mmdetection3d/mmdet3d/models/roi_heads/part_aggregation_roi_head.py", line 125, in forward_train
    voxels_dict, sample_results)
  File "/mmdetection3d/mmdet3d/models/roi_heads/part_aggregation_roi_head.py", line 189, in _bbox_forward_train
    rois)
  File "/mmdetection3d/mmdet3d/models/roi_heads/part_aggregation_roi_head.py", line 222, in _bbox_forward
    pooled_part_feats)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mmdetection3d/mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py", line 270, in forward
    x_part = self.part_conv(part_features)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/ops/sparse_modules.py", line 135, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/ops/sparse_modules.py", line 139, in forward
    input.features = module(input.features)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2012, in batch_norm
    _verify_batch_size(input.size())
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 1995, in _verify_batch_size
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 64])
```
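The check that raises here can be reproduced in isolation. Below is a pure-Python paraphrase of the logic in `torch.nn.functional._verify_batch_size` (PyTorch 1.6), not the exact source:

```python
def verify_batch_size(size):
    # Paraphrase of torch.nn.functional._verify_batch_size: the product
    # of every dimension except the channel dimension (dim 1) must be
    # greater than 1, otherwise BatchNorm in training mode cannot
    # compute batch statistics.
    size_prods = size[0]
    for d in size[2:]:
        size_prods *= d
    if size_prods == 1:
        raise ValueError(
            'Expected more than 1 value per channel when training, '
            'got input size {}'.format(size))

verify_batch_size((2, 64))       # ok: 2 values per channel
# verify_batch_size((1, 64))     # raises the ValueError from the traceback
```

So the error depends only on the size of the tensor handed to the BatchNorm layer; a `torch.Size([1, 64])` input fails regardless of what `samples_per_gpu` says.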
Additional information
I am trying to train PartA2 with the nuScenes dataset (the mini version). For this I modified the files configs/_base_/models/parta2.py and configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py, and saved them as configs/_base_/models/parta2-nus.py and configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_nus3d.py. These are the files specified above.
When I try to train the model I get the error specified above. I have been looking for a solution, but I only find that it happens because the batch size is equal to 1, and that is not my case since my batch size is 2.
I would like to know why the error occurs and how to solve it, in order to train Part-A2 with the nuScenes dataset.