issues about create_data

sunnyHelen commented 2 years ago

Hi, thanks for sharing your great work. I ran into some issues while creating data with create_data.py:

First create reduced point cloud for training set
[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 24, in kitti_data_prep
    kitti.create_reduced_point_cloud(root_path, info_prefix)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/kitti_converter.py", line 374, in create_reduced_point_cloud
    _create_reduced_point_cloud(data_path, train_info_path, save_path)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/kitti_converter.py", line 314, in _create_reduced_point_cloud
    count=-1).reshape([-1, num_features])
ValueError: cannot reshape array of size 461536 into shape (6)

Should num_features be set to 4 and front_camera_id to 2 in this line? https://github.com/zhyever/SimIPU/blob/5b346e392c161a5e9fdde09b1692656bc7cd3faf/tools/data_converter/kitti_converter.py#L291
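For context: KITTI velodyne .bin files store four float32 values per point (x, y, z, reflectance), so the flat buffer read by np.fromfile reshapes cleanly into 4 columns but generally not into 6. Here 461536 is divisible by 4 (115384 points) but not by 6, hence the ValueError. A minimal sketch of the loading step, assuming the standard KITTI .bin layout; `reshape_points` and `load_kitti_velodyne` are hypothetical helper names, not functions from the SimIPU converter:

```python
import numpy as np


def reshape_points(flat: np.ndarray, num_features: int = 4) -> np.ndarray:
    """Reshape a flat float32 buffer into an (N, num_features) point array."""
    if flat.size % num_features != 0:
        raise ValueError(
            f'buffer of size {flat.size} is not a multiple of '
            f'num_features={num_features}')
    return flat.reshape(-1, num_features)


def load_kitti_velodyne(path: str) -> np.ndarray:
    # Assumption: standard KITTI scan, i.e. x, y, z, reflectance as float32.
    flat = np.fromfile(path, dtype=np.float32, count=-1)
    return reshape_points(flat, num_features=4)


# A buffer with the same size as the one in the traceback.
flat = np.zeros(461536, dtype=np.float32)
points = reshape_points(flat, num_features=4)
print(points.shape)  # (115384, 4)
```

With num_features=6 the same buffer raises a ValueError, mirroring the converter error.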

I assume this fixes that problem, but I then hit another error at the "Create GT Database of KittiDataset" step:

[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 44, in kitti_data_prep
    with_bbox=True) # for moca
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
    P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

Can you help me figure out how to solve these issues?

zhyever commented 2 years ago

You should set front_camera_id to 0 for KITTI. https://github.com/zhyever/SimIPU/blob/5b346e392c161a5e9fdde09b1692656bc7cd3faf/tools/data_converter/kitti_converter.py#L292

:D Since the released code only supports pre-training on KITTI, data preparation is similar to standard mmdet3d. So you can use the standard mmdet3d (the correct version is given in README.md) to run create_data.py and then link the prepared data into the SimIPU repo.

sunnyHelen commented 2 years ago

Thank you for your quick reply. When I create the GT database of KittiDataset, I get:

[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 44, in kitti_data_prep
    with_bbox=True) # for moca
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
    P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

It seems there is no P0 key: https://github.com/zhyever/SimIPU/blob/5b346e392c161a5e9fdde09b1692656bc7cd3faf/tools/data_converter/create_gt_database.py#L275
There are also some differences compared with the mmdet3d version. How should I create the data properly?

zhyever commented 2 years ago

Sorry I missed your question; I have been busy recently. My last answer was wrong: you should set front_camera_id=2.

Actually, I recommend cloning mmdet3d and using the official code to generate the KITTI dataset. You can then link the mmdet3d-generated KITTI data directly into the SimIPU repo.
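Linking the prepared data usually just means a directory symlink. A minimal sketch, assuming both repos keep KITTI under data/kitti/ (the usual mmdet3d layout; check the data_root in the SimIPU configs you use). `link_kitti_data` is a hypothetical helper, and the repo paths in the test are placeholders:

```python
import os
from pathlib import Path


def link_kitti_data(mmdet3d_root, simipu_root, name='kitti'):
    """Symlink <mmdet3d_root>/data/<name> to <simipu_root>/data/<name>."""
    src = Path(mmdet3d_root).resolve() / 'data' / name
    if not src.is_dir():
        raise FileNotFoundError(f'{src} does not exist; run create_data.py first')

    dst_dir = Path(simipu_root) / 'data'
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / name
    # Refuse to overwrite an existing folder or link.
    if dst.is_symlink() or dst.exists():
        raise FileExistsError(f'{dst} already exists; remove it first')

    os.symlink(src, dst, target_is_directory=True)
    return dst
```

The equivalent shell command is a single `ln -s`; the Python version only adds the existence checks.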

sunnyHelen commented 2 years ago

Got it. Thanks for your reply.

sunnyHelen commented 2 years ago

But I ran into a problem when trying camera-LiDAR fusion-based 3D object detection on the KITTI dataset. I followed your instructions and ran: bash tools/dist_train.sh project_cl/configs/kitti_det3d/moca_r50_kitti.py 8 --work-dir work_dir/

The error occurs while loading data. Could it be related to the data labels? Could you please help me?

Original Traceback (most recent call last):
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/lustre/chen/hzha/mmdetection/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/custom_3d.py", line 387, in __getitem__
    data = self.prepare_train_data(idx)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 122, in prepare_train_data
    example = self.pipeline(input_dict)
  File "/mnt/lustre/chen/hzha/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/transforms_3d.py", line 185, in __call__
    img=img)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 388, in sample_all
    avoid_coll_boxes_2d)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in sample_class_v2
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in <listcomp>
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
KeyError: 'box2d_camera'

zhyever commented 2 years ago

Oh, this issue is caused by the box2d_camera key in the db_sampler. In tools/create_data.py, you can find the call to create_groundtruth_database, which generates the sampled objects used for data augmentation. Since we chose MoCa as our baseline method, we made many modifications to this GT database generation function.

Hence, if you created the KITTI dataset with the official mmdet3d codebase, you should run the create_groundtruth_database function from SimIPU (or MoCa), commenting out the other lines in the kitti_data_prep function, to create the sampled-object database. If you created the sampled-object database with our code and still hit these bugs, please report it and I will check. I ran the code before pushing this repo to GitHub, so it should have worked.
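Before re-running training, it may help to confirm that the sampled-object database actually contains the key the MoCa-style db_sampler reads. A small check, assuming the standard mmdet3d layout of kitti_dbinfos_train.pkl (a dict mapping class names to lists of per-object dicts) stored under data/kitti/; both helper names are hypothetical:

```python
import pickle
from collections import Counter


def find_entries_missing_key(db_infos, key='box2d_camera'):
    """Count, per class, the sampled-object entries that lack `key`."""
    missing = Counter()
    for cls_name, entries in db_infos.items():
        for entry in entries:
            if key not in entry:
                missing[cls_name] += 1
    return dict(missing)


def check_dbinfos_file(path='data/kitti/kitti_dbinfos_train.pkl',
                       key='box2d_camera'):
    # Assumption: path points at the dbinfos pkl referenced by the config.
    with open(path, 'rb') as f:
        db_infos = pickle.load(f)
    return find_entries_missing_key(db_infos, key)
```

An empty result means every entry carries box2d_camera. A non-empty one suggests the pkl was produced by the stock mmdet3d create_groundtruth_database rather than the SimIPU/MoCa version.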

sunnyHelen commented 2 years ago

Thanks a lot. I had used the official mmdet3d to create the data labels. I'll follow your instructions and run the create_groundtruth_database function.

sunnyHelen commented 2 years ago

Hi, I tried running the create_groundtruth_database function, but it seems we are back to the previous problem:

[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 44, in kitti_data_prep
    with_bbox=True) # for moca
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
    P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

zhyever commented 2 years ago

Let me explain where these problems come from. We first ran experiments on the KITTI dataset, where the images come from camera 2 (the left color camera). So, when creating the KITTI data, every PX should be P2 (i.e., use the calibration parameters of camera 2). Later, we ran experiments on Waymo, where the images come from the front camera, whose index is 0. Hence, we hacked the code to generate the related data with P0.

However, when I pushed the KITTI-only code, I forgot to switch the data-related code back to the KITTI version. That is why you get KeyError: 'P0'. For KITTI, just use P2. :D
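The KITTI-versus-Waymo difference comes down to which projection-matrix key gets read. Below is a minimal sketch, assuming the example dict stores each camera's matrix under a key like 'P2' that can be reshaped to 4x4 (as the reshape(4, 4) call in create_gt_database.py suggests). `get_projection_matrix` is a hypothetical helper, not part of SimIPU:

```python
import numpy as np


def get_projection_matrix(example: dict, camera_id: int = 2) -> np.ndarray:
    """Hypothetical helper: return the 4x4 projection matrix of `camera_id`.

    KITTI (as used here): camera 2 -> key 'P2'.
    Waymo front camera:   camera 0 -> key 'P0'.
    """
    key = f'P{camera_id}'
    if key not in example:
        available = sorted(k for k in example if k.startswith('P'))
        raise KeyError(f'{key!r} not found; available keys: {available}')
    return np.array(example[key]).reshape(4, 4)


# Toy KITTI-style example that only carries camera 2's matrix.
example = {'P2': np.eye(4).ravel().tolist()}
P2 = get_projection_matrix(example, camera_id=2)
```

Asking the same example for camera_id=0 reproduces the KeyError from the thread, but the message also lists which keys exist.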

sunnyHelen commented 2 years ago

Hi, thanks for your help. I successfully created the labels after changing P0 to P2, but the error still occurs when I run: bash tools/dist_train.sh project_cl/configs/kitti_det3d/moca_r50_kitti.py 8 --work-dir work_dir/

Original Traceback (most recent call last):
  File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/lustre/chenzhuo1/hzha/mmdetection/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/custom_3d.py", line 387, in __getitem__
    data = self.prepare_train_data(idx)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 122, in prepare_train_data
    example = self.pipeline(input_dict)
  File "/mnt/lustre/chenzhuo1/hzha/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/transforms_3d.py", line 185, in __call__
    img=img)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 388, in sample_all
    avoid_coll_boxes_2d)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in sample_class_v2
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in <listcomp>
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
KeyError: 'box2d_camera'

zhyever commented 2 years ago

I will check everything from scratch ASAP and update this repo. By the way, this problem only affects MoCa training (our downstream 3D detection task). Even though the gt_sampler does not work, you can still run SimIPU pre-training, since our pre-training method does not need any ground-truth information.

sunnyHelen commented 2 years ago

Yeah, I've tried the pretraining code, which is totally ok. Thanks for your help.

bhavyagoyal commented 2 years ago

Hi @zhyever, I am running into the same error (KeyError: 'box2d_camera') for the downstream evaluation on Kitti dataset. Pretraining step does not have any issue. Let me know if there is an update. Thanks for the help!

sunnyHelen commented 2 years ago

Hi, is there any update on this problem?

zhyever commented 2 years ago

Sorry for the late reply.

Download the pkl and the zipped gt_database.

Rename the pkl file to kitti_dbinfos_train.pkl and put it under your data folder. Unzip the .zip file, rename the folder to kitti_gt_database, and put it under your data folder.
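The rename-and-unzip steps above can be scripted. A sketch, assuming the data folder is data/kitti/ (the usual mmdet3d location) and that the downloaded archive contains a single top-level folder. The downloaded file names are placeholders because the original download links are not preserved here, and `install_gt_database` is a hypothetical helper:

```python
import shutil
import zipfile
from pathlib import Path


def install_gt_database(pkl_file, zip_file, data_dir='data/kitti'):
    """Place a downloaded dbinfos pkl and zipped GT database into data_dir."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    # 1. Copy the pkl under the name the db_sampler config expects.
    shutil.copy(pkl_file, data_dir / 'kitti_dbinfos_train.pkl')

    # 2. Unzip into a clean scratch folder so the result can be renamed.
    scratch = data_dir / '_gt_database_tmp'
    if scratch.exists():
        shutil.rmtree(scratch)
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(scratch)

    # Assumption: one top-level folder inside the archive; otherwise the
    # extracted files are used as they are.
    entries = list(scratch.iterdir())
    src = entries[0] if len(entries) == 1 and entries[0].is_dir() else scratch

    # 3. Rename the extracted folder to kitti_gt_database.
    target = data_dir / 'kitti_gt_database'
    if target.exists():
        shutil.rmtree(target)
    shutil.move(str(src), str(target))
    if scratch.exists():
        shutil.rmtree(scratch)
    return target
```

Usage (placeholder names): `install_gt_database('downloaded_dbinfos.pkl', 'downloaded_gt_database.zip')`.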

The result should look like this:

Then, run the training script again.

sunnyHelen commented 2 years ago

Thanks a lot for your reply. It seems the data problem is solved, but I still hit an error during training:

Traceback (most recent call last):
  File "tools/train.py", line 222, in <module>
    main()
  File "tools/train.py", line 218, in main
    meta=meta)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/mnt/lustre/chen/hzha/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 42, in train_step
    and self.reducer._rebuild_buckets()):
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). Parameter indices which did not receive grad for rank 0: 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 ... In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

sunnyHelen commented 2 years ago

I tried passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, but it doesn't work.

zhyever commented 2 years ago

Set this flag in your config file instead of passing it from the shell.

You can add the line find_unused_parameters=True to your config file.
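A minimal sketch of that config change. As far as I know, mmdet 2.x's train_detector reads this top-level key with cfg.get('find_unused_parameters', False) and forwards it to MMDistributedDataParallel, so it only needs to be a plain assignment in the config (here assumed to be the MoCa config from the training command above):

```python
# Added at the top level of project_cl/configs/kitti_det3d/moca_r50_kitti.py
# (or any config it inherits from). mmdet's train_detector passes this value
# on to the distributed wrapper, so DDP tolerates parameters that receive no
# gradient in a given iteration.
find_unused_parameters = True
```

Note that enabling it makes DDP traverse the autograd graph every iteration, which adds some overhead.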

sunnyHelen commented 2 years ago

Yes. It works! Many thanks for your help.

bhavyagoyal commented 2 years ago

Thanks @zhyever. The fine-tuning on KITTI 3D detection works now. However, there seems to be an error during evaluation (after 30 epochs). Here is the log:

  File "tools/train.py", line 222, in <module>
    main()
  File "tools/train.py", line 218, in main
    meta=meta)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 279, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 412, in evaluate
    eval_types=eval_types)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 709, in kitti_eval
    eval_types)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 613, in do_eval
    min_overlaps)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 479, in eval_class
    rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 382, in calculate_iou_partly
    dt_boxes).astype(np.float64)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 116, in bev_box_overlap
    from .rotate_iou import rotate_iou_gpu_eval
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/rotate_iou.py", line 292, in <module>
    criterion=-1):
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/decorators.py", line 101, in kernel_jit
    kernel.bind()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 548, in bind
    self._func.get()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 426, in get
    ptx = self.ptx.get()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 397, in get
    **self._extra_options)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 496, in llvm_to_ptx
    ptx = cu.compile(**opts)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 233, in compile
    self._try_error(err, 'Failed to compile\n')
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 251, in _try_error
    self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 141, in check_error
    raise exc
numba.cuda.cudadrv.error.NvvmError: Failed to compile

<unnamed> (66, 23): parse expected comma after load's type
NVVM_ERROR_COMPILATION

zhyever commented 2 years ago

That's something related to the build of mmdet3d (in this repo, SimIPU). Refer to Issue for more information.

zhyever / SimIPU

issues about create_data #5