Open zwyzwy opened 5 years ago
I have no idea what causes this problem. I will try to create a Docker image for this project to provide a reproducible environment for errors. Multi-GPU: currently not supported. The major reason is that I only have one GPU. If you want to use multi-GPU training, you need to pad the input (or simply not slice the array in point_to_voxel), then slice the points inside the module.
What do you mean by "slice array in point_to_voxel" and "slice points inside module"? The code puts all the points of a batch together, so how can I tell how many points belong to each sample?
The number of voxels converted from points is not fixed; you can see a slice operation in point_to_voxel. For multi-GPU, you need to return voxel_num from point_to_voxel, use fixed-size input before nn.DataParallel, pass voxel_num as a Tensor, and gather all valid voxels inside the nn.Module wrapped by nn.DataParallel.
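The steps above can be sketched roughly as follows. This is only a minimal illustration, not the repository's code: MAX_VOXELS, pad_voxels and VoxelSum are hypothetical names, and the toy module just sums the valid voxel features of each sample. It assumes each sample's voxels arrive as an (n, C) tensor with n at most MAX_VOXELS.

```python
import torch
import torch.nn as nn

MAX_VOXELS = 8  # hypothetical fixed upper bound on voxels per sample


def pad_voxels(voxels, max_voxels=MAX_VOXELS):
    """Pad one sample's (n, C) voxels to (max_voxels, C) and return n as a tensor."""
    n = voxels.shape[0]
    padded = voxels.new_zeros((max_voxels,) + tuple(voxels.shape[1:]))
    padded[:n] = voxels
    return padded, torch.tensor(n, dtype=torch.long)


class VoxelSum(nn.Module):
    """Toy module (hypothetical): sums the valid voxel features of each sample."""

    def forward(self, voxels, voxel_num):
        # voxels: (B, max_voxels, C), voxel_num: (B,)
        idx = torch.arange(voxels.shape[1], device=voxels.device)
        mask = idx[None, :] < voxel_num[:, None]   # True for real (unpadded) voxels
        valid = voxels[mask]                       # (num_valid, C): padding dropped here
        sample_idx = mask.nonzero()[:, 0]          # sample id of each valid voxel
        out = voxels.new_zeros(voxels.shape[0], voxels.shape[2])
        out.index_add_(0, sample_idx, valid)       # per-sample sum over valid voxels
        return out                                 # (B, C), batch dim kept for gathering


# Two samples with different voxel counts (3 and 5), 4 features each.
samples = [torch.ones(3, 4), torch.ones(5, 4)]
padded, nums = zip(*(pad_voxels(s) for s in samples))
batch = torch.stack(padded)      # fixed-size input: (2, MAX_VOXELS, 4)
voxel_num = torch.stack(nums)    # (2,), split along dim 0 together with batch

model = VoxelSum()
if torch.cuda.device_count() > 1:
    # DataParallel scatters both tensors along dim 0, so each replica
    # receives its own slice of voxel_num alongside its voxels.
    model = nn.DataParallel(model).cuda()
    batch, voxel_num = batch.cuda(), voxel_num.cuda()

out = model(batch, voxel_num)
```

Because both inputs have the batch size as their first dimension, nn.DataParallel can split them consistently, and the slicing of padded voxels happens inside forward on each replica rather than in the data pipeline.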
@zwyzwy have you found a solution?
When I train the model, the intermediate evaluation fails with the error below:
Traceback (most recent call last):
File "vox_gluon/train_gluon.py", line 759, in
fire.Fire()
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "vox_gluon/train_gluon.py", line 504, in train
raise e
File "vox_gluon/train_gluon.py", line 486, in train
result = get_official_eval_result(gt_annos[:len(gt_annos)-1], dt_annos, class_names)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 824, in get_official_eval_result
mAPbbox, mAPbev, mAP3d, mAPaos = do_eval_v2(gt_annos, dt_annos, current_classes, min_overlaps, compute_aos, difficultys)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 701, in do_eval_v2
ret = eval_class_v3(gt_annos, dt_annos, current_classes, difficultys, 1, min_overlaps)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 574, in eval_class_v3
rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 384, in calculate_iou_partly
overlap_part = bev_box_overlap(gt_boxes, dt_boxes).astype(np.float64)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/eval.py", line 126, in bev_box_overlap
riou = rotate_iou_gpu_eval(boxes, qboxes, criterion)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/non_max_suppression/nms_gpu.py", line 652, in rotate_iou_gpu_eval
N, K, boxes_dev, query_boxes_dev, iou_dev, criterion)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/compiler.py", line 484, in __call__
sharedmem=self.sharedmem)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/compiler.py", line 558, in _kernel_call
cu_func(kernelargs)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1301, in __call__
self.sharedmem, streamhandle, args)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1345, in launch_kernel
None)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 288, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/home/users/wenyong.zheng/anaconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 323, in _check_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [400] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_HANDLE
By the way, have you trained with multiple GPUs?