zhulf0804 / GCNet

Leveraging Inlier Correspondences Proportion for Point Cloud Registration. https://arxiv.org/abs/2201.12094.
MIT License

Segmentation fault when evaluating 3DMatch #6

Closed zhirihuixin closed 2 years ago

zhirihuixin commented 2 years ago

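I ran into a segmentation fault while testing on the 3DMatch dataset. (The original report was written in Chinese for ease of description.)

Environment: Ubuntu 18.04.5 LTS, CUDA 10.2, torch 1.8.1, open3d 0.15.2

Command: python eval_3dmatch.py --benchmark 3DMatch --data_root ./data/indoor/ --checkpoint NgeNet_weights/3dmatch.pth --saved_path work_dirs/3dmatch --no_cuda

I traced the crash to line 74 of ThreeDMatch.py, at the call to normal: src_pcd, tgt_pcd = normal(npy2pcd(src_points)), normal(npy2pcd(tgt_points))

I added print statements to normal. The calls made while the dataloader is being built run fine, but during the test stage the process crashes inside pcd.estimate_normals. The instrumented code and its output are pasted below.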

def normal(pcd, radius=0.1, max_nn=30, loc=(0, 0, 0)):
    print("before estimate_normals")
    pcd.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=radius, max_nn=max_nn),
                         fast_normal_computation=False)
    print("after estimate_normals")
    pcd.orient_normals_towards_camera_location(loc)
    return pcd
rot 1623 <class 'list'>
trans 1623 <class 'list'>
src 1623 <class 'list'>
tgt 1623 <class 'list'>
overlap 1623 <class 'list'>
before estimate_normals
after estimate_normals
before estimate_normals
after estimate_normals
before estimate_normals
after estimate_normals
before estimate_normals
after estimate_normals
before estimate_normals
after estimate_normals
before estimate_normals
after estimate_normals
[36 35 36 38]
  0%|                                                                                                  | 0/1623 [00:00<?, ?it/s]
before estimate_normals  ## normal estimation starts here; CPU mode crashes, GPU mode hangs at this point
ERROR: Unexpected segmentation fault encountered in worker.
  0%|                                                                                                  | 0/1623 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 986, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 22309) is killed by signal: Segmentation fault.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "eval_3dmatch.py", line 275, in <module>
    main(args)
  File "eval_3dmatch.py", line 89, in main
    for pair_ind, inputs in enumerate(tqdm(test_dataloader)):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1182, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1148, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 999, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 22309) exited unexpectedly
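
The print statements above narrow the crash down to estimate_normals, but the worker dies without a Python-level stack. As a possible diagnostic aid (not a fix), the standard-library faulthandler module can dump the Python traceback when a process receives SIGSEGV. The sketch below is only an illustration: the function name enable_faulthandler and the log file location are hypothetical and not part of this repository. worker_init_fn is an existing DataLoader argument that is called with the worker id in each worker process.

```python
import faulthandler
import os
import tempfile

# Keep the log file handles alive; faulthandler needs the file to stay open.
_fault_logs = []


def enable_faulthandler(worker_id, log_dir=None):
    """Hypothetical helper: write a Python traceback to a per-worker log file
    if this process crashes with a fatal signal such as SIGSEGV."""
    log_dir = log_dir or tempfile.gettempdir()
    path = os.path.join(log_dir, f"worker_{worker_id}_fault.log")
    log_file = open(path, "w")
    _fault_logs.append(log_file)
    faulthandler.enable(file=log_file, all_threads=True)
    return path


# Assumed wiring (not taken from the repository):
#   DataLoader(dataset, num_workers=4, worker_init_fn=enable_faulthandler)
log_path = enable_faulthandler(0)
print("fault log:", log_path)
```

If the crash happens again, the log file for the failing worker should show which Python call was active, which can confirm whether estimate_normals is really the last frame.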
zhulf0804 commented 2 years ago

Hello,

One thing you can try is setting config.num_workers = 0 in https://github.com/zhulf0804/NgeNet/blob/d4917f22e55195132ec6fc602554102d321ce4b5/eval_3dmatch.py#L50. However, this slows down inference considerably, because all data loading then runs in the main process.

Another option is to use Open3D v0.10.
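
The two suggestions above can be combined into a small guard that falls back to single-process loading unless the installed Open3D release is one known to work with worker processes. This is a minimal sketch under stated assumptions: the helper names major_minor and pick_num_workers are hypothetical, and the only data points come from this thread (0.15.2 crashed, 0.10 worked), so every other version is conservatively treated as untested.

```python
import re

# Assumption: 0.10 is the only Open3D release line reported to work in this thread.
KNOWN_GOOD = (0, 10)


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.15.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pick_num_workers(open3d_version, requested):
    """Hypothetical helper: keep the requested worker count only for the
    known-good Open3D release line, otherwise load data in the main process."""
    if major_minor(open3d_version) == KNOWN_GOOD:
        return requested
    return 0


print(pick_num_workers("0.15.2", 4))  # 0
print(pick_num_workers("0.10.0", 4))  # 4
```

In a real script the version string would come from o3d.__version__ and the result would be assigned to config.num_workers before the dataloader is built.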

Best regards.

zhirihuixin commented 2 years ago

With Open3D v0.10, the problem is solved.

Thank you