Closed ArseniuML closed 1 year ago
Sorry, but mmcv is easy to get with: git clone git@github.com:open-mmlab/mmcv.git --branch v1.3.9
(lidarenv) arseniy.marin@PC550-Ubuntu:~/Projects/Lidar/SST$ sh run.sh
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Traceback (most recent call last):
File "tools/train.py", line 16, in <module>
from mmdet3d.apis import train_model
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/apis/__init__.py", line 1, in <module>
from .inference import (convert_SyncBN, inference_detector,
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/apis/inference.py", line 10, in <module>
from mmdet3d.core import (Box3DMode, DepthInstance3DBoxes,
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/__init__.py", line 2, in <module>
from .bbox import * # noqa: F401, F403
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/__init__.py", line 4, in <module>
from .iou_calculators import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/iou_calculators/__init__.py", line 1, in <module>
from .iou3d_calculator import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/iou_calculators/iou3d_calculator.py", line 5, in <module>
from ..structures import get_box_type
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/structures/__init__.py", line 1, in <module>
from .base_box3d import BaseInstance3DBoxes
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/structures/base_box3d.py", line 5, in <module>
from mmdet3d.ops.iou3d import iou3d_cuda
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/__init__.py", line 5, in <module>
from .ball_query import ball_query
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/ball_query/__init__.py", line 1, in <module>
from .ball_query import ball_query
File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/ball_query/ball_query.py", line 4, in <module>
from . import ball_query_ext
ImportError: /home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/ball_query/ball_query_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIfEEPT_v
Could this be due to CUDA 11.1 on my computer? If possible I want to stay with this version of CUDA.
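For readers hitting the same ImportError: the undefined symbol is a mangled C++ name, and reading it can hint at where the mismatch lies. The sketch below is a deliberately partial parser for the Itanium C++ ABI nested-name prefix (it is not a full demangler, and `nested_name_parts` is a hypothetical helper name). It only pulls out the length-prefixed identifiers. For this symbol they point at `at::TensorBase::data_ptr`, a function from PyTorch's C++ library, which suggests the compiled `ball_query_ext` was built against a different PyTorch than the one imported at run time. That reading is an assumption, not something confirmed in this thread.

```python
import re
from typing import List


def nested_name_parts(symbol: str) -> List[str]:
    """Return the leading length-prefixed identifiers of a nested mangled name.

    Handles only the "_ZN" / "_ZNK" prefix followed by <length><identifier>
    pairs; template arguments and parameter types are ignored.
    """
    prefix = re.match(r"_ZNK?", symbol)
    if prefix is None:
        raise ValueError("not a nested Itanium-ABI symbol")
    pos = prefix.end()
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    return parts


symbol = "_ZNK2at10TensorBase8data_ptrIfEEPT_v"
print("::".join(nested_name_parts(symbol)))
```

If the extension and the installed PyTorch do turn out to differ, rebuilding the compiled ops inside the same environment that runs `sh run.sh` is the usual remedy.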
I believe CUDA 11.1 is fine. Could you post the versions of all related libraries?
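One way to gather those versions in a single step is sketched below. It uses only the standard library's `importlib.metadata`; the package list is an assumption based on the libraries named elsewhere in this thread, so adjust it to your environment.

```python
from importlib import metadata
from typing import Dict, Iterable

# Assumed list of relevant distributions; edit as needed.
PACKAGES = [
    "torch", "torchvision", "torch-scatter",
    "mmcv-full", "mmdet", "mmdet3d", "mmsegmentation",
    "tensorflow", "waymo-open-dataset-tf-2-4-0",
]


def collect_versions(names: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in collect_versions(PACKAGES).items():
    print(f"{name:<32}{version}")
```

Run it with the same interpreter that launches training, so the report reflects the environment that actually fails.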
After some reinstalls I ran "sh run.sh", but after some successful iterations it failed:
Traceback (most recent call last):
train_model(
File "tools/train.py", line 230, in
self._sync_params_and_buffers(authoritative_rank=0)
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 457, in _sync_params_and_buffers
.....
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector
train_model(model = MMDistributedDataParallel(
train_detector(
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in __init__
File "/home/marin/Lidar/SST/mmdet3d/apis/train.py", line 41, in train_model
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector
self._distributed_broadcast_coalesced( model = MMDistributedDataParallel(
model = MMDistributedDataParallel(
self._sync_params_and_buffers(authoritative_rank=0)dist._broadcast_coalesced(
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1155, in _distributed_broadcast_coalesced
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in __init__
train_detector(
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in __init__
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 457, in _sync_params_and_buffers
model = MMDistributedDataParallel(
RuntimeError
self._sync_params_and_buffers(authoritative_rank=0)
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector
:
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in __init__
NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:825, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 457, in _sync_params_and_buffers
...
Traceback (most recent call last):
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
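The log above is interleaved output from several worker processes, and "invalid usage" on its own says little. A minimal sketch for getting more detail, assuming the run is started from Python: rerun with NCCL's documented `NCCL_DEBUG` environment variable set to `INFO`. The helper name `with_nccl_debug` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def with_nccl_debug(env: Optional[Mapping[str, str]] = None,
                    level: str = "INFO") -> Dict[str, str]:
    """Return a copy of the environment with NCCL_DEBUG set to `level`."""
    new_env = dict(os.environ if env is None else env)
    new_env["NCCL_DEBUG"] = level
    return new_env


# Example use (commented out so the snippet has no side effects):
# import subprocess
# subprocess.run(["sh", "run.sh"], env=with_nccl_debug(), check=True)
```

The verbose NCCL output printed just before the failure is usually more informative than the Python traceback.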
torch 1.8.0+cu111
torch-scatter 2.0.9
torchex 0.1.0 /home/marin/Lidar/TorchEx
torchvision 0.9.0+cu111
mmcv-full 1.3.9
mmdet 2.14.0
mmdet3d 0.15.0 /home/marin/Lidar/SST
mmsegmentation 0.14.1
waymo-open-dataset-tf-2-4-0 1.4.1
tensorflow 2.4.0
tensorflow-estimator 2.4.0
cumm-cu113 0.4.11
spconv-cu113 2.2.3
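One thing visible in this list is that the packages carry two different CUDA tags: `cu111` for torch and torchvision, `cu113` for cumm and spconv. Whether that matters for the error above is not established in this thread, but it is cheap to check. The sketch below groups packages by the `cuNNN` tag found in their name or version. The helper name `group_by_cuda_tag` is hypothetical, and the `installed` dictionary is copied from the list above.

```python
import re
from collections import defaultdict
from typing import Dict, List

CUDA_TAG = re.compile(r"cu\d{3}")


def group_by_cuda_tag(packages: Dict[str, str]) -> Dict[str, List[str]]:
    """Group package names by the CUDA tag in their name or version string."""
    groups = defaultdict(list)
    for name, version in packages.items():
        match = CUDA_TAG.search(name) or CUDA_TAG.search(version)
        if match:
            groups[match.group()].append(name)
    return dict(groups)


installed = {
    "torch": "1.8.0+cu111",
    "torchvision": "0.9.0+cu111",
    "mmcv-full": "1.3.9",
    "cumm-cu113": "0.4.11",
    "spconv-cu113": "2.2.3",
}

groups = group_by_cuda_tag(installed)
if len(groups) > 1:
    print("Mixed CUDA tags:", groups)
```

Packages without a tag, such as mmcv-full here, are skipped because their CUDA build cannot be read from the name or version alone.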
Maybe this needs to be posted as a separate issue.
Please reopen this issue if you need further discussion.
I have started working with your project and, as a first step, want to run sh run.sh successfully.
I have Ubuntu 20.04 and CUDA compilation tools, release 11.1 (I can upgrade CUDA, but this would be painful for me...)
Could you provide a script to set up the environment (install the dependencies with the proper versions)? In issue https://github.com/tusen-ai/SST/issues/117 you provided a setup script:
"
...
pip install tensorflow==2.4.0
pip3 install waymo-open-dataset-tf-2-4-0 --user
pip install torch-1.8.0+cu111-cp38-cp38-linux_x86_64.whl
pip install torchvision-0.9.0+cu111-cp38-cp38-linux_x86_64.whl
cd ./mmcv-1.3.9
...
"
but there seems to be no mmcv-1.3.9 folder in the SST repo I cloned.