tusen-ai / SST

Code for a series of work in LiDAR perception, including SST (CVPR 22), FSD (NeurIPS 22), FSD++ (TPAMI 23), FSDv2, and CTRL (ICCV 23, oral).
Apache License 2.0

Setup environment and install dependencies #134

Closed ArseniuML closed 1 year ago

ArseniuML commented 1 year ago

I have started working with your project and, as a first step, want to run sh run.sh successfully.

I have Ubuntu 20.04 and Cuda compilation tools, release 11.1 (I can upgrade CUDA, but that would be painful for me...)

Could you provide a script to set up the environment (install the dependencies with the proper versions)? In issue https://github.com/tusen-ai/SST/issues/117 you provided a setup script:

"
...
pip install tensorflow==2.4.0
pip3 install waymo-open-dataset-tf-2-4-0 --user
pip install torch-1.8.0+cu111-cp38-cp38-linux_x86_64.whl
pip install torchvision-0.9.0+cu111-cp38-cp38-linux_x86_64.whl
cd ./mmcv-1.3.9
...
"

but there seems to be no mmcv-1.3.9 folder in the SST repo I cloned.

Abyssaledge commented 1 year ago

Sorry, but it is easy to get mmcv with: git clone git@github.com:open-mmlab/mmcv.git --branch v1.3.9

ArseniuML commented 1 year ago

(lidarenv) arseniy.marin@PC550-Ubuntu:~/Projects/Lidar/SST$ sh run.sh


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Traceback (most recent call last):
  File "tools/train.py", line 16, in <module>
    from mmdet3d.apis import train_model
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/apis/__init__.py", line 1, in <module>
    from .inference import (convert_SyncBN, inference_detector,
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/apis/inference.py", line 10, in <module>
    from mmdet3d.core import (Box3DMode, DepthInstance3DBoxes,
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/__init__.py", line 2, in <module>
    from .bbox import *  # noqa: F401, F403
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/__init__.py", line 4, in <module>
    from .iou_calculators import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/iou_calculators/__init__.py", line 1, in <module>
    from .iou3d_calculator import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/iou_calculators/iou3d_calculator.py", line 5, in <module>
    from ..structures import get_box_type
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/structures/__init__.py", line 1, in <module>
    from .base_box3d import BaseInstance3DBoxes
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/core/bbox/structures/base_box3d.py", line 5, in <module>
    from mmdet3d.ops.iou3d import iou3d_cuda
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/__init__.py", line 5, in <module>
    from .ball_query import ball_query
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/ball_query/__init__.py", line 1, in <module>
    from .ball_query import ball_query
  File "/home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/ball_query/ball_query.py", line 4, in <module>
    from . import ball_query_ext
ImportError: /home/arseniy.marin@nami.local/Projects/Lidar/SST/mmdet3d/ops/ball_query/ball_query_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIfEEPT_v

Could this be due to CUDA 11.1 on my computer? If possible I want to stay with this version of CUDA.

Abyssaledge commented 1 year ago

I believe CUDA 11.1 is fine. Could you post the versions of all related libraries?
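
A version report like the one requested here can be collected with a short script. This is a minimal sketch using only the Python standard library; the package names in PACKAGES are taken from this thread and are an assumption about what counts as "related", so adjust the list for your environment.

```python
from importlib import metadata

# Package names mentioned in this thread (an assumption, not an official list).
PACKAGES = [
    "torch", "torchvision", "torch-scatter", "mmcv-full", "mmdet",
    "mmdet3d", "mmsegmentation", "spconv-cu113", "cumm-cu113",
    "tensorflow", "waymo-open-dataset-tf-2-4-0",
]


def collect_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_report(versions):
    """Render one 'name version' line per package, ready to paste into an issue."""
    lines = []
    for name, version in versions.items():
        shown = version if version is not None else "not installed"
        lines.append(f"{name} {shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_versions(PACKAGES)))
```

Run it inside the same conda or virtualenv environment that sh run.sh uses, otherwise the report describes a different set of packages.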

ArseniuML commented 1 year ago

After some reinstalls I was able to start "sh run.sh", but after some successful iterations it failed:

Traceback (most recent call last): train_model( File "tools/train.py", line 230, in File "/home/marin/Lidar/SST/mmdet3d/apis/train.py", line 41, in train_model Traceback (most recent call last): train_detector( File "tools/train.py", line 230, in File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector main() File "tools/train.py", line 220, in main Traceback (most recent call last): File "tools/train.py", line 230, in model = MMDistributedDataParallel( File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in init main() File "tools/train.py", line 220, in main train_model( File "/home/marin/Lidar/SST/mmdet3d/apis/train.py", line 41, in train_model main() File "tools/train.py", line 220, in main main() File "tools/train.py", line 220, in main train_detector(
self._sync_params_and_buffers(authoritative_rank=0) File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 457, in _sync_params_and_buffers

.....

  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector
train_model(model = MMDistributedDataParallel(    

train_detector( File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in init File "/home/marin/Lidar/SST/mmdet3d/apis/train.py", line 41, in train_model File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector self._distributed_broadcast_coalesced( model = MMDistributedDataParallel( model = MMDistributedDataParallel(
self._sync_params_and_buffers(authoritative_rank=0)dist._broadcast_coalesced(
File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1155, in _distributed_broadcast_coalesced File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in init

train_detector( File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in init File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 457, in _sync_params_and_buffers model = MMDistributedDataParallel( RuntimeError self._sync_params_and_buffers(authoritative_rank=0) File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/mmdet/apis/train.py", line 78, in train_detector : File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 446, in init

NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:825, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 457, in _sync_params_and_buffers

...

Traceback (most recent call last):
  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/marin/miniconda3/envs/lidar/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/marin/miniconda3/envs/lidar/bin/python3', '-u', 'tools/train.py', '--local_rank=7', 'configs/fsd/fsd_waymoD1_1x.py', '--launcher', 'pytorch', '--work-dir', './work_dirs/fsd_waymoD1_1x/', '--cfg-options', 'evaluation.pklfile_prefix=./work_dirs/fsd_waymoD1_1x/results', 'evaluation.metric=fast', '--seed', '1']' returned non-zero exit status 1.
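
The failing command carries --local_rank=7, so the launcher started at least eight processes. One cause sometimes reported for ncclInvalidUsage is launching more ranks than there are visible GPUs; nothing in this thread confirms that this is what happened here. The sketch below is only a quick way to rule that case in or out. The helper names max_local_rank and ranks_exceed_gpus are hypothetical, and the torch import is optional.

```python
import re


def max_local_rank(argv):
    """Return the largest --local_rank=N value found in argv, or None."""
    ranks = []
    for arg in argv:
        match = re.fullmatch(r"--local_rank=(\d+)", arg)
        if match:
            ranks.append(int(match.group(1)))
    return max(ranks) if ranks else None


def ranks_exceed_gpus(argv, visible_gpus):
    """True if the command implies more processes than visible GPUs."""
    rank = max_local_rank(argv)
    if rank is None:
        return False
    # Local ranks are zero-based, so rank 7 implies at least 8 processes.
    return rank + 1 > visible_gpus


# Arguments copied (abridged) from the failing command in the log above.
failing_cmd = [
    "python3", "-u", "tools/train.py", "--local_rank=7",
    "configs/fsd/fsd_waymoD1_1x.py", "--launcher", "pytorch",
]

if __name__ == "__main__":
    try:
        import torch  # optional: only used to count GPUs on this machine
        gpus = torch.cuda.device_count()
    except ImportError:
        gpus = 0
    print("visible GPUs:", gpus)
    print("more ranks than GPUs:", ranks_exceed_gpus(failing_cmd, gpus))
```

If the check reports more ranks than GPUs, lowering the number of processes in run.sh to match the GPU count is a reasonable first thing to try.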

ArseniuML commented 1 year ago

torch 1.8.0+cu111
torch-scatter 2.0.9
torchex 0.1.0 /home/marin/Lidar/TorchEx
torchvision 0.9.0+cu111

mmcv-full 1.3.9
mmdet 2.14.0
mmdet3d 0.15.0 /home/marin/Lidar/SST
mmsegmentation 0.14.1

waymo-open-dataset-tf-2-4-0 1.4.1
tensorflow 2.4.0
tensorflow-estimator 2.4.0

cumm-cu113 0.4.11
spconv-cu113 2.2.3
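
In this list, torch and torchvision carry a cu111 tag while cumm and spconv carry cu113. Whether that mismatch matters for the error above is not established in this thread, but it is easy to spot mechanically. The following is a minimal sketch that groups packages by the CUDA tag found in their name or version; the INSTALLED dictionary is copied from the list above, and the helper names are hypothetical.

```python
import re

# Name and version pairs copied from the package list above.
INSTALLED = {
    "torch": "1.8.0+cu111",
    "torchvision": "0.9.0+cu111",
    "cumm-cu113": "0.4.11",
    "spconv-cu113": "2.2.3",
}


def cuda_tag(name, version):
    """Return a tag such as 'cu111' found in the name or version, else None."""
    match = re.search(r"cu\d{2,3}", f"{name} {version}")
    return match.group(0) if match else None


def group_by_cuda_tag(packages):
    """Map each CUDA tag to the list of packages that carry it."""
    groups = {}
    for name, version in packages.items():
        tag = cuda_tag(name, version)
        if tag is not None:
            groups.setdefault(tag, []).append(name)
    return groups


if __name__ == "__main__":
    groups = group_by_cuda_tag(INSTALLED)
    for tag, names in sorted(groups.items()):
        print(tag, ", ".join(names))
    if len(groups) > 1:
        print("Packages were built for different CUDA versions.")
```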

ArseniuML commented 1 year ago

Maybe this should be posted as a separate issue.

Abyssaledge commented 1 year ago

Please reopen this issue if you need further discussion.