tusen-ai / SST

Code for a series of work in LiDAR perception, including SST (CVPR 22), FSD (NeurIPS 22), FSD++ (TPAMI 23), FSDv2, and CTRL (ICCV 23, oral).
Apache License 2.0

installation and run problem #73

Open JessieW0806 opened 1 year ago

JessieW0806 commented 1 year ago

When I run run.sh, it gives the following error:

ImportError: /mnt/cache/wangyingjie/SST/mmdet3d/ops/ball_query/ball_query_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIfEEPT_v

When installing FSD, I followed this issue: https://github.com/tusen-ai/SST/issues/6 My environment is as follows:

sys.platform: linux
Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
GCC: gcc (GCC) 5.4.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
TorchVision: 0.10.0+cu111
OpenCV: 4.6.0
MMCV: 1.3.9
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.14.0+2028b0c
MMSegmentation: 0.14.1
MMDetection3D: 0.15.0

JessieW0806 commented 1 year ago

Looking forward to your reply!

Abyssaledge commented 1 year ago

Thanks for using our code. Such an undefined-symbol error is usually caused by incompatible library versions, but I am not sure exactly what went wrong in your case. Here is my log, which contains the relevant environment information; I hope it helps: https://github.com/tusen-ai/SST/files/9689623/sst_waymoD5_1x_3class_8heads_v2.log
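
To compare environments against the log above, a quick version dump like the sketch below may help; keep in mind that the compiled ops usually need to be rebuilt after changing the PyTorch/CUDA version, since the old `.so` files keep symbols from the previous build.

```python
# Minimal environment check to compare against the linked log.
import torch
import mmcv
import mmdet
import mmdet3d

print("PyTorch     :", torch.__version__)
print("CUDA (torch):", torch.version.cuda, "| available:", torch.cuda.is_available())
print("MMCV        :", mmcv.__version__)
print("MMDetection :", mmdet.__version__)
print("MMDet3D     :", mmdet3d.__version__)
```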

Abyssaledge commented 1 year ago

Here is the log: fsd_waymoD1_1x.log. I deleted the log you posted, which was too long to keep inline. Could you upload it as a file instead?

JessieW0806 commented 1 year ago

Sorry for that. Here is my log: log.txt. The loss values seem to be wrong. Could you please help me out?

Abyssaledge commented 1 year ago

https://github.com/tusen-ai/SST/blob/main/configs/fsd/fsd_waymoD1_1x.py#L236 The batch size you use is too large. You should scale that number (the positive-sample limit) along with your batch size (roughly 128 * batch size); see the sketch below.
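
As a rough illustration of that scaling rule, a hedged sketch; the GPU count and the default samples_per_gpu are assumptions, and the actual config key at the linked line may be named differently:

```python
# Illustrative arithmetic for scaling the positive-sample limit with batch size.
num_gpus = 8             # assumption: 8 GPUs, as in the environment above
samples_per_gpu = 2      # assumption: check samples_per_gpu in the config
total_batch_size = num_gpus * samples_per_gpu

# Rule of thumb from the reply above: limit ~= 128 * total batch size.
pos_sample_limit = 128 * total_batch_size
print(pos_sample_limit)  # 2048 for a total batch size of 16
```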

JessieW0806 commented 1 year ago

Thanks for the timely reply! I have changed this setting; however, the loss still seems wrong.

2022-10-29 10:11:56,315 - mmdet - INFO - Epoch [1][50/79041] lr: 3.000e-05, eta: 8 days, 17:21:32, time: 0.795, data_time: 0.284, memory: 4435, loss_sem_seg: 0.0157, loss_vote: 0.8871, recall_Car: 1.0000, recall_Ped: 0.9800, recall_Cyc: 1.0000, num_clusters: 136.6600, num_fg_points: 427.8600, loss_cls.task0: 0.0096, loss_center.task0: 0.3746, loss_size.task0: 0.1888, loss_rot.task0: 0.0341, loss_cls.task1: 0.0154, loss_center.task1: 0.0000, loss_size.task1: 0.0000, loss_rot.task1: 0.0000, loss_cls.task2: 0.0169, loss_center.task2: 0.0000, loss_size.task2: 0.0000, loss_rot.task2: 0.0000, loss_rcnn_cls: 0.0435, num_pos_rois: 0.0000, num_neg_rois: 314.8600, loss_rcnn_bbox: 0.0000, loss_rcnn_corner: 0.0000, loss: 1.5857, grad_norm: 15.9475

The full log file (log.txt) is attached here. Looking forward to your reply.

Abyssaledge commented 1 year ago

Could you please point out which parts of the config you have modified? It is not easy to infer your modifications from the log alone. Also, why do you believe the loss is wrong?

JessieW0806 commented 1 year ago

  1. I did not change any configs; I just want to reproduce the FSD results.
  2. The loss starts at 1 and then goes up to 4. Besides, in epoch 1, items such as loss_rcnn_bbox are 0.

Thanks a lot!

Abyssaledge commented 1 year ago

  1. It seems that, according to your log, you modified at least the number of GPUs, which changes the total batch size and the number of iterations. Since FSD uses SyncBN and an iteration-based warmup, this makes a significant difference in performance, so I suggest you check exactly what you modified.
  2. The loss increase is reasonable because we only enable the detection part after 4000 iterations.
  3. The zero losses are also caused by reducing the number of GPUs: 4000 iterations are then not enough for a good segmentation warmup, so no foreground points are selected for the detection part (see the sketch after this list).
  4. I suggest users first go through the important parameters in the configs before running experiments, to make such issues easier to handle.
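
A rough back-of-the-envelope sketch of point 3; apart from the 4000-iteration warmup, the numbers here are illustrative assumptions:

```python
# How many training samples the segmentation warmup sees before the detection
# part is enabled. Fewer GPUs => smaller total batch => fewer samples covered
# by the same 4000-iteration warmup.
warmup_iters = 4000          # from the reply above
samples_per_gpu = 2          # assumption: check the config

for num_gpus in (8, 4, 2):
    samples_seen = warmup_iters * num_gpus * samples_per_gpu
    print(f"{num_gpus} GPUs -> {samples_seen} samples seen during warmup")
```
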
JessieW0806 commented 1 year ago

Thanks for your help! I have another question. You said "A hotfix is using our code to re-generate the waymo_dbinfo_train.pkl", but I have processed the Waymo data using the latest mmdet3d version. Do I need to re-generate only waymo_dbinfo_train.pkl, rather than re-processing the whole dataset?

Abyssaledge commented 1 year ago

You only need to re-generate waymo_dbinfo_train.pkl. If you know the format well, you can simply modify the coordinates in this pickle file instead of regenerating it. FYI, here are our pickles: https://share.weiyun.com/B3Ss4rid. You could compare them with your local data in case of unexpected bugs.
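
For the comparison, a minimal sketch for eyeballing the two pickles side by side; the file paths and the per-object keys (e.g. box3d_lidar) are assumptions based on the usual mmdet3d dbinfos layout:

```python
# Summarize a dbinfos pickle: per-class object counts plus one example entry.
import pickle

def summarize(path):
    with open(path, 'rb') as f:
        dbinfos = pickle.load(f)          # dict: class name -> list of info dicts
    for cls_name, infos in dbinfos.items():
        print(f"{path} | {cls_name}: {len(infos)} objects")
        if infos:
            print("  keys:", sorted(infos[0].keys()))
            print("  example box3d_lidar:", infos[0].get('box3d_lidar'))

summarize('data/waymo/waymo_dbinfo_train.pkl')    # illustrative local path
summarize('downloads/waymo_dbinfo_train.pkl')     # the pickle from the link above
```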

JessieW0806 commented 1 year ago

log.txt I have checked that my processed data is correct, and I have made sure my code matches your latest update. However, the log is still quite different from the one in your earlier reply (e.g., the strange loss increase). Could you please take a look?

Abyssaledge commented 1 year ago

You could try an experiment without the dbsampler; I suspect there is something wrong with it.
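
In case it helps, a hedged sketch of one way to build such a variant; it assumes the dbsampler is applied via the standard mmdet3d ObjectSample transform and that the pipeline sits at cfg.data.train.pipeline, which may not match the actual FSD config (a wrapped dataset would need an extra dataset level):

```python
# Load the FSD config, drop the GT-sampling step, and dump a variant config.
from mmcv import Config

cfg = Config.fromfile('configs/fsd/fsd_waymoD1_1x.py')
pipeline = cfg.data.train.get('pipeline', [])      # nesting is an assumption
cfg.data.train.pipeline = [s for s in pipeline if s.get('type') != 'ObjectSample']
cfg.dump('configs/fsd/fsd_waymoD1_1x_no_dbsampler.py')
```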

JessieW0806 commented 1 year ago

log2.txt I ran the experiment without the dbsampler yesterday; it seems to show the same problem.

Abyssaledge commented 1 year ago

Send me an email and I will offer you the trained checkpoint. You could use it for inference to see if the results match.

Abyssaledge commented 1 year ago

I have sent it to you.

JessieW0806 commented 1 year ago

res.txt The output results seem to be fine. So why does training the FSD network from scratch not work correctly? Could you please help me with it?

Abyssaledge commented 1 year ago

It's hard to say what is going wrong. Could you list the detailed procedure you followed, including data generation, the code you used, and any modifications you made? I will try to help.

JessieW0806 commented 1 year ago

Thanks for your reply! I re-installed the environment and the training process now seems to be OK, but I still have some questions. 1) When I use the original config, the whole Waymo dataset is used, and training takes about 4 days, which is much slower than your log. I use the same settings you provided, with 8 A100 GPUs on a cluster, launched with `srun -p ai4science --async --job-name=FSD_s --gres=gpu:8 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=8 --kill-on-bad-exit=1 bash run2.sh`. What do you guess is the reason for the difference in training time?

2) If I want to increase samples_per_gpu to 8 (for example), what else do I need to change to get the best performance?

3) I want to use part of the dataset (one-fifth) for experiments in the future; is that OK?

Abyssaledge commented 1 year ago

Sorry for the late reply. How is it going now?

  1. I don't know; it's hard to say. Check the I/O or add timing to find the bottleneck.
  2. Increase the learning rate, and increase the `num` in IoUNegPiecewiseSampler along with the batch size (see the sketch below).
  3. If you do this by setting `load_interval`, it will be fine.
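
For concreteness, a hedged sketch of the kind of overrides points 2 and 3 describe; the base values, the linear learning-rate scaling, and the exact nesting of these keys are assumptions to check against configs/fsd/fsd_waymoD1_1x.py:

```python
# Illustrative config overrides built on top of the base FSD config.
_base_ = ['./fsd_waymoD1_1x.py']

data = dict(
    samples_per_gpu=8,               # assumption: larger than the base value
    train=dict(load_interval=5),     # roughly one-fifth of the training data;
                                     # may need an extra dataset=dict(...) level
                                     # if the train set is wrapped (e.g. RepeatDataset)
)

# Linear scaling rule: grow the lr with the total batch size. 3e-5 is the lr seen
# early in the posted log (likely still in warmup), so treat the absolute value as
# a placeholder; the 4x factor assumes a 4x larger total batch.
optimizer = dict(lr=4 * 3e-5)

# The IoUNegPiecewiseSampler `num` should also grow with the batch size; its exact
# location inside the model's train_cfg depends on the base config, so adapt it there.
```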