open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

Help! Unbalanced GPU memory usage in custom VoxelNeXt model. #2975

Open Mumuqiao opened 1 month ago

Mumuqiao commented 1 month ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

```
sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.2
OpenCV: 4.9.0
MMEngine: 0.10.4
MMDetection: 3.3.0
MMDetection3D: 1.4.0+fe25f7a
spconv2.0: True
```

Reproduces the problem - code sample

I'm trying to reproduce VoxelNeXt on mmdetection3d. I ported the VoxelNeXt code from pull request 2692 to the main branch of mmdetection3d and fixed the bugs in it. The code now trains successfully, but memory usage is quite unbalanced across the GPUs. Is this normal? When I train CenterPoint on the same machine, GPU memory usage is balanced across all GPUs. What factors could cause this imbalance?
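
To put numbers on the imbalance, I compare each rank's peak CUDA memory with a throwaway helper like the one below (a minimal sketch; `log_peak_memory_per_rank` is just a name I made up, and it assumes `torch.distributed` is already initialized, as it is under `dist_train.sh`):

```python
import torch
import torch.distributed as dist


def log_peak_memory_per_rank():
    """Gather every rank's peak CUDA memory and print the spread on rank 0.

    Hypothetical diagnostic helper, not part of the VoxelNeXt port. Assumes
    each process has already set its own CUDA device, as DDP training does.
    """
    peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    stats = [None] * dist.get_world_size()
    # all_gather_object collects arbitrary picklable objects from all ranks.
    dist.all_gather_object(stats, (dist.get_rank(), peak_mib))
    if dist.get_rank() == 0:
        for rank, mib in sorted(stats):
            print(f'rank {rank}: peak allocated {mib:.0f} MiB')
        values = [mib for _, mib in stats]
        print(f'max - min spread: {max(values) - min(values):.0f} MiB')
```

Calling it once every few training iterations is enough to see whether the spread is stable or fluctuates between iterations.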

Reproduces the problem - command or script

```shell
./tools/dist_train.sh ./configs/voxelnext/voxelnext_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 8
```
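
To watch the memory during training itself, I also register a small MMEngine hook along these lines (again a sketch of my own; `MemoryLoggingHook` and the interval are not part of the repo, and non-zero ranks may only show up in the per-rank log files depending on the logger settings):

```python
import torch
from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class MemoryLoggingHook(Hook):
    """Hypothetical hook: log this rank's CUDA memory every `interval` iters."""

    def __init__(self, interval=50):
        self.interval = interval

    def after_train_iter(self, runner, batch_idx, data_batch=None, outputs=None):
        if (batch_idx + 1) % self.interval != 0:
            return
        allocated = torch.cuda.memory_allocated() / 1024 ** 2
        peak = torch.cuda.max_memory_allocated() / 1024 ** 2
        runner.logger.info(
            f'iter {batch_idx + 1}: allocated {allocated:.0f} MiB, '
            f'peak {peak:.0f} MiB')
```

It is enabled by adding `custom_hooks = [dict(type='MemoryLoggingHook', interval=50)]` to the config.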

Reproduces the problem - error message

[Screenshot "memory": per-GPU memory usage across the 8 GPUs, showing the imbalance]

Additional information

No response