I'm trying to reproduce VoxelNeXt on mmdetection3d. I ported the VoxelNeXt code from pull request 2692 to the main branch of mmdetection3d and fixed the bugs in it. The code now runs successfully, but I found that memory usage is quite unbalanced across GPUs. Is this normal?
I have trained CenterPoint on the same machine, and its GPU memory usage is balanced across GPUs.
Does anyone know what factors could cause this?
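For context, one plausible factor is that VoxelNeXt is a fully sparse detector: spconv's memory footprint scales with the number of active voxels, which differs from sample to sample, so each rank's peak memory can drift apart. A minimal diagnostic sketch to compare peaks across ranks (the `log_rank_memory` helper is hypothetical, not part of mmdetection3d; call it at the end of a training iteration):

```python
import torch
import torch.distributed as dist

def log_rank_memory(tag: str = "") -> float:
    """Print and return this process's peak allocated GPU memory in MiB.

    Hypothetical helper for debugging unbalanced memory across ranks;
    returns 0.0 when no GPU is available so it is safe to call anywhere.
    """
    if not torch.cuda.is_available():
        return 0.0
    mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    rank = dist.get_rank() if dist.is_initialized() else 0
    print(f"[rank {rank}] {tag} peak GPU memory: {mib:.1f} MiB")
    return mib
```

If the per-rank peaks track the number of input voxels per batch, the imbalance is data-dependent rather than a bug in the port.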
Prerequisite
Task
I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
sys.platform: linux Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0] CUDA available: True MUSA available: False numpy_random_seed: 2147483648 GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090 CUDA_HOME: /usr/local/cuda NVCC: Cuda compilation tools, release 11.7, V11.7.64 GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 PyTorch: 1.10.1 PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.2 OpenCV: 4.9.0 MMEngine: 0.10.4 MMDetection: 3.3.0 MMDetection3D: 1.4.0+fe25f7a spconv2.0: True
Reproduces the problem - code sample
See the issue description above; there is no standalone code sample.
Reproduces the problem - command or script
```shell
./tools/dist_train.sh ./configs/voxelnext/voxelnext_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 8
```
Reproduces the problem - error message
No response
Additional information
No response