abubake opened this issue 3 months ago
Same problem.
Hi @ymlab @abubake,
I have a question regarding the training:
I am curious how much time the training takes per epoch and how many GPUs you use. I am particularly interested in LiDAR-only training, if you have any experience with that.
Hi, training with 4 GPUs took several hours per epoch, both for camera and for my LiDAR-only attempt. I don't remember the exact time per epoch, but it was about 4-5 days for 20 epochs, which works out to roughly 4.8 to 6 hours per epoch.
Thanks a lot for sharing your experience @abubake, it helps! Were you able to reproduce good results (comparable to the paper) with LiDAR-only training?
I plan to work with this repository for my thesis and don't want to waste time if the code is not working as expected, so any feedback is valuable to me :)
@gorkemguzeler the repo is working as expected for me. I haven't trained LiDAR-only, but I got 65 mAP after 3 epochs of training the BEVFusion model initialized from the LiDAR-only base. Oh, and it took about 2 h per epoch on 8x 3090 with batch size 2 and LR scaling enabled.
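For anyone unfamiliar with the LR scaling mentioned above: mmengine-based configs typically enable it via `auto_scale_lr = dict(enable=True, base_batch_size=...)`, and the runner then applies the linear scaling rule. A minimal sketch of that rule (the config field names are from mmengine conventions; the helper below is purely illustrative, not repo code):

```python
# Linear LR scaling rule as applied when auto_scale_lr is enabled in an
# mmengine config, e.g.: auto_scale_lr = dict(enable=True, base_batch_size=32)
def scale_lr(base_lr: float, base_batch_size: int, gpus: int, per_gpu_bs: int) -> float:
    """Scale the LR by the ratio of actual total batch to the base batch."""
    return base_lr * (gpus * per_gpu_bs) / base_batch_size

# 8 GPUs x bs 2 = total batch 16, half the base batch of 32 -> LR is halved.
print(scale_lr(2e-4, 32, gpus=8, per_gpu_bs=2))  # -> 0.0001
```

This is why the setup above (8 GPUs, bs 2) and a setup with fewer GPUs but gradient accumulation can end up with different effective learning rates if scaling is configured differently.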
Btw we are in the same boat. I am also doing my thesis on multimodal learning :)
@mdessl Hi, I'm also working on multimodal 3D detection. I'm curious: by bs 2, do you mean 2 samples per GPU or 2 for the whole 8 GPUs? The 3080 seems to have only 12 GB of memory. I have trained the BEVFusion of this repo on 2x A5000 with a batch size of 4 (with LR scaling) and cannot match the reported 71.4 NDS. After using gradient accumulation to simulate a batch size of 32, the performance is much better, at approximately 70.9 NDS.
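For context on the gradient-accumulation trick mentioned above (in mmengine I believe this is configured via `optim_wrapper = dict(..., accumulative_counts=N)`, but check the version you use): for a loss that is a plain mean over samples, accumulating micro-batch gradients and dividing by the number of steps reproduces the full-batch gradient exactly. A small numerical sketch with a toy least-squares model (all names and data here are illustrative):

```python
import numpy as np

# Gradient accumulation for a least-squares model: the gradient of the mean
# loss over the full batch equals the average of per-micro-batch gradients,
# so 8 accumulated steps of bs 4 simulate a single step of bs 32.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)

def grad(Xb, yb, w):
    """Gradient of the mean squared error 0.5*mean((Xb@w - yb)^2) w.r.t. w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

full = grad(X, y, w)                                            # true bs 32
accum = sum(grad(X[i:i + 4], y[i:i + 4], w) for i in range(0, 32, 8 // 2)) / 8
print(np.allclose(full, accum))  # -> True
```

Note this exact equivalence only holds for losses that are simple per-sample means; layers with cross-sample statistics (like BatchNorm) break it, which may explain part of the remaining gap to the reported numbers.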
For the multimodal setting, my concern is that the camera branch of this repo is too dependent on LiDAR, since it uses DepthLSS instead of the original LSS transform.
@curiosity654 Ohh, sorry, it was a typo. I meant 3090 (24 GB), so bs 2 per GPU.
Do you think the issue could have to do with the BatchNorm layers? I think BN is not fully compatible with gradient accumulation, and I am not sure what you could do about it.
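To illustrate the BN concern: BatchNorm normalizes with the statistics of the micro-batch it actually sees in each forward pass, not of the simulated large batch, so accumulated micro-batch gradients are not equivalent to a true large-batch step. A toy numerical illustration (illustrative data only):

```python
import numpy as np

# BatchNorm computes mean/variance over the batch dimension it sees.
# With gradient accumulation, each bs-2 micro-batch has its own noisy
# statistics, which differ from the stats of the simulated bs-16 batch.
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=16)

full_mean = x.mean()                                        # true bs-16 stat
micro_means = [x[i:i + 2].mean() for i in range(0, 16, 2)]  # bs-2 micro-stats

# The worst-case deviation of a micro-batch mean from the full-batch mean:
print(max(abs(m - full_mean) for m in micro_means))
```

Common mitigations people discuss for this in general (not specific to this repo) are SyncBN across GPUs, or replacing BN with batch-size-independent normalization such as GroupNorm or LayerNorm.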
@mdessl , thanks a lot for the feedback 👍
oh, good luck on your thesis :)
Prerequisite
Task
I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
sys.platform: linux
Python: 3.10.14 (main, Jul 8 2024, 14:50:49) [GCC 12.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA GeForce GTX 1080 Ti
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.66
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
TorchVision: 0.16.2+cu121
OpenCV: 4.9.0
MMEngine: 0.10.2
MMDetection: 3.3.0
MMDetection3D: 1.4.0+161d091
spconv2.0: False
Reproduces the problem - code sample
Reproduces the problem - command or script
bash tools/dist_train.sh projects/BEVFusion/configs/bevfusion_cam_swint_centerpoint_nus-3d.py 4
Reproduces the problem - error message
No error message; the issue is that even after 20 epochs, the results are extremely poor mAP and NDS. Loss gets down to about 6.x.
Additional information