Two-stage training evaluation

alfredgu001324 commented 1 year ago

Hi, I have trained two models (stage 1 and stage 2) following your instructions. However, when I run this command:

CUDA_VISIBLE_DEVICES=0 python tools/test.py /home/guxunjia/Desktop/VAD/projects/configs/VAD/VAD_base_stage_2.py /home/guxunjia/Desktop/VAD/work_dirs/VAD_base_stage_2/epoch_12.pth --launcher none --eval bbox --tmpdir tmp

the following error occurs:

projects.mmdet3d_plugin
WARNING!!!!, Only can be used for obtain inference speed!!!!
load checkpoint from local path: /home/guxunjia/Desktop/VAD/work_dirs/VAD_base_stage_2/epoch_12.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 5.0 task/s, elapsed: 16s, ETA: 0s
Traceback (most recent call last):
  File "tools/test.py", line 294, in <module>
    main()
  File "tools/test.py", line 274, in main
    print(dataset.evaluate(outputs['bbox_results'], **eval_kwargs))
  File "/home/guxunjia/Desktop/VAD/projects/mmdet3d_plugin/datasets/nuscenes_vad_dataset.py", line 1786, in evaluate
    result_dict['ADE_'+cls] = all_metric_dict['ADE_'+cls] / all_metric_dict['cnt_ade_'+cls]
ZeroDivisionError: float division by zero

I am using the mini dataset for both training and evaluation. I inspected all_metric_dict and it contains the following:

(Pdb) all_metric_dict {'gt_car': 701.0, 'gt_pedestrian': 659.0, 'cnt_ade_car': 0.0, 'cnt_ade_pedestrian': 2.0, 'cnt_fde_car': 0.0, 'cnt_fde_pedestrian': 0.0, 'hit_car': 0.0, 'hit_pedestrian': 0.0, 'fp_car': 46.0, 'fp_pedestrian': 0.0, 'ADE_car': 0.0, 'ADE_pedestrian': tensor(2.8956), 'FDE_car': 0.0, 'FDE_pedestrian': 0.0, 'MR_car': 0.0, 'MR_pedestrian': 0.0}
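The dump shows why evaluate fails: cnt_ade_car is 0.0, so the ADE line in the traceback divides by zero, and cnt_fde_pedestrian is also 0.0. The sketch below is a hypothetical helper (not part of VAD) that performs the same averaging but returns NaN for a class with no counted samples instead of raising. Only the ADE / cnt_ade_ division is confirmed by the traceback; pairing FDE_<cls> with cnt_fde_<cls> is an assumption. A guard like this only hides the symptom: zero counts still mean the model produced no usable motion predictions for that class.

```python
import math


def safe_average_metrics(all_metric_dict, classes=("car", "pedestrian")):
    """Hypothetical helper: divide each summed ADE/FDE by its count.

    Returns NaN for a class whose count is zero instead of raising
    ZeroDivisionError. Assumes FDE_<cls> is normalised by cnt_fde_<cls>.
    """
    result = {}
    for cls in classes:
        for metric, count_prefix in (("ADE", "cnt_ade_"), ("FDE", "cnt_fde_")):
            # float() also accepts a 0-dim torch tensor such as tensor(2.8956)
            total = float(all_metric_dict[f"{metric}_{cls}"])
            count = float(all_metric_dict[count_prefix + cls])
            result[f"{metric}_{cls}"] = total / count if count > 0 else math.nan
    return result


# Values copied from the pdb output above (the tensor replaced by a plain float).
all_metric_dict = {
    "cnt_ade_car": 0.0, "cnt_ade_pedestrian": 2.0,
    "cnt_fde_car": 0.0, "cnt_fde_pedestrian": 0.0,
    "ADE_car": 0.0, "ADE_pedestrian": 2.8956,
    "FDE_car": 0.0, "FDE_pedestrian": 0.0,
}
print(safe_average_metrics(all_metric_dict))
```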

I am wondering whether this is expected, since I trained on the mini dataset (just trying things out). Perhaps the small training set hurts the model's performance and leads to these zero values? When I use your released checkpoint, everything works fine. What is the correct file/procedure for evaluating the two-stage model?

Thank you so much!

alfredgu001324 commented 1 year ago

I also tried training end-to-end on the mini dataset, and the problem still occurs. So my guess is that the mini dataset is too small to train a model that gives even minimal results for exploration?

rb93dett commented 1 year ago

Yes, the nuScenes mini dataset is very small, and I think the original epoch settings are insufficient for convergence. If you want to experiment on the mini dataset, I think a much larger number of epochs is required.

yuyuyuyuyuty commented 4 months ago

Hello, when I run test.py I get the same error you encountered: result_dict['ADE_'+cls] = all_metric_dict['ADE_'+cls] / all_metric_dict['cnt_ade_'+cls] ZeroDivisionError: float division by zero. Have you solved it?

alfredgu001324 commented 4 months ago

@yuyuyuyuyuty I think the above explanation is the reason.

yuyuyuyuyuty commented 4 months ago

I have already trained with the original 60-epoch setting from the paper. Do I still need to increase the number of epochs?
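If the 60-epoch run used the mini split, rough arithmetic shows how much less data the model sees than in the paper's setting. This sketch assumes the standard nuScenes split sizes (28130 keyframe samples in the full train split, 323 in mini_train; the 81 samples in the evaluation progress bar above match the mini_val split) and counts samples rather than iterations, so batch size and GPU count are ignored. Matching sample counts does not make up for the far smaller variety of scenes, so treat the result as an indication of scale, not a recommended epoch count.

```python
# Assumed keyframe counts for the standard nuScenes training splits.
FULL_TRAIN_SAMPLES = 28130  # v1.0-trainval "train" split
MINI_TRAIN_SAMPLES = 323    # v1.0-mini "mini_train" split


def samples_seen(num_samples: int, epochs: int) -> int:
    """Total training samples visited over the given number of epochs."""
    return num_samples * epochs


def equivalent_mini_epochs(full_epochs: int) -> float:
    """Mini-split epochs needed to visit as many samples as full_epochs on full train."""
    return samples_seen(FULL_TRAIN_SAMPLES, full_epochs) / MINI_TRAIN_SAMPLES


print(samples_seen(FULL_TRAIN_SAMPLES, 60))      # samples seen in 60 full epochs
print(samples_seen(MINI_TRAIN_SAMPLES, 60))      # samples seen in 60 mini epochs
print(round(equivalent_mini_epochs(60), 1))      # mini epochs for the same count
```

With these sizes, 60 mini epochs visit 19380 samples versus 1687800 for 60 full epochs, roughly 87 times fewer.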

hustvl / VAD, issue #15