zeliu98 / Group-Free-3D

Group-Free 3D Object Detection via Transformers

Question about results reproduction #19

Closed yikaiw closed 3 years ago

yikaiw commented 3 years ago

Hi, thanks for the nice work.

I trained your network on the SUN RGB-D dataset with the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 2222 --nproc_per_node 4 \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 \
    --num_decoder_layers 6 --size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 \
    --center_delta 0.1111111111111 --learning_rate 0.004 --decoder_learning_rate 0.0002 \
    --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root .

I obtained the following results:

[08/24 23:51:00 group-free]: IoU[0.25]: 0head_: 0.6363  1head_: 0.6320  2head_: 0.6202  3head_: 0.6132  4head_: 0.6163  last_: 0.6164   proposal_: 0.6108
[08/24 23:51:00 group-free]: IoU[0.5]:  0head_: 0.4328  1head_: 0.4388  2head_: 0.4095  3head_: 0.4329  4head_: 0.4441  last_: 0.4282   proposal_: 0.3599

Question 1: There are several results (0head, 1head, 2head, 3head, 4head, proposal); which one should be reported in the paper?
Question 2: These results fall noticeably short of the results in your paper (IoU[0.25] 63.0, IoU[0.5] 45.2), and I'm not sure what is going wrong.

Thank you, and I look forward to your reply.

zeliu98 commented 3 years ago

Hi, 0head through last denote the performance of iterative box prediction at different decoder layers. The final results are obtained by a multi-stage ensemble; after training, run the following command to get them:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]
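
For intuition, the multi-stage ensemble (the "all_layers" entry in the eval log) roughly amounts to pooling the boxes predicted by every decoder stage into one candidate set before the usual NMS and mAP computation. A minimal sketch of that pooling idea, with hypothetical names rather than the repo's actual eval_avg.py code:

import numpy as np

def ensemble_all_layers(per_layer_boxes, per_layer_scores):
    """Pool predictions from every decoder stage into one candidate set.

    per_layer_boxes:  list of (N, 7) arrays (center, size, heading), one per stage
    per_layer_scores: list of (N,) score arrays matching the boxes
    """
    boxes = np.concatenate(per_layer_boxes, axis=0)
    scores = np.concatenate(per_layer_scores, axis=0)
    # The pooled candidates then go through the same per-class NMS and
    # AP computation as any single head's output would.
    return boxes, scores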
yikaiw commented 3 years ago

Thank you for your reply!

Following your evaluation script, I got the multi-stage evaluation results:

[09/18 16:08:07 eval]: T[1] IoU[0.25]: 0head_: 0.6197   1head_: 0.6151  2head_: 0.6071  3head_: 0.5975  4head_: 0.5977  all_layers_: 0.6304     last_: 0.5943  proposal_: 0.5950
[09/18 16:08:07 eval]: T[1] IoU[0.5]: 0head_: 0.4162    1head_: 0.4170  2head_: 0.4199  3head_: 0.4222  4head_: 0.4176  all_layers_: 0.4412     last_: 0.4211  proposal_: 0.3716

Should the results with the prefix "all_layers" be taken as the final (reported) results?

Results with the prefix "all_layers" are IoU[0.25] 63.0 and IoU[0.5] 44.1, which now seem comparable to your results (IoU[0.25] 63.0, IoU[0.5] 45.2). I guess the small gap (1.1 points) in IoU[0.5] is due to environment differences or other issues.
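
Part of that gap may also just be run-to-run variance from the random point sampling at evaluation time, which is presumably what averaging over --avg_times runs (as in the command above) smooths out. A toy illustration with made-up numbers:

import numpy as np

# Hypothetical IoU[0.5] mAPs from repeated evaluation runs; made-up values,
# just to show how averaging damps the sampling noise of any single run.
runs = [0.441, 0.448, 0.444, 0.452, 0.446]
print(f"mAP@0.5: {np.mean(runs):.4f} +/- {np.std(runs):.4f}")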

HaniItani commented 2 years ago

Hello @zeliu98,

Is "alllayers" always the best performing head? I ran evaluation of the provided pretrained model of GroupFree3D (12L, 512, PointNet++ wx2) only one time on ScanNet, and I get the following:

[01/11 21:42:17 eval]: AVG IoU[0.25]:   0head_: 0.6506  10head_: 0.6888         1head_: 0.6804  2head_: 0.6864  3head_: 0.6872  4head_: 0.6899  5head_: 0.6852   6head_: 0.6937  7head_: 0.6897  8head_: 0.6905  9head_: 0.6901  all_layers_: 0.6705     last_: 0.6942   last_three_: 0.6901     proposal_: 0.6262
[01/11 21:42:17 eval]: AVG IoU[0.5]:    0head_: 0.4213  10head_: 0.5152         1head_: 0.4702  2head_: 0.4917  3head_: 0.4904  4head_: 0.5115  5head_: 0.5059   6head_: 0.5197  7head_: 0.5114  8head_: 0.5210  9head_: 0.5157  all_layers_: 0.4701     last_: 0.5215   last_three_: 0.5186     proposal_: 0.3765

It seems that for this model variant, taking "last_" alone gives the best-performing head.
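
A quick check of the IoU[0.5] row confirms this (values transcribed from the log above; a throwaway snippet, not part of the repo):

# mAP@0.5 per head, copied from the ScanNet eval log above.
map_50 = {
    "proposal": 0.3765, "0head": 0.4213, "1head": 0.4702, "2head": 0.4917,
    "3head": 0.4904, "4head": 0.5115, "5head": 0.5059, "6head": 0.5197,
    "7head": 0.5114, "8head": 0.5210, "9head": 0.5157, "10head": 0.5152,
    "all_layers": 0.4701, "last": 0.5215, "last_three": 0.5186,
}
best = max(map_50, key=map_50.get)
print(best, map_50[best])  # -> last 0.5215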