wzzheng / OccWorld

[ECCV 2024] 3D World Model for Autonomous Driving
https://wzzheng.net/OccWorld/
Apache License 2.0
382 stars · 25 forks

Visualization results are confusing. 🤔 #13

Open LMD0311 opened 11 months ago

LMD0311 commented 11 months ago

Thank you for your inspiring work. I tried to reproduce the results: the average validation IoU is 26.15 and the average validation mIoU is 16.72, which are similar to the results reported in the paper.

However, my visualization results confuse me. During training, the Transformer takes the 1st-15th frames as input and predicts the 2nd-16th frames. Here are some visualization results.


The results are confusing. Even the reconstructions of the 2nd and 3rd frames are not satisfying, and I cannot see any correspondence between the predictions and the ground truth. @wzzheng Could the authors provide any help?
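For context on the frame indexing above, this is a minimal sketch of the input/target split I assume during training; the tensor layout, class count, and variable names are my own placeholders, not code from this repo:

```python
import torch

# Hypothetical occupancy sequence: (batch, time, H, W, Z) voxel grids of class indices.
occ_seq = torch.randint(0, 18, (1, 16, 200, 200, 16))  # 16 frames, 18 classes assumed

inputs = occ_seq[:, :-1]   # 1st-15th frames fed to the Transformer
targets = occ_seq[:, 1:]   # 2nd-16th frames it is trained to predict

# Each target frame should only depend on input frames up to the same index,
# so e.g. the 2nd-frame prediction is conditioned on the 1st frame alone.
assert inputs.shape[1] == targets.shape[1] == 15
```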

LMD0311 commented 10 months ago

Here are the ground-truth and predicted occupancy for the 2nd, 3rd, and 16th frames:

  • 2nd frame GT: [image]
  • 3rd frame GT: [image]
  • 16th frame GT: [image]
  • 2nd frame prediction: [image]
  • 3rd frame prediction: [image]
  • 16th frame prediction: [image]

The visualization code comes from https://github.com/wzzheng/TPVFormer/blob/main/visualization/vis_frame.py.

Since the visualized GT is quite reasonable, I guess my visualization code works fine.

BTW, the predicted occupancy I visualize comes from pred at https://github.com/wzzheng/OccWorld/blob/65658b16669493cc3f428bc615112bb22aede8f9/model/TransVQVAE.py#L168, and the GT occupancy comes from output_dict['target_occs'] at https://github.com/wzzheng/OccWorld/blob/65658b16669493cc3f428bc615112bb22aede8f9/model/TransVQVAE.py#L137.
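For reference, this is roughly how I dump those two tensors before visualizing them. It is only a sketch under my own assumptions: the tensor layouts, the argmax step, and the file names are placeholders on my side, not an interface defined by this repo or by vis_frame.py.

```python
import os
import numpy as np
import torch

def dump_for_vis(pred, target_occs, out_dir="vis_dump"):
    """Save predicted and GT occupancy as .npy files of per-voxel class labels.

    Assumed (not guaranteed by the repo) layouts:
      pred:        (B, T, H, W, Z, C) logits, or already (B, T, H, W, Z) labels
      target_occs: (B, T, H, W, Z) class labels
    """
    os.makedirs(out_dir, exist_ok=True)

    # If pred still carries a trailing class/logit dimension, collapse it with argmax
    # so both arrays hold one semantic label per voxel before visualization.
    if pred.dim() == target_occs.dim() + 1:
        pred = pred.argmax(dim=-1)

    np.save(os.path.join(out_dir, "pred_occ.npy"),
            pred.detach().cpu().numpy().astype(np.uint8))
    np.save(os.path.join(out_dir, "gt_occ.npy"),
            target_occs.detach().cpu().numpy().astype(np.uint8))
```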