wzzheng / OccWorld

[ECCV 2024] 3D World Model for Autonomous Driving
https://wzzheng.net/OccWorld/
Apache License 2.0
382 stars · 25 forks

Visualization results are confusing. 🤔 #13

Open LMD0311 opened 11 months ago

LMD0311 commented 11 months ago

Thank you for your inspiring work. I tried to reproduce the results: the average validation IoU is 26.15 and the average validation mIoU is 16.72, which are similar to the results reported in the paper.

However, my visualization results confuse me. During training, the Transformer takes the 1st-15th frames as input and predicts the 2nd-16th frames. Here are some visualization results.


The results are confusing. Even the reconstructions of the 2nd and 3rd frames are not satisfying, and I cannot see any correspondence between the predictions and the ground truth. @wzzheng Could the authors provide any help?
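For context on the frame indexing above, this is a minimal sketch of the input/target split I assume during training; the tensor layout, class count, and variable names are my own placeholders, not code from this repo:

```python
import torch

# Hypothetical occupancy sequence: (batch, time, H, W, Z) voxel grids of class indices.
occ_seq = torch.randint(0, 18, (1, 16, 200, 200, 16))  # 16 frames, 18 classes assumed

inputs = occ_seq[:, :-1]   # 1st-15th frames fed to the Transformer
targets = occ_seq[:, 1:]   # 2nd-16th frames it is trained to predict

# Each target frame should only depend on input frames up to the same index,
# so e.g. the 2nd-frame prediction is conditioned on the 1st frame alone.
assert inputs.shape[1] == targets.shape[1] == 15
```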

LMD0311 commented 10 months ago

Here are the ground-truth and predicted occupancy for the 2nd, 3rd, and 16th frames:

  • 2nd frame GT: [image]
  • 3rd frame GT: [image]
  • 16th frame GT: [image]
  • 2nd frame prediction: [image]
  • 3rd frame prediction: [image]
  • 16th frame prediction: [image]

The visualization code comes from https://github.com/wzzheng/TPVFormer/blob/main/visualization/vis_frame.py.

Since the visualized GT is quite reasonable, I guess my visualization code works fine.

BTW, the predicted occupancy I visualize comes from pred at https://github.com/wzzheng/OccWorld/blob/65658b16669493cc3f428bc615112bb22aede8f9/model/TransVQVAE.py#L168, and the GT occupancy comes from output_dict['target_occs'] at https://github.com/wzzheng/OccWorld/blob/65658b16669493cc3f428bc615112bb22aede8f9/model/TransVQVAE.py#L137.
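For reference, this is roughly how I dump those two tensors before visualizing them. It is only a sketch under my own assumptions: the tensor layouts, the argmax step, and the file names are placeholders on my side, not an interface defined by this repo or by vis_frame.py.

```python
import os
import numpy as np
import torch

def dump_for_vis(pred, target_occs, out_dir="vis_dump"):
    """Save predicted and GT occupancy as .npy files of per-voxel class labels.

    Assumed (not guaranteed by the repo) layouts:
      pred:        (B, T, H, W, Z, C) logits, or already (B, T, H, W, Z) labels
      target_occs: (B, T, H, W, Z) class labels
    """
    os.makedirs(out_dir, exist_ok=True)

    # If pred still carries a trailing class/logit dimension, collapse it with argmax
    # so both arrays hold one semantic label per voxel before visualization.
    if pred.dim() == target_occs.dim() + 1:
        pred = pred.argmax(dim=-1)

    np.save(os.path.join(out_dir, "pred_occ.npy"),
            pred.detach().cpu().numpy().astype(np.uint8))
    np.save(os.path.join(out_dir, "gt_occ.npy"),
            target_occs.detach().cpu().numpy().astype(np.uint8))
```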