wzzheng / TPVFormer

[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.
https://wzzheng.net/TPVFormer/
Apache License 2.0

Higher inference resolution #56

Closed · amundra15 closed this issue 11 months ago

amundra15 commented 11 months ago

Hi,

Thanks for releasing this amazing work. I am curious about the interpolation capabilities of the method at test time. If I understand it correctly, the training voxel resolution was 100x100x8, but the inference resolution can be anything.
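For context, here is a minimal sketch (not the repo's actual code) of the kind of test-time upsampling this would rely on: a TPV plane feature map predicted at the training resolution is bilinearly interpolated to a larger grid, which is roughly what the `scale_h`/`scale_w`/`scale_z` config parameters appear to control.

```python
import torch
import torch.nn.functional as F

# Hypothetical illustration: a TPV plane at the training resolution (100x100)
# is upsampled 2x at test time before voxel features are queried from it.
tpv_hw = torch.randn(1, 256, 100, 100)  # (batch, channels, H, W)
tpv_hw_up = F.interpolate(tpv_hw, scale_factor=2,
                          mode="bilinear", align_corners=False)
print(tpv_hw_up.shape)  # torch.Size([1, 256, 200, 200])
```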

However, upon increasing the resolution as below:

```python
tpv_h_ = 200
tpv_w_ = 200
tpv_z_ = 16
scale_h = 1
scale_w = 1
scale_z = 1
```

I get an error:

```
RuntimeError: Error(s) in loading state_dict for TPVFormer:
    size mismatch for tpv_head.tpv_mask_hw: copying a param with shape torch.Size([1, 100, 100]) from checkpoint, the shape in current model is torch.Size([1, 200, 200]).
    size mismatch for tpv_head.positional_encoding.row_embed.weight: copying a param with shape torch.Size([100, 128]) from checkpoint, the shape in current model is torch.Size([200, 128]).
    size mismatch for tpv_head.positional_encoding.col_embed.weight: copying a param with shape torch.Size([100, 128]) from checkpoint, the shape in current model is torch.Size([200, 128]).
    size mismatch for tpv_head.encoder.ref_3d_hw: copying a param with shape torch.Size([1, 4, 10000, 3]) from checkpoint, the shape in current model is torch.Size([1, 4, 40000, 3]).
    size mismatch for tpv_head.encoder.ref_3d_zh: copying a param with shape torch.Size([1, 32, 800, 3]) from checkpoint, the shape in current model is torch.Size([1, 32, 3200, 3]).
    size mismatch for tpv_head.encoder.ref_3d_wz: copying a param with shape torch.Size([1, 32, 800, 3]) from checkpoint, the shape in current model is torch.Size([1, 32, 3200, 3]).
    size mismatch for tpv_head.encoder.ref_2d_hw: copying a param with shape torch.Size([1, 10000, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 40000, 1, 2]).
    size mismatch for tpv_head.encoder.ref_2d_zh: copying a param with shape torch.Size([1, 800, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 3200, 1, 2]).
    size mismatch for tpv_head.encoder.ref_2d_wz: copying a param with shape torch.Size([1, 800, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 3200, 1, 2]).
    size mismatch for tpv_head.tpv_embedding_hw.weight: copying a param with shape torch.Size([10000, 256]) from checkpoint, the shape in current model is torch.Size([40000, 256]).
    size mismatch for tpv_head.tpv_embedding_zh.weight: copying a param with shape torch.Size([800, 256]) from checkpoint, the shape in current model is torch.Size([3200, 256]).
    size mismatch for tpv_head.tpv_embedding_wz.weight: copying a param with shape torch.Size([800, 256]) from checkpoint, the shape in current model is torch.Size([3200, 256]).
```

It seems like some of the modules cannot handle the changed output size. Could you guide me on how to achieve variable inference resolution?
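The root cause of the mismatch can be reproduced in isolation. The sketch below uses a standalone `nn.Embedding` as a stand-in for `tpv_head.tpv_embedding_hw` (an assumption about the module type, consistent with the `.weight` shapes in the traceback): its size is fixed at construction from `tpv_h_ * tpv_w_`, so a checkpoint saved at 100x100 cannot load into a model built at 200x200.

```python
import torch.nn as nn

# Stand-in for tpv_head.tpv_embedding_hw: num_embeddings = tpv_h_ * tpv_w_.
emb_ckpt = nn.Embedding(100 * 100, 256)  # shape stored in the released checkpoint
emb_new = nn.Embedding(200 * 200, 256)   # shape after raising tpv_h_/tpv_w_

try:
    emb_new.load_state_dict(emb_ckpt.state_dict())
except RuntimeError as err:
    print(err)  # size mismatch for weight: [10000, 256] vs [40000, 256]
```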

amundra15 commented 11 months ago

I figured out that I need to change the scale_* parameters, not the tpv_* ones. Closing the issue.
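For anyone landing here later, a sketch of the working config implied by this fix (the exact doubling values are an assumption based on the 200x200x16 target above): keep the tpv_* grid at the training resolution so the checkpoint loads, and raise the scale_* factors instead.

```python
# tpv_* stays at the training resolution; the checkpoint's embeddings match.
tpv_h_ = 100
tpv_w_ = 100
tpv_z_ = 8
# scale_* controls the test-time upsampling of the TPV planes.
scale_h = 2
scale_w = 2
scale_z = 2
# Effective output grid:
# (tpv_h_*scale_h, tpv_w_*scale_w, tpv_z_*scale_z) = (200, 200, 16)
```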