Hi,

Thanks for releasing this amazing work. I am curious about the interpolation capabilities of the method at test time. If I understand it correctly, the training voxel resolution was 100x100x8, but the inference resolution can be set arbitrarily.
However, when I increase the resolution as follows:
```
tpv_h_ = 200
tpv_w_ = 200
tpv_z_ = 16
scale_h = 1
scale_w = 1
scale_z = 1
```
I get an error:
```
RuntimeError: Error(s) in loading state_dict for TPVFormer:
	size mismatch for tpv_head.tpv_mask_hw: copying a param with shape torch.Size([1, 100, 100]) from checkpoint, the shape in current model is torch.Size([1, 200, 200]).
	size mismatch for tpv_head.positional_encoding.row_embed.weight: copying a param with shape torch.Size([100, 128]) from checkpoint, the shape in current model is torch.Size([200, 128]).
	size mismatch for tpv_head.positional_encoding.col_embed.weight: copying a param with shape torch.Size([100, 128]) from checkpoint, the shape in current model is torch.Size([200, 128]).
	size mismatch for tpv_head.encoder.ref_3d_hw: copying a param with shape torch.Size([1, 4, 10000, 3]) from checkpoint, the shape in current model is torch.Size([1, 4, 40000, 3]).
	size mismatch for tpv_head.encoder.ref_3d_zh: copying a param with shape torch.Size([1, 32, 800, 3]) from checkpoint, the shape in current model is torch.Size([1, 32, 3200, 3]).
	size mismatch for tpv_head.encoder.ref_3d_wz: copying a param with shape torch.Size([1, 32, 800, 3]) from checkpoint, the shape in current model is torch.Size([1, 32, 3200, 3]).
	size mismatch for tpv_head.encoder.ref_2d_hw: copying a param with shape torch.Size([1, 10000, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 40000, 1, 2]).
	size mismatch for tpv_head.encoder.ref_2d_zh: copying a param with shape torch.Size([1, 800, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 3200, 1, 2]).
	size mismatch for tpv_head.encoder.ref_2d_wz: copying a param with shape torch.Size([1, 800, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 3200, 1, 2]).
	size mismatch for tpv_head.tpv_embedding_hw.weight: copying a param with shape torch.Size([10000, 256]) from checkpoint, the shape in current model is torch.Size([40000, 256]).
	size mismatch for tpv_head.tpv_embedding_zh.weight: copying a param with shape torch.Size([800, 256]) from checkpoint, the shape in current model is torch.Size([3200, 256]).
	size mismatch for tpv_head.tpv_embedding_wz.weight: copying a param with shape torch.Size([800, 256]) from checkpoint, the shape in current model is torch.Size([3200, 256]).
```
It seems that some of the modules cannot handle the changed output size. Could you guide me on how to achieve variable inference resolution?
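In case a concrete starting point helps: the failing keys are all learned embedding tables or precomputed reference-point buffers whose first dimension is tied to the 100x100x8 training resolution. Below is a rough, untested sketch of the workaround I had in mind, resizing the learned tables in the checkpoint and dropping the deterministic buffers so they are rebuilt at the new size. The checkpoint path and the flattening order of each plane (e.g. row-major (h, w)) are my assumptions, and whether this preserves accuracy is part of what I am asking.

```python
import torch
import torch.nn.functional as F

# Rough sketch (untested): resize the learned plane embeddings in the
# checkpoint from the training resolution (100x100x8) to the target one
# (200x200x16). Key names are taken from the error above; the checkpoint
# path and the per-plane flattening order are assumptions on my part.
ckpt = torch.load("tpvformer_checkpoint.pth", map_location="cpu")  # hypothetical path
state = ckpt.get("state_dict", ckpt)

def resize_plane(weight, old_hw, new_hw):
    # (H*W, C) token table -> (1, C, H, W) -> bilinear resize -> (H'*W', C)
    c = weight.shape[1]
    w = weight.reshape(*old_hw, c).permute(2, 0, 1).unsqueeze(0)
    w = F.interpolate(w, size=new_hw, mode="bilinear", align_corners=False)
    return w.squeeze(0).permute(1, 2, 0).reshape(-1, c)

def resize_rowcol(weight, new_len):
    # (L, C) 1D positional table -> (1, C, L) -> linear resize -> (L', C)
    w = weight.t().unsqueeze(0)
    w = F.interpolate(w, size=new_len, mode="linear", align_corners=False)
    return w.squeeze(0).t()

# Assuming row-major (h, w) order for hw, and (z, h) / (w, z) for the others.
state["tpv_head.tpv_embedding_hw.weight"] = resize_plane(
    state["tpv_head.tpv_embedding_hw.weight"], (100, 100), (200, 200))
state["tpv_head.tpv_embedding_zh.weight"] = resize_plane(
    state["tpv_head.tpv_embedding_zh.weight"], (8, 100), (16, 200))
state["tpv_head.tpv_embedding_wz.weight"] = resize_plane(
    state["tpv_head.tpv_embedding_wz.weight"], (100, 8), (200, 16))
state["tpv_head.positional_encoding.row_embed.weight"] = resize_rowcol(
    state["tpv_head.positional_encoding.row_embed.weight"], 200)
state["tpv_head.positional_encoding.col_embed.weight"] = resize_rowcol(
    state["tpv_head.positional_encoding.col_embed.weight"], 200)

# tpv_mask_hw and the precomputed ref_* reference points look deterministic,
# so drop them and let the freshly initialized buffers (already at the new
# resolution) stand; strict=False then only has to tolerate missing keys.
for k in [k for k in list(state) if ".encoder.ref_" in k or k.endswith("tpv_mask_hw")]:
    del state[k]

# model = ... TPVFormer built with tpv_h_=200, tpv_w_=200, tpv_z_=16 ...
# model.load_state_dict(state, strict=False)
```

Alternatively, if the intended way to get a finer output grid is to keep tpv_h_/tpv_w_/tpv_z_ at the trained 100x100x8 and only raise scale_h/scale_w/scale_z so the planes are upsampled after the TPV head, confirming that would also answer my question.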