isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

swin2_tiny failed to run forward(): RuntimeError: unflatten: Provided sizes [64, 64] don't multiply up to the size of dim 2 (64) in the input tensor. #252


yqsony commented 7 months ago

I tried:

    from midas.dpt_depth import DPTDepthModel

    model = DPTDepthModel(
        path=None,  # no checkpoint; weights stay randomly initialized
        backbone="swin2t16_256",
        non_negative=True,
    )

During inference it failed at https://github.com/isl-org/MiDaS/blob/bdc4ed64c095e026dc0a2f17cabb14d58263decb/midas/backbones/utils.py#L72 with the error:

RuntimeError: unflatten: Provided sizes [64, 64] don't multiply up to the size of dim 2 (64) in the input tensor

The input at this layer has shape (b, 64, 64, 96), where b is the batch size. The next operator, pretrained.act_postprocess1, is:

Sequential(
  (0): Transpose()
  (1): Unflatten(dim=2, unflattened_size=torch.Size([64, 64]))
)

I don't think Unflatten(dim=2, unflattened_size=torch.Size([64, 64])) can work on a tensor of shape (b, 64, 64, 96): it requires dim 2 to have size 64 * 64 = 4096, and no dimension here satisfies that. It looks like the tensor has already been "unflattened", i.e. the backbone is emitting spatial (b, H, W, C) features rather than a flat token sequence.
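To make the mismatch concrete, here is a minimal standalone sketch (using the shapes observed above) of what act_postprocess1 expects versus what it actually receives:

    import torch
    import torch.nn as nn

    unflatten = nn.Unflatten(dim=2, unflattened_size=torch.Size([64, 64]))

    # ViT-style input: a flat token sequence (b, 4096, 96). After the
    # Transpose, dim 2 has size 64 * 64 = 4096, so the unflatten succeeds.
    tokens = torch.randn(1, 64 * 64, 96).transpose(1, 2)
    print(unflatten(tokens).shape)  # torch.Size([1, 96, 64, 64])

    # Swin-style input: already-spatial features (b, 64, 64, 96). After the
    # Transpose, dim 2 still has size 64, so the unflatten raises
    # "Provided sizes [64, 64] don't multiply up to the size of dim 2 (64)".
    spatial = torch.randn(1, 64, 64, 96).transpose(1, 2)
    unflatten(spatial)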

Has anyone tried training or inference with the Swin backbones?
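For what it's worth, the repo's documented entry point builds the model through midas.model_loader.load_model rather than by constructing DPTDepthModel directly. A sketch, assuming the dpt_swin2_tiny_256 checkpoint from the README has been downloaded to weights/:

    import torch
    from midas.model_loader import load_model

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # load_model also returns the matching input transform and the network
    # resolution for the chosen model type.
    model, transform, net_w, net_h = load_model(
        device,
        "weights/dpt_swin2_tiny_256.pt",  # assumed checkpoint location
        "dpt_swin2_tiny_256",
        optimize=False,
    )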

NielsRogge commented 6 months ago

Hi,

See #259 for easy inference with DPT + a Swin backbone.
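Roughly, via the Hugging Face depth-estimation pipeline (a sketch; the checkpoint name Intel/dpt-swinv2-tiny-256 is an assumption, substitute whichever checkpoint that thread points to):

    from transformers import pipeline
    from PIL import Image

    # The "depth-estimation" pipeline wraps DPTForDepthEstimation; the
    # checkpoint below is an assumed SwinV2-tiny DPT variant.
    depth_estimator = pipeline("depth-estimation", model="Intel/dpt-swinv2-tiny-256")

    result = depth_estimator(Image.open("input.jpg"))
    result["depth"].save("depth.png")  # PIL image of the predicted depth map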