isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

Copy of MiDaS model made with `copy.deepcopy` does not work. #251

Open charliebudd opened 7 months ago

charliebudd commented 7 months ago

If a copy is made of a MiDaS model loaded through torch hub, the forward call on the copy will throw an error about missing keys in parts of the model. See https://github.com/isl-org/MiDaS/issues/247.

  File "test.py", line 45, in <module>
    depths = depth_model(images)
  File "###/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "###/.cache/torch/hub/intel-isl_MiDaS_master/midas/dpt_depth.py", line 166, in forward
    return super().forward(x).squeeze(dim=1)
  File "/###/.cache/torch/hub/intel-isl_MiDaS_master/midas/dpt_depth.py", line 114, in forward
    layers = self.forward_transformer(self.pretrained, x)
  File "###/.cache/torch/hub/intel-isl_MiDaS_master/midas/backbones/vit.py", line 13, in forward_vit
    return forward_adapted_unflatten(pretrained, x, "forward_flex")
  File "###/.cache/torch/hub/intel-isl_MiDaS_master/midas/backbones/utils.py", line 88, in forward_adapted_unflatten
    layer_1 = pretrained.activations["1"]
KeyError: '1'
heyoeyo commented 7 months ago

This likely has to do with how the image embeddings are pulled from the image encoder model, which relies on PyTorch's register_forward_hook module method. The image encoder models are set up to use forward hooks that dump the image embeddings into an activations dictionary, which isn't part of the model definition (I'm not sure the forward hooks are part of the model either).

In any case, when you make a copy of the model, you won't get the activation dictionary (and/or forward hooks?), which seems to be the cause of the error you're seeing. Depending on how you use the copy, you may be able to simply re-add the forward hooks before running the model to get it to work properly. How you do this depends on the model variant, it looks like the swin implementation is here, the beit implementation is here, and the vit version (with resnet-50 it looks like) is here.