Open charliebudd opened 1 year ago
This likely has to do with how the image embeddings are pulled from the image encoder model, which relies on PyTorch's register_forward_hook module method. The image encoder models are set up to use forward hooks to dump the image embeddings into an activations dictionary, which isn't part of the model definition (I'm not sure the forward hooks are part of the model either).
In any case, when you make a copy of the model, you won't get the activations dictionary (and/or forward hooks?), which seems to be the cause of the error you're seeing. Depending on how you use the copy, you may be able to simply re-add the forward hooks before running the model to get it to work properly. How you do this depends on the model variant: it looks like the swin implementation is here, the beit implementation is here, and the vit version (with resnet-50 it looks like) is here.
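To make the idea of re-adding the hooks concrete, here is a minimal sketch on a toy model. It is not the MiDaS code: TinyEncoder, attach_hook and the "embed" key are hypothetical names made up for illustration, and the sketch assumes the hook writes into a dictionary that the model also reads from in forward, as described above. Only register_forward_hook and copy.deepcopy are real APIs here.

```python
import copy

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for an image encoder; not the MiDaS code."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        self.backbone(x)  # the forward hook stores this output as a side effect
        return self.head(self.activations["embed"])


def attach_hook(model):
    """Give the model a fresh activations dict and a hook that fills it."""
    activations = {}

    def hook(module, inputs, output):
        activations["embed"] = output

    model.activations = activations
    return model.backbone.register_forward_hook(hook)


torch.manual_seed(0)
model = TinyEncoder()
attach_hook(model)

clone = copy.deepcopy(model)  # copied before any forward pass
x = torch.randn(1, 4)

try:
    clone(x)
except KeyError as err:
    # In this sketch the copied hook still writes to the original dict,
    # so the clone's own dict never receives the "embed" entry.
    print("copy failed with missing key:", err)

attach_hook(clone)  # re-add the hook so it targets the clone's own dict
print(torch.allclose(model(x), clone(x)))
```

Note that in this sketch the hook that came along with the copy is left in place and keeps writing to the original dictionary; a more careful fix would probably remove it as well. For the real models, the hook registration would need to mirror whatever the specific backbone implementation does.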
If a copy is made of a MiDaS model loaded through torch hub, the forward call on the copy throws an error related to missing keys in parts of the model. See https://github.com/isl-org/MiDaS/issues/247.