Closed: wr-1999 closed this issue 1 year ago.
Hi @wr-1999,
Thank you for your message.
Currently, our MaPLe architecture is explicitly tailored to adapt Transformer-based Vision-Language (V-L) models. In our project, we did not explore using MaPLe for ResNet-based V-L models.
However, there are some ways to adapt ResNet-like architectures with learnable prompts. For example, you can train a grid-like learnable prompt frame over the input image and pass the prompted image to the ResNet.
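A minimal sketch of the prompt-frame idea, assuming the frame is an additive border of learnable pixels (width `pad`) on the input image and that only those pixels are trained while the backbone stays frozen. The names `frame_mask`, `FramePrompt` and `toy_loss_and_grad` are hypothetical, and a random linear scorer in NumPy stands in for the ResNet so the example runs without a deep-learning framework. This is an illustration, not the MaPLe code.

```python
import numpy as np


def frame_mask(height: int, width: int, pad: int) -> np.ndarray:
    """Return an (H, W) mask that is 1 on a border of width `pad` and 0 inside."""
    mask = np.ones((height, width), dtype=np.float32)
    mask[pad:height - pad, pad:width - pad] = 0.0
    return mask


class FramePrompt:
    """Hypothetical additive prompt frame for a frozen image backbone."""

    def __init__(self, channels: int, height: int, width: int, pad: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.mask = frame_mask(height, width, pad)  # broadcast over batch and channels
        # Learnable pixels; only the border entries take effect because of the mask.
        self.prompt = rng.normal(0.0, 0.01, size=(channels, height, width)).astype(np.float32)

    def apply(self, images: np.ndarray) -> np.ndarray:
        """Add the masked prompt to a batch of images shaped (N, C, H, W)."""
        return images + self.mask * self.prompt

    def step(self, grad_wrt_input: np.ndarray, lr: float) -> None:
        """One SGD step given dLoss/dInput from the frozen backbone.

        The prompt is added to every image, so dLoss/dPrompt is the input
        gradient summed over the batch and restricted to the frame by the mask.
        """
        grad = grad_wrt_input.sum(axis=0) * self.mask
        self.prompt -= lr * grad


def toy_loss_and_grad(weights: np.ndarray, x: np.ndarray, targets: np.ndarray):
    """Frozen linear scorer (stand-in for a ResNet) with a mean squared error loss."""
    scores = (weights * x).sum(axis=(1, 2, 3))  # one score per image, shape (N,)
    residual = scores - targets
    loss = 0.5 * float(np.mean(residual ** 2))
    grad_x = residual[:, None, None, None] * weights / x.shape[0]  # dLoss/dx
    return loss, grad_x


# Toy usage: only the frame pixels are updated; `weights` is never changed.
rng = np.random.default_rng(1)
images = rng.random((4, 3, 32, 32)).astype(np.float32)
weights = rng.normal(size=(3, 32, 32)).astype(np.float32)
targets = np.ones(4, dtype=np.float32)

prompt = FramePrompt(channels=3, height=32, width=32, pad=4)
loss_start, _ = toy_loss_and_grad(weights, prompt.apply(images), targets)
for _ in range(50):
    _, grad_x = toy_loss_and_grad(weights, prompt.apply(images), targets)
    prompt.step(grad_x, lr=2e-4)
loss_end, _ = toy_loss_and_grad(weights, prompt.apply(images), targets)
print(f"loss before: {loss_start:.3f}, after: {loss_end:.3f}")
```

Because the prompt only enters through `apply`, the toy scorer could in principle be swapped for any frozen backbone that exposes input gradients, such as a ResNet under a framework's autograd. The interior of the image is left untouched, which is what distinguishes a frame prompt from fine-tuning the network.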
You can see the following related papers, which have explored prompt learning on ResNet-like architectures. They can be modified and extended with a MaPLe-style design.
Kindly let us know if you have any further queries.
Thank you and kind regards.
I am closing the issue. Please feel free to reopen it in case there are any additional queries.
Thank you.
Would it be feasible to replace the network with a ResNet?