muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/
MIT License
619 stars · 43 forks

About the Network #24

Closed wr-1999 closed 1 year ago

wr-1999 commented 1 year ago

Would it be feasible to replace the network with ResNet

muzairkhattak commented 1 year ago

Hi @wr-1999,

Thank you for your message.

Currently, our MaPLe architecture is explicitly tailored to adapt Transformer-based Vision-Language (V-L) models. In our project, we did not explore using MaPLe for ResNet-type V-L models.

However, there are some ways to adapt ResNet-like architectures with learnable prompts. For example, you can train a grid-like learnable prompt, applied as a frame over the image border, in ResNets.

You can see the following related papers, which have explored prompt learning on ResNet-like architectures. They could be modified and extended with a MaPLe-style design.

  1. Visual Prompt Tuning (paper link)
  2. Exploring Visual Prompts for Adapting Large-Scale Models (paper link)

Kindly let us know if you have any further queries.

Thank you and kind regards.

muzairkhattak commented 1 year ago

I am closing the issue. Please feel free to reopen it in case there are any additional queries.

Thank you.