Hi~
Thanks for sharing your code! We found that Maple, as proposed in your paper, may not be directly applicable to VLMs that use CNN-based image encoders, such as ResNets. I have just come up with a feasible way to use Maple with CNN-based VLMs. The diagram is shown below.