muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/
MIT License
619 stars, 43 forks

Training maple on multi-class (image-caption) dataset #9

Closed AhmedBourouis closed 1 year ago

AhmedBourouis commented 1 year ago

Thank you for this great work and the clear, detailed implementation. I was wondering whether it is possible to train MaPLe on "scene" images such as MS-COCO images.

1- What would be appropriate preprocessing steps? I noticed that all the train/test datasets you worked with have one folder per class. That cannot be replicated for multi-class images.

2- What changes can be made in the code to adapt the model to multi-class classification during training?

Thank you again for this amazing contribution!

muzairkhattak commented 1 year ago

Hi @AhmedBourouis,

Thank you for showing interest in our work.

Yes, it is possible to train MaPLe on an image-caption pair dataset such as the COCO-Captions dataset.

1- What would be appropriate preprocessing steps? I noticed that all the train/test datasets you worked with have one folder per class. That cannot be replicated for multi-class images.

You do not need to do the folder preprocessing manually. The main requirement is a custom data loader that returns image-caption pairs, which you can then use to train MaPLe further. You can refer to this great tutorial on writing a PyTorch data loader that provides image-text pairs for training a CLIP-like model.

As MaPLe is based on the Dassl library, you will need to dig a bit into Dassl itself, as well as into this part of the code, which is where your custom loader needs to be implemented.
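To make the custom-loader idea above concrete, here is a minimal sketch rather than code from the MaPLe or Dassl repositories. It assumes a COCO-style captions annotation file (a JSON object with `images` and `annotations` lists). The class name `CaptionPairDataset` and the `image_loader` and `transform` parameters are hypothetical names chosen for illustration. Wiring the dataset into Dassl's data manager and the MaPLe trainer is not shown.

```python
import json
import os


class CaptionPairDataset:
    """Map-style dataset that yields (image, caption) pairs.

    Assumes a COCO-style captions file:
      {"images": [{"id": ..., "file_name": ...}, ...],
       "annotations": [{"image_id": ..., "caption": ...}, ...]}
    """

    def __init__(self, annotation_file, image_root, image_loader=None, transform=None):
        with open(annotation_file, "r", encoding="utf-8") as f:
            data = json.load(f)

        # Map each image id to its file name so captions can be joined to images.
        id_to_file = {img["id"]: img["file_name"] for img in data["images"]}

        # One entry per caption; an image with several captions appears several times.
        # Captions that point to an unknown image id are skipped.
        self.pairs = [
            (os.path.join(image_root, id_to_file[ann["image_id"]]), ann["caption"].strip())
            for ann in data["annotations"]
            if ann["image_id"] in id_to_file
        ]
        self.image_loader = image_loader  # hypothetical hook, e.g. a function that opens the file
        self.transform = transform        # hypothetical hook, e.g. CLIP image preprocessing

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        path, caption = self.pairs[index]
        # Without an image_loader the raw path is returned, which keeps the sketch dependency-free.
        image = self.image_loader(path) if self.image_loader else path
        if self.transform is not None:
            image = self.transform(image)
        return image, caption
```

Because the class implements `__len__` and `__getitem__`, it follows the map-style dataset protocol that `torch.utils.data.DataLoader` accepts. In practice you would likely pass an image-opening function as `image_loader` and the CLIP preprocessing as `transform`, and tokenize the captions in the training step.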

2- What changes can be made in the code to adapt the model to multi-class classification during training?

In order to classify the given image into multiple classes, you can perform one of the following:

Kindly let me know if that helps to solve your issue.

Thank you and kind regards.

AhmedBourouis commented 1 year ago

Thank you for the clear and detailed answer! You fully answered my question so I'm closing this now.

vrk7 commented 1 year ago

@AhmedBourouis Could you please share the training notebook if you have it handy?