microsoft / BridgeTower

Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"
https://arxiv.org/abs/2206.08657
MIT License

processor support for single channel? #13

Closed swtb3 closed 8 months ago

swtb3 commented 8 months ago

Hello,

I have been trying to fine-tune this model on my own data, which consists of single-channel images and some text describing features of the images.

The processor only seems to handle 3-channel images. To get around this, I have stacked each image onto itself to fill the 3 channels. However, this seems likely to significantly increase the amount of data I train the model on.

Is single-channel input currently supported by a pretrained BridgeTower checkpoint? If not, are there plans to include it? Is stacking images the best approach when working with single-channel images for this model?
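
For reference, the stacking workaround described above can be sketched roughly as follows. This is a minimal illustration using NumPy only; the helper name `gray_to_three_channel` and the sample array are hypothetical, not part of the BridgeTower code, and the sketch assumes images arrive as `(H, W)` or `(H, W, 1)` arrays.

```python
import numpy as np


def gray_to_three_channel(image: np.ndarray) -> np.ndarray:
    """Replicate a single-channel image into three identical channels.

    Hypothetical helper for illustration; accepts (H, W) or (H, W, 1)
    and returns an array of shape (H, W, 3) with the same dtype.
    """
    # Drop a trailing singleton channel axis if one is present.
    if image.ndim == 3 and image.shape[-1] == 1:
        image = image[..., 0]
    if image.ndim != 2:
        raise ValueError(f"expected (H, W) or (H, W, 1), got {image.shape}")
    # Add a channel axis and copy the values along it three times.
    return np.repeat(image[..., np.newaxis], 3, axis=-1)


# Small made-up example: a 3x4 grayscale image.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
rgb = gray_to_three_channel(gray)
```

The output holds three times as many values as the input, but every channel is a copy of the same data, so no new information is added. Doing the replication on the fly while loading each sample, rather than saving stacked copies, avoids tripling the dataset on disk.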