Hello,
I have been trying to fine-tune this model on my own data, which consists of single-channel images and text describing features of the images.
The processor only seems to handle 3-channel images. To get around this, I have stacked each image onto itself to fill 3 channels. However, this seems like it will significantly increase the amount of data I train the model on.
Is there currently support for single-channel images in a pretrained BridgeTower checkpoint? If not, are there plans to include it? Is stacking images the best approach when working with single-channel images for this model?
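For reference, the stacking workaround described above might look something like the following sketch. It assumes the images are NumPy arrays of shape (H, W) or (H, W, 1); the helper name `gray_to_three_channel` is hypothetical and not part of any library.

```python
import numpy as np


def gray_to_three_channel(image: np.ndarray) -> np.ndarray:
    """Replicate a single-channel image into three identical channels.

    Hypothetical helper: accepts (H, W) or (H, W, 1) and returns (H, W, 3).
    """
    # Drop a trailing singleton channel axis if one is present.
    if image.ndim == 3 and image.shape[-1] == 1:
        image = image[..., 0]
    if image.ndim != 2:
        raise ValueError(f"expected a single-channel image, got shape {image.shape}")
    # Copy the same values into each of the three channels.
    return np.repeat(image[..., np.newaxis], 3, axis=-1)


# Small synthetic example in place of a real image.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
rgb = gray_to_three_channel(gray)
print(rgb.shape)  # (3, 4, 3)
```

The three channels carry identical values, so the array is three times larger but contains no new information.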