yjh0410 / CenterNet-plus

A Simple Baseline for Object Detection

Using Grayscale (1 channel) images? #6

Open YashRunwal opened 3 years ago

YashRunwal commented 3 years ago

Hello,

Great work! Thanks for sharing this with the community.

I would also like to use the ResNet-18 architecture as the backbone together with the CenterNet architecture. However, my dataset consists of grayscale images with shape [512, 1536]. So my questions are:

  1. Can I use grayscale images for training?
  2. Apart from the first layer in_channels of the backbone, what else do I need to change?

Thank You.

yjh0410 commented 3 years ago

Yes, of course you can. You can use OpenCV to convert your grayscale images into RGB-style images (with 3 channels).
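As a rough sketch of what that conversion amounts to: the single channel is simply copied three times, which is what OpenCV's `cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)` does. NumPy is used below so the snippet runs without an image file; the random image is a stand-in, and only its shape comes from the question.

```python
import numpy as np

# Dummy 8-bit grayscale image with the shape from the question: [H, W] = [512, 1536].
gray = np.random.randint(0, 256, size=(512, 1536), dtype=np.uint8)

# Replicate the single channel three times -> [H, W, 3].
rgb_like = np.repeat(gray[:, :, None], 3, axis=2)

print(rgb_like.shape)  # (512, 1536, 3)
```

The three channels are identical, so this adds no information; it only lets an unmodified 3-channel backbone accept the image.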

developer0hye commented 3 years ago

@YashRunwal @yjh0410 I recommend reading this issue on processing grayscale images.

YashRunwal commented 3 years ago

Wow, that was a quick reply @yjh0410 and @developer0hye :) Umm, no, I don't want to convert my grayscale images to RGB. I want to use the 1-channel images for training.

For the pre-trained backbone, I can sum the weights of the first layer over the input-channel dimension (dim=1), so the pre-trained weights are reused rather than discarded, and then I think I can just set the first layer's in_channels to 1, like below:

in_channels = 1
conv1 = model.backbone.body.conv1
conv1.in_channels = in_channels
# Sum the pre-trained RGB kernels over the input-channel axis: [64, 3, 7, 7] -> [64, 1, 7, 7]
conv1.weight.data = conv1.weight.data.sum(dim=1, keepdim=True)

Note: This is just an example and the model used here is the pre-trained Faster RCNN from torchvision.

My question is, does this make sense for CenterNet? :)
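A minimal sketch of the weight-summing idea, applied to a ResNet-18 style first convolution rather than the Faster RCNN model above. The layer here is a randomly initialised stand-in for a pre-trained `conv1`, and the names `rgb_conv` and `gray_conv` are hypothetical. Because convolution is linear in its weights, the summed 1-channel layer should give the same output as the original 3-channel layer fed the gray image copied into all three channels, which the last line checks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pre-trained ResNet-18 stem; random weights, purely illustrative.
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Build a 1-channel layer with the same geometry and load the summed kernels.
gray_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    gray_conv.weight.copy_(rgb_conv.weight.sum(dim=1, keepdim=True))

# A small dummy grayscale batch: [N, 1, H, W].
gray = torch.rand(2, 1, 64, 192)

with torch.no_grad():
    out_gray = gray_conv(gray)
    # Same image replicated into all 3 channels, passed through the original layer.
    out_rgb = rgb_conv(gray.expand(-1, 3, -1, -1))

print(out_gray.shape)
print(torch.allclose(out_gray, out_rgb, atol=1e-4))
```

Building a fresh `nn.Conv2d(1, ...)` avoids leaving the module's attributes and its weight shape out of sync. Any per-channel input normalisation (mean/std) would presumably also need a single-channel version; that part is not covered here.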

yjh0410 commented 3 years ago

Sorry ~

I am not sure whether it will work. I have never tried the method you described.

YashRunwal commented 3 years ago

@yjh0410 No problem. I will try it out and post the results here. So please don't close this thread; it might be helpful for someone else.

Also,

  1. I have modified the ResNet-18 backbone to suit my needs. I concatenate 2 images at a certain layer in the modified backbone, and then I want to use dilated convs, write a decoder, and do the predictions.

I will post the results here but would need your help from time to time, if possible.