tensorflow / models

Models and examples built with TensorFlow

Object Detection: More than 3 depth channels for input images #2278

Closed minhnhat93 closed 7 years ago

minhnhat93 commented 7 years ago

Hello, I am working on an object detection project and would like to use RGB background images as additional data for the object detector. Specifically, I would like to stack backgrounds and images into a 4D tensor with 6 depth channels. However, when I looked at the file exporter.py, only 3 depth channels are allowed for image_tensor. Is there any plan to support more than 3 depth channels for input images in the near future? Or can you give me some hints on changing the code so that it supports this? Thanks.
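The stacking described here is just channel-wise concatenation; a minimal NumPy sketch (with made-up image dimensions) of building the 6-channel input:

```python
import numpy as np

# Hypothetical dimensions: an RGB image and its RGB background, both H x W x 3.
image = np.zeros((480, 640, 3), dtype=np.uint8)
background = np.zeros((480, 640, 3), dtype=np.uint8)

# Stack along the channel (depth) axis to get a single 6-channel input.
stacked = np.concatenate([image, background], axis=-1)
print(stacked.shape)  # (480, 640, 6)
```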

protossw512 commented 7 years ago

I would like to upvote this feature. I have a similar situation: my training data includes RGB channels, a near-infrared channel, and a thermal channel (5 channels in total). Right now I am converting RGB to grayscale and combining everything into 3-channel images. It works fine, but it would be great if we could modify and retrain the first convolution layer.

minhnhat93 commented 7 years ago

@protossw512 I have found a way to modify the code to allow more than 3 channels. I can write a tutorial on this if you need it. Also, should I send a pull request for this?

shamanez commented 7 years ago

If you want to input an image with different channels, you have to change the whole feature extractor. The problem then is that you can't do transfer learning, so how are you going to train? From scratch?

protossw512 commented 7 years ago

@minhnhat93 It would be great if you could write a tutorial! @shamanez Actually, not really. You only need to modify the first convolution layer: add randomly initialized weights for the extra channels and keep all the weights for the rest. It is true that the original CNN was trained with RGB images, but the weights are still quite useful even when training on other kinds of images. A good example is using pretrained RGB weights for grayscale images.
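The first-layer surgery described here can be sketched in NumPy. The `expand_conv1_kernel` helper below is hypothetical (not part of the Object Detection API), and it assumes TensorFlow's `[kh, kw, in_ch, out_ch]` kernel layout:

```python
import numpy as np

def expand_conv1_kernel(kernel, extra_channels, stddev=0.01, seed=0):
    """Append randomly initialized slices for extra input channels.

    kernel: pretrained conv1 weights of shape [kh, kw, in_ch, out_ch].
    Returns weights of shape [kh, kw, in_ch + extra_channels, out_ch],
    keeping the pretrained RGB slices untouched.
    """
    kh, kw, _, out_ch = kernel.shape
    rng = np.random.default_rng(seed)
    extra = rng.normal(0.0, stddev, size=(kh, kw, extra_channels, out_ch))
    return np.concatenate([kernel, extra.astype(kernel.dtype)], axis=2)

# e.g. a 7x7 RGB kernel with 64 filters extended to 5 input channels
pretrained = np.ones((7, 7, 3, 64), dtype=np.float32)
expanded = expand_conv1_kernel(pretrained, extra_channels=2)
print(expanded.shape)  # (7, 7, 5, 64)
```

The expanded array would then be written back into the checkpoint variable for the first convolution; every other layer keeps its pretrained weights unchanged.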

shamanez commented 7 years ago

So you would initialize it the way we initialize the score layers or box predictors. But doesn't that make everything messy?

michaelisard commented 7 years ago

This discussion is better held on StackOverflow since at this point it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

Supersak80 commented 7 years ago

@minhnhat93 have you written a tutorial on how to do this? I'm very interested.

minhnhat93 commented 7 years ago

@Supersak80 Sorry, I've been extremely busy over the last few days. I will try to write one by the end of this week and post it here once it's finished.

Supersak80 commented 7 years ago

@minhnhat93 I look forward to it. Even if you can't get to writing the tutorial right away, can you give me a quick summary of what portion of the code you changed to accept multi-band imagery with more than 3 bands? I thought this was a limitation imposed by the required TF record format, which assumes color images.
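On the TF record question: the record format itself is not the hard limit. Standard image codecs top out at 3–4 channels, but a multi-band image can be stored as raw bytes and reshaped at decode time, as long as the reader knows the dimensions. A minimal sketch of that round trip (made-up dimensions, NumPy standing in for the TF decode ops):

```python
import numpy as np

# Hypothetical 6-band image, H=4, W=5, C=6.
image = np.arange(4 * 5 * 6, dtype=np.uint8).reshape(4, 5, 6)

# Serialize as raw bytes (what would go into the TFRecord feature),
# since PNG/JPEG encoders cannot hold 6 channels.
raw = image.tobytes()

# Decode side: the reader must know H, W, and C to reconstruct the array.
decoded = np.frombuffer(raw, dtype=np.uint8).reshape(4, 5, 6)
print(np.array_equal(decoded, image))  # True
```

The catch is that height, width, and channel count must be stored alongside the bytes (or fixed by convention); a mismatch between the byte count and the expected shape fails at decode time.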

irmowan commented 7 years ago

@minhnhat93 @protossw512 Sorry to disturb you, but I want to know how to use only 1 channel. I want to train and test on grayscale data with a single channel, but TensorFlow still uses 3 channels as input.

protossw512 commented 7 years ago

@irmowan If your images have fewer than 3 channels, it will be much easier, at least in my experience. You have two options: one is to pad the remaining channels with zeros; the other is to expand the grayscale images to RGB by copying the single channel into the other two and letting backpropagation do the magic. With pretrained weights you can easily reach a pretty good local minimum either way. Remember to apply the same operation to your eval dataset. I got good results from both approaches. There is no theory behind this, just my personal heuristic. Remember to use a smaller learning rate and decay, and if your dataset is relatively small, try not to train too long, to prevent overfitting.
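Both options above are one-liners in NumPy; a minimal sketch with made-up image dimensions:

```python
import numpy as np

# Hypothetical grayscale image, H x W x 1.
gray = np.zeros((480, 640, 1), dtype=np.uint8)

# Option 1: zero-pad the two missing channels.
padded = np.concatenate(
    [gray, np.zeros_like(gray), np.zeros_like(gray)], axis=-1)

# Option 2: replicate the single channel into all three slots.
tiled = np.tile(gray, (1, 1, 3))

print(padded.shape, tiled.shape)  # (480, 640, 3) (480, 640, 3)
```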

protossw512 commented 7 years ago

@shamanez We use transfer learning because our dataset is too small: training from scratch would overfit, get stuck at a very bad local minimum, or simply take too long. Transfer learning gives us a good start; in general it makes it easier to find a good local minimum, since the initial features are already well extracted. Even if we mix pretrained and randomly initialized weights in the first layer, it will not make a mess, because the channels with pretrained weights still provide plenty of useful information. That's just my personal understanding, anyway. Even images from a different distribution, like CT or MRI, can still yield good results with transfer learning.

irmowan commented 7 years ago

@protossw512 Both of your approaches assume that TensorFlow Object Detection must use 3 channels. I have used the second way and it works (though I found that TF reads my 1-channel image as 3 channels, which confused me). So it seems there is no way to train with only 1 channel (which I mostly want for speed), except to pick one channel at the beginning of the base network.

minhnhat93 commented 7 years ago

@Supersak80 Hi everyone, I have written a tutorial at https://github.com/minhnhat93/tf_object_detection_multi_channels It includes a script to modify the provided checkpoint if you want to do transfer learning. Send me a message if you think I made a mistake. Edit: I can't publish the checkpoint-editing script. Sorry.

goldenberg commented 7 years ago

Thanks, @minhnhat93! This is very useful and well written!

horsetmotiv commented 7 years ago

@minhnhat93 Thank you very much! Your program is very useful. I am now building an object detection system for grayscale images based on your program, and I have run into a problem; I don't know whether you have encountered it. Thank you!

gsygsy96 commented 6 years ago

What should I do if I want to concatenate images along 'depth' rather than along 'channel'? Could you help me? @minhnhat93

rkdasari commented 6 years ago

@protossw512 Have you figured out a way to change the first convolution layer? I do not want to train a model from scratch. I would like to use a model trained on ImageNet and just modify the first convolution layer to suit my input, which has more than 3 channels.

Falmi commented 6 years ago

@goldenberg Did it work for you? My input has more than 3 channels. I followed the tutorial @minhnhat93 gave, but I hit errors while training; please help. FYI, I am using SSD_inception_V2, CUDA 8, and tensorflow-gpu 1.3.0.

InvalidArgumentError (see above for traceback): Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,410,516,10] [[Node: batch/cond/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/cond/padding_fifo_queue_enqueue/Switch:1, batch/cond/padding_fifo_queue_enqueue/Switch_1:1, batch/cond/padding_fifo_queue_enqueue/Switch_2:1, batch/cond/padding_fifo_queue_enqueue/Switch_3:1, batch/cond/padding_fifo_queue_enqueue/Switch_4:1, batch/cond/padding_fifo_queue_enqueue/Switch_5:1, batch/cond/padding_fifo_queue_enqueue/Switch_6:1, batch/cond/padding_fifo_queue_enqueue/Switch_7:1, batch/cond/padding_fifo_queue_enqueue/Switch_8:1, batch/cond/padding_fifo_queue_enqueue/Switch_9:1, batch/cond/padding_fifo_queue_enqueue/Switch_10:1, batch/cond/padding_fifo_queue_enqueue/Switch_11:1, batch/cond/padding_fifo_queue_enqueue/Switch_12:1, batch/cond/padding_fifo_queue_enqueue/Switch_13:1, batch/cond/padding_fifo_queue_enqueue/Switch_14:1, batch/cond/padding_fifo_queue_enqueue/Switch_15:1, batch/cond/padding_fifo_queue_enqueue/Switch_16:1, batch/cond/padding_fifo_queue_enqueue/Switch_17:1, batch/cond/padding_fifo_queue_enqueue/Switch_18:1, batch/cond/padding_fifo_queue_enqueue/Switch_19:1, batch/cond/padding_fifo_queue_enqueue/Switch_20:1, batch/cond/padding_fifo_queue_enqueue/Switch_21:1, batch/cond/padding_fifo_queue_enqueue/Switch_22:1)]]

kkk333 commented 6 years ago

Hello, what if I have to detect a face from a 3D Kinect video? In that case we will have 4 channels, RGB plus depth. How can I do this? I am confused... or is depth not needed?

rusuvalentin commented 5 years ago

@horsetmotiv I have the same problem and have been trying to solve it for almost 4 days. I am training ssd_resnet50 with FPN and pretrained COCO weights. It looks like the requested shape is actually width x height x 1 (only one channel), but I cannot determine where that input value comes from. The interesting thing is that it changes every time I run the program, and it depends heavily on how you save the encoded images in the TF record. Did you manage to solve your problem?

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 85826 values, but the requested shape has 307200 [[{{node Reshape_9}} = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/device:CPU:0"](DecodeRaw, Cast)]] node IteratorGetNext}} = IteratorGetNext output_shapes=[[], [?], [?,4], [?], [?], [?], [?], [?], [?], [?,?,1], [], [], []], output_types=[DT_STRING, DT_FLOAT, DT_FLOAT, DT_INT64, DT_INT64, DT_INT64, DT_INT64, DT_BOOL, DT_FLOAT, DT_UINT8, DT_STRING, DT_INT32, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
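One possible reading of the numbers in this trace: 307200 is exactly 640 × 480 × 1, so the reader is reshaping the decoded bytes to height × width × 1, while the record actually holds a different number of bytes (for example, compressed image data fed through DecodeRaw instead of raw pixels). A minimal sketch of that kind of mismatch, with the dimensions guessed from the error:

```python
import numpy as np

# Hypothetical: what the pipeline expects vs. what the record holds.
height, width, channels = 480, 640, 1
expected = height * width * channels      # 307200, the "requested shape"
raw = np.zeros(85826, dtype=np.uint8)     # the byte count actually decoded

# The reshape fails because the element counts disagree.
try:
    raw.reshape(height, width, channels)
except ValueError as e:
    print("reshape failed:", e)
```

If this diagnosis is right, the fix is to make the serialized byte count and the height/width/channels stored in the record agree (e.g. store raw pixel bytes, or decode the compressed image before reshaping).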

marianogaragiola commented 5 years ago

Hi, has anyone solved the issue with the shapes? I followed the tutorial to train with more than 3 channels, but I got

Input to reshape is a tensor with 486696 values, but the requested shape has 3686400 [[{{node Reshape_9}} = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/device:CPU:0"](DecodeRaw, Cast)]] [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[16], [16,640,640,3], [16,2], [16,3], [16,100], [16,100,4], [16,100,12], [16,100,12], [16,100], [16,100], [16,100], [16]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]