mpatacchiola / deepgaze

Computer Vision library for human-computer interaction. It implements Head Pose and Gaze Direction Estimation Using Convolutional Neural Networks, Skin Detection through Backprojection, Motion Detection and Tracking, Saliency Map.
MIT License

Input format for the network #95

Closed santo4ul closed 4 years ago

santo4ul commented 4 years ago

Hi, based on my understanding, the network expects its input in a specific format.

In other words, the first Conv2D layer takes input that has been prepared as follows:

  1. BGR color format
  2. Mean subtraction of 127
  3. Channels are 8-bit signed values (and not float32).

Could you please confirm if my understanding is correct?

Thank you.

mpatacchiola commented 4 years ago

Hi @santo4ul

  1. Correct, this is using the OpenCV convention.
  2. Yes, you have to subtract 127 to center the values. The image must also be 64x64 pixels in size.
  3. Not sure what you mean by this. Once loaded in TensorFlow, the images become tf.float32 tensors.

santo4ul commented 4 years ago

Hi @mpatacchiola

On point 3 above: originally, the R, G and B channels are 8-bit signed values. We do the mean subtraction and pass the result as is to the network, so the channels are still 8-bit signed values.

On the TensorFlow side, at the input layer (Reshape), the 8-bit signed values we pass are used as tf.float32.

For example,

Original Input: Signed 8 bit

B = 128 G = 129 R = 130

After mean subtraction:

B = 1 G = 2 R = 3

At TensorFlow input layer:

B = 1.0 G = 2.0 R = 3.0

What I mean is, there is no other pre-processing involved other than the mean subtraction of 127. Am I right?
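
The arithmetic in this example can be checked with a small NumPy sketch. It assumes the pixel arrives as unsigned 8-bit (uint8), which is what OpenCV's imread returns, and that it is cast to float32 before the subtraction; the variable names are illustrative only and are not taken from deepgaze. The second half shows why the cast comes first: subtracting 127 directly on uint8 data wraps around for values below 127.

```python
import numpy as np

# One BGR pixel with the values from the example above.
# Assumption: stored as uint8 (0..255), as OpenCV loads images.
pixel_bgr = np.array([128, 129, 130], dtype=np.uint8)

# Cast to float32 first, then subtract 127 to center the values.
centered = pixel_bgr.astype(np.float32) - 127.0

# Pitfall: subtracting in uint8 wraps around for values below 127.
dark = np.array([0, 100, 126], dtype=np.uint8)
wrapped = dark - np.uint8(127)              # stays uint8, wraps modulo 256
correct = dark.astype(np.float32) - 127.0   # negative values preserved

print(centered)
print(wrapped)
print(correct)
```

With the cast in place, B = 128, G = 129, R = 130 become 1.0, 2.0 and 3.0, matching the values listed above.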

mpatacchiola commented 4 years ago

Yes, that is correct. You just have to normalize by subtracting 127. The other thing you should do is resize the image if it is larger than 64x64 pixels. This is done using inter-area interpolation. In OpenCV you can do:

image_resized = cv2.resize(image, (64, 64), interpolation = cv2.INTER_AREA)

santo4ul commented 4 years ago

Thanks a lot @mpatacchiola for your prompt reply!

Yes, I'm resizing the input accordingly.