Re-implementation in Keras

hienpham15 commented 6 years ago

Hi, since I'm trying to re-implement your code in Keras (Python 3.6), I'm opening this thread to ask some questions and get advice.

You defined your input images as 512x512x3, but your SqueezeNet takes an input of 224x224x3. I'm confused; can you clarify this?

Since you used the Adam optimizer, this part of the code, which uses SGD as the training optimizer, is unnecessary, right?

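```
# var_list1 is trained with a reduced (fine-tuning) learning rate, var_list2 with the full rate.
opt1 = tf.train.AdamOptimizer(self.learning_rate * FINE_TUNE_LR_RATIO)
opt2 = tf.train.AdamOptimizer(self.learning_rate)
# Compute all gradients once, then split them between the two variable groups.
grads = tf.gradients(self.total_loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
# Run both updates together as a single training op.
self.train_step_sgd = tf.group(train_op1, train_op2)
```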
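For what it's worth, the snippet above actually builds two Adam optimizers (the `_sgd` suffix is only a name); what it adds is a split learning rate, with `var_list1` updated at `self.learning_rate * FINE_TUNE_LR_RATIO` and `var_list2` at the full rate. Below is a minimal pure-Python sketch of that idea, assuming scalar parameters and hypothetical values for the learning rate and `FINE_TUNE_LR_RATIO`; it is an illustration, not FC4's code.

```python
import math


class Adam:
    """Minimal Adam optimizer over a dict of scalar parameters (illustration only)."""

    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = {}  # first-moment (mean) estimates, keyed by parameter name
        self.v = {}  # second-moment (uncentered variance) estimates
        self.t = 0   # number of update steps taken by this optimizer

    def apply_gradients(self, params, grads):
        """Update `params` (name -> float) in place using `grads` (name -> float)."""
        self.t += 1
        for name, g in grads.items():
            m = self.beta1 * self.m.get(name, 0.0) + (1 - self.beta1) * g
            v = self.beta2 * self.v.get(name, 0.0) + (1 - self.beta2) * g * g
            self.m[name], self.v[name] = m, v
            m_hat = m / (1 - self.beta1 ** self.t)  # bias-corrected moments
            v_hat = v / (1 - self.beta2 ** self.t)
            params[name] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


LEARNING_RATE = 3e-4      # hypothetical base learning rate
FINE_TUNE_LR_RATIO = 0.1  # hypothetical; FC4 sets its own value

params1 = {"conv1": 1.0}  # stands in for var_list1 (reduced learning rate)
params2 = {"fc2": 1.0}    # stands in for var_list2 (full learning rate)

opt1 = Adam(LEARNING_RATE * FINE_TUNE_LR_RATIO)
opt2 = Adam(LEARNING_RATE)

# Same gradient for both groups, so any difference in the update comes from the learning rate.
opt1.apply_gradients(params1, {"conv1": 0.5})
opt2.apply_gradients(params2, {"fc2": 0.5})
print(1.0 - params1["conv1"], 1.0 - params2["fc2"])  # roughly 3e-05 and 3e-04
```

In TF2/Keras the same pattern should be possible with two `tf.keras.optimizers.Adam` instances, one `tf.GradientTape` gradient computation, and a separate `apply_gradients` call per variable group.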

Did you train your SqueezeNet from scratch, or did you use the weights of the pretrained SqueezeNet model?
Regarding data augmentation:

The main problem is white balancing, so is it necessary to crop and rotate the images?
From your code, you augmented the ground truth illuminations by doing this:
```
color_aug[i, i] = 1 + random.random() * AUGMENTATION_COLOR - 0.5 * AUGMENTATION_COLOR
```
This changes the original ground-truth illumination without preserving the von Kries ratio, so my question is: why? What's your intuition behind this?

Bonus question: Is it necessary to mask out the MCCs (Macbeth color checkers)? I don't see the reason for this either.

yuanming-hu commented 6 years ago

Thanks for the good questions. Just some quick answers:

We use SqueezeNet as a fully convolutional network, so there is no constraint on the input resolution.
Yes, we use Adam. We tried SGD, but it didn't give better results.
We use the pre-trained model.
We actually found these seemingly unrelated augmentations very helpful. One explanation is that our system benefits largely from semantic understanding, and rotation helps with that. When relighting the image using color_aug, we "relight" the ground-truth illumination as well.
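A minimal NumPy sketch of that consistent relighting, assuming linear RGB images and a diagonal (von Kries) illumination model. The gain sampling follows the `color_aug` line quoted in the question; `relight` and the `AUGMENTATION_COLOR` value are hypothetical. Because the image and its ground-truth illuminant are scaled by the same per-channel gains, the white-balanced image is unchanged up to a global scale, so the new label stays valid even though the illuminant itself changes.

```python
import random

import numpy as np

AUGMENTATION_COLOR = 0.8  # hypothetical strength; FC4 defines its own value


def relight(image, illum, strength=AUGMENTATION_COLOR, rng=random):
    """Apply one random per-channel gain to a linear RGB image and to its illuminant.

    image: float array of shape (H, W, 3), linear RGB
    illum: array of shape (3,), ground-truth illuminant color
    """
    color_aug = np.zeros((3, 3))
    for i in range(3):
        # Same sampling rule as the quoted line: gain in [1 - s/2, 1 + s/2)
        color_aug[i, i] = 1 + rng.random() * strength - 0.5 * strength
    new_image = image @ color_aug  # scales channel i of every pixel by color_aug[i, i]
    new_illum = illum @ color_aug  # the light source changes by the same gains
    new_illum = new_illum / np.linalg.norm(new_illum)  # only the direction matters
    return new_image, new_illum, color_aug


rng = random.Random(0)
image = np.random.default_rng(0).uniform(0.1, 1.0, size=(4, 4, 3))
illum = np.array([0.6, 0.5, 0.3])
new_image, new_illum, gains = relight(image, illum, rng=rng)

# White-balanced (von Kries corrected) images agree up to one global scale factor.
ratio = (new_image / new_illum) / (image / illum)
print(np.allclose(ratio, ratio.flat[0]))  # True
```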

Bonus answer: Yes. I haven't tried not masking the MCCs, but if you keep them you will probably get a bunch of "color checker" detectors, which clearly don't generalize to cases where there are no MCCs.
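A minimal NumPy sketch of the masking step, assuming the checker's location is available as a conservative axis-aligned bounding box. `mask_checker` and the box format are hypothetical; the dataset's own annotations, and how FC4 consumes them, may use a different format. Zeroing those pixels removes the checker as a cue the network could otherwise learn to rely on.

```python
import numpy as np


def mask_checker(image, box):
    """Return a copy of `image` with the color-checker region set to zero.

    box: (top, left, bottom, right) in pixels, bottom/right exclusive.
    Hypothetical format chosen for illustration.
    """
    top, left, bottom, right = box
    masked = image.copy()  # leave the caller's array untouched
    masked[top:bottom, left:right, :] = 0.0
    return masked


image = np.ones((6, 8, 3))
masked = mask_checker(image, (1, 2, 4, 5))
print(int(masked.sum()), int(image.sum()))  # 117 144
```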

Please let me know if you have more questions.

hienpham15 commented 6 years ago

I have finished implementing your model in the Keras framework, with some adjustments: using VGG16 instead of SqueezeNet, dividing the images into patches and training on all of those patches, etc. After training for 20 epochs on about 2000 patches (from 200 images) and testing on 160 images, I got the following results:
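The thread doesn't show how the patches were cut, so here is one hypothetical way: random square crops, each labeled with the image's ground-truth illuminant (which assumes a single global illuminant per image). `random_patches`, the patch size, and ten patches per image (roughly 2000 patches from 200 images) are assumptions for illustration.

```python
import numpy as np


def random_patches(image, illum, patch_size=512, count=10, rng=None):
    """Cut `count` random square patches from one image.

    Every patch is paired with the image's ground-truth illuminant, which assumes
    a single global illuminant per image.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    patches = []
    for _ in range(count):
        # Top-left corner sampled from [0, h - patch_size] x [0, w - patch_size]
        y = int(rng.integers(0, h - patch_size + 1))
        x = int(rng.integers(0, w - patch_size + 1))
        patches.append((image[y:y + patch_size, x:x + patch_size], illum))
    return patches


image = np.zeros((600, 800, 3))
illum = np.array([0.5, 0.6, 0.4])
patches = random_patches(image, illum, patch_size=256, count=10, rng=np.random.default_rng(0))
print(len(patches), patches[0][0].shape)  # 10 (256, 256, 3)
```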

average_angular_error ~ 1.8
median_angular_error ~ 1.81
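For anyone comparing these numbers, the metric is the angle in degrees between the estimated and ground-truth illuminant RGB vectors. A minimal sketch follows; `angular_error` and the sample pairs are hypothetical, not FC4's own evaluation code. Only the direction of each vector counts, so scaling either one doesn't change the error.

```python
import math
import statistics


def angular_error(estimate, ground_truth):
    """Angle in degrees between two RGB illuminant vectors (scale-invariant)."""
    dot = sum(e * g for e, g in zip(estimate, ground_truth))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_g = math.sqrt(sum(g * g for g in ground_truth))
    cos = max(-1.0, min(1.0, dot / (norm_e * norm_g)))  # clamp rounding error before acos
    return math.degrees(math.acos(cos))


# Hypothetical (estimate, ground truth) pairs, only to exercise the function.
pairs = [
    ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]),  # same direction -> 0 degrees
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal -> 90 degrees
    ([0.6, 0.5, 0.4], [0.55, 0.5, 0.45]),
]
errors = [angular_error(e, g) for e, g in pairs]
print("mean:", statistics.mean(errors), "median:", statistics.median(errors))
```

A median above the mean usually indicates a left-skewed error distribution, i.e. a minority of very small errors pulling the mean down.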

It surprises me that the median is higher than the one in your paper. I also noticed that your model (or at least your ideas in my Keras implementation) performs better on indoor scenes than the CNN from Bianco or the Deep Specialized Net from Wu Shi. Here is my implementation; would you mind taking a look and telling me whether I did it right? Thank you in advance.

yuanming-hu commented 6 years ago

Hello and thanks for the implementation! The adjustments sound reasonable to me, and the achieved average angular error is comparable with our implementation using AlexNet.

However, I'm also surprised that the median error is even higher than the mean error.

It's interesting that our approach performs better on indoor scenes. To be honest, I didn't reach this conclusion when doing this project, so thanks for letting me know. One explanation is that indoor scenes contain more noise (textureless walls, light bulbs, varying illumination, etc.), which our approach handles better.

Your implementation looks good (though I'm not very experienced with Keras). Again, the surprising thing is the high median angular error. One thing you could do is visualize the norms of the local estimates to check whether the confidence values are reasonable.
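To make the "norm as confidence" idea concrete, here is a minimal sketch, assuming each local (per-cell) estimate is an unnormalized RGB vector whose length acts as its confidence, consistent with the suggestion above. The helper names are hypothetical and this is not FC4's `get_visualization()`. Summing the unnormalized vectors and normalizing once gives the same direction as a confidence-weighted average of the normalized local estimates, so the confidence map to visualize is just the per-cell norm.

```python
import math


def confidences(local_estimates):
    """Per-cell confidence: the L2 norm of each unnormalized local RGB estimate."""
    return [math.sqrt(sum(c * c for c in v)) for v in local_estimates]


def pool_estimates(local_estimates):
    """Sum the unnormalized local estimates, then normalize the total.

    Same direction as a confidence-weighted average of the normalized local
    estimates, with each cell's confidence given by its norm.
    """
    total = [sum(channel) for channel in zip(*local_estimates)]
    norm = math.sqrt(sum(c * c for c in total))
    return [c / norm for c in total]


# Hypothetical 2-cell map: one confident neutral cell, one weak reddish cell.
local = [[2.0, 2.0, 2.0], [0.01, 0.0, 0.0]]
print(confidences(local))     # the weak cell has a tiny norm
print(pool_estimates(local))  # close to the neutral direction
```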

hienpham15 commented 6 years ago

After reading your Supplementary Material (for the FC4 paper) and the function get_visualization() in your code, I'm quite confused about the sizes of the confidence map and the semi-dense feature map.

As I understand it:

The 'fc2 layer' is also the semi-dense feature map, isn't it?
Also, if that is true, the output size of the 'fc2' layer is (w/32)x(h/32)x(-1), which is relatively small even for large input images (2041x1359); hence, the feature map is nowhere near the target_shape (512x512). Am I missing something, or is the size really that small?
And where did you get the values color_thresh = [250, 500, 750, 1000]?

yuanming-hu commented 6 years ago

Thanks for the questions.

Yes, fc2 should be the so-called semi-dense feature map.
The size is indeed small. Dividing by 32 here may result in a feature map of around 16x16 (512 / 32 = 16). We don't really need a high-resolution map here, as long as the combined receptive field covers the whole input image.
The thresholds are arbitrary values for visualization. I forgot to remove them before releasing the code; they were actually used for an internal project.
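A quick back-of-the-envelope check of those sizes, assuming a total stride of 32 and floor division; the exact numbers depend on the padding of each layer, and `feature_map_size` is a hypothetical helper.

```python
def feature_map_size(height, width, stride=32):
    """Approximate (height, width) of a feature map after a network with total stride `stride`."""
    # Floor division; real layers may round up or down depending on padding.
    return height // stride, width // stride


print(feature_map_size(512, 512))              # (16, 16)
print(feature_map_size(1359, 2041))            # (42, 63): full resolution
print(feature_map_size(1359 // 2, 2041 // 2))  # (21, 31): downsampled by two
```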

yuanming-hu commented 6 years ago

By the way, 2041x1359 is too large for FC4. I think my code downsamples it by a factor of two. This actually results in a larger (and semantically more useful) receptive field.
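A minimal NumPy sketch of downsampling by a factor of two via 2x2 block averaging. This is an assumption about how one might do it, not necessarily what FC4 does (`cv2.resize` with `cv2.INTER_AREA` is a common alternative); `downsample2x` is a hypothetical helper. For a 2041x1359 image the trailing odd column and row are dropped, giving 1020x679.

```python
import numpy as np


def downsample2x(image):
    """Halve the resolution of an (H, W, C) array by averaging each 2x2 block.

    A trailing odd row or column is dropped so the blocks tile exactly.
    """
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    cropped = image[:h, :w]
    # Axis 1 and axis 3 index the position inside each 2x2 block.
    return cropped.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))


small = np.arange(16, dtype=float).reshape(4, 4, 1)
print(downsample2x(small)[:, :, 0])  # [[ 2.5  4.5] [10.5 12.5]]
```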

pdaniol commented 5 years ago

Hi! I've just started learning Keras, and I'm really interested in what this re-implementation looks like. @hienpham15, do you still have your source code? I'd be really grateful if you could share it.

yuanming-hu / fc4, issue #18: Re-implementation in Keras