tessselllation opened this issue 5 years ago
Thanks :) You're correct about the passthroughSign function; that got me too. Remember that only the weight * activation multiplies are binary; the summation is done in higher precision.
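As a toy illustration of that point (just a sketch in plain TF 2.x, not the repo's code):

```python
import tensorflow as tf

# The multiplies only ever see +/-1 values, but tf.matmul accumulates the
# products in float32, so the resulting sums are ordinary integers stored
# in float, not binary values.
x_bin = tf.sign(tf.random.normal([4, 8]))    # binarized activations in {-1, +1}
w_bin = tf.sign(tf.random.normal([8, 16]))   # binarized weights in {-1, +1}
y = tf.matmul(x_bin, w_bin)                  # each entry is an integer in [-8, 8]
```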
Regarding HardTanh: this is the same as clipped passthrough. You can see it here: https://github.com/yaysummeriscoming/BinaryNet_and_XNORNet/blob/master/CustomOps/tensorflowOps.py
Specifically, see the functions passthroughSignTF & clipped_passthrough_grad. I was still getting familiar with TF & Keras at that stage; otherwise the stop_gradient() trick would have been prettier.
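For reference, the stop_gradient() version would look roughly like this (a sketch only; the name binarize_ste is made up, it's not something in the repo):

```python
import tensorflow as tf

def binarize_ste(x):
    # Forward pass: sign(x). Backward pass: the gradient of clip_by_value(x, -1, 1),
    # i.e. the HardTanh / clipped-passthrough gradient (1 inside [-1, 1], 0 outside).
    clipped = tf.clip_by_value(x, -1.0, 1.0)
    binary = tf.sign(x)
    # The value equals `binary`, but gradients flow only through `clipped`,
    # because the difference term is wrapped in stop_gradient().
    return clipped + tf.stop_gradient(binary - clipped)
```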
Thanks for your response @yaysummeriscoming! Silly me, I totally missed how you brought clipped_passthrough_grad into passthroughTanhTF(x). This method seems great to me, but I am still very new to Keras and TensorFlow.
I've implemented the bipolar regulariser from "How to Train a Compact Binary Neural Network with High Accuracy?" (https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14619/14454), which I notice you discussed under Issue #1. It was actually pretty easy and the effect was quite noticeable.
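In case it's useful to anyone, roughly what I added as a Keras regularizer (a paraphrased sketch from memory, assuming the paper's bipolar term is something like lam * sum(1 - w^2) with weights kept clipped to [-1, 1]; the name BipolarRegularizer and the default lam are mine):

```python
import tensorflow as tf
from tensorflow import keras

class BipolarRegularizer(keras.regularizers.Regularizer):
    """Pushes real-valued shadow weights towards +/-1 instead of towards 0."""
    def __init__(self, lam=1e-4):
        self.lam = lam

    def __call__(self, w):
        # 1 - w^2 is non-negative for w in [-1, 1] and zero exactly at w = +/-1
        return self.lam * tf.reduce_sum(1.0 - tf.square(w))

    def get_config(self):
        return {"lam": self.lam}

# Usage, e.g.:
# keras.layers.Dense(128, kernel_regularizer=BipolarRegularizer(1e-4))
```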
Thanks again for the amazing code!
Thanks for this amazing code @yaysummeriscoming!! So your passthroughSign function gives
y = 1 for x>0, 0 for x=0, -1 for x<0
But the BinaryNet paper specifies the binarization function to be
y = 1 for x>=0, -1 for x<0
Yet I notice all weights/activations are still only -1 or 1. Is this because the probability of obtaining an exact zero during training is negligible? Or am I missing a step somewhere?
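To make the difference between the two conventions concrete (toy snippet, not the repo's code):

```python
import tensorflow as tf

x = tf.constant([-0.5, 0.0, 0.5])
print(tf.sign(x).numpy())                   # [-1.  0.  1.]  -> zero maps to zero
print(tf.where(x >= 0, 1.0, -1.0).numpy())  # [-1.  1.  1.]  -> the paper's x >= 0 convention
```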
Also, where can we see that backpropagation through the binarization layers uses the HardTanh function?