killawhale2 opened this issue 4 years ago
The purpose is that the "forward" value is the binarized weight (binary_weights_no_grad), while the value used to compute the gradient (the "backward" value) is the clamped weight (cliped_weights).
The same trick is also used in binary_activation: the binarized value is used for the forward pass, while the approximation is used for the backward pass.
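For anyone else reading, here is a minimal, self-contained sketch of the same detach trick. The tensor shape and the scaling factor are simplified assumptions for illustration, not the exact code in birealnet.py:

```python
import torch

# Toy real-valued weight tensor (gradients enabled).
real_weights = torch.randn(4, 3, requires_grad=True)

# Clamped weights: gradients will flow through this path.
cliped_weights = torch.clamp(real_weights, -1.0, 1.0)

# Binarized weights: sign(w) times a scaling factor. sign() has zero
# gradient almost everywhere, hence "no_grad".
scale = real_weights.abs().mean()            # simplified scaling factor
binary_weights_no_grad = scale * torch.sign(real_weights)

# Straight-through trick: the two detach()-ed terms cancel cliped_weights
# numerically, so the forward value equals binary_weights_no_grad, but
# autograd only sees the last (non-detached) term, so the backward pass
# differentiates through the clamped weights instead of sign().
binary_weights = binary_weights_no_grad.detach() - cliped_weights.detach() + cliped_weights

binary_weights.sum().backward()
print(torch.allclose(binary_weights, binary_weights_no_grad))  # True
print(real_weights.grad)  # 1 where |w| < 1 (clamp's gradient), 0 elsewhere
```

The two detached terms cancel the clamped weights numerically, so the output is exactly the binarized weight, while the gradient is taken with respect to the clamped weights.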
binary_weights_no_grad is a floating-point tensor, sign(w) * scale. After training is done, how can it be converted to a binary weight? I tried, naively, to just use sign(w), without positive results. In essence, after applying sign(w) to the trained weights, the network no longer worked.
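In case it helps frame the question, here is a minimal sketch of one way to reproduce the trained forward values at inference time: store the signs as a binary tensor and keep the floating-point scale alongside them. The helper name export_binary and the per-output-channel reduction dims are my own assumptions and should be checked against birealnet.py:

```python
import torch

def export_binary(real_weights: torch.Tensor):
    """Split a trained conv weight (out_ch, in_ch, kH, kW) into a sign tensor
    and a per-output-channel floating-point scale, so that scale * signs
    reproduces the binary_weights_no_grad used during training."""
    scale = real_weights.abs().mean(dim=(1, 2, 3), keepdim=True)
    signs = torch.sign(real_weights)
    return signs, scale

# The network was trained with scale * sign(w) in the forward pass, so using
# sign(w) alone changes every layer's output magnitude, which is one plausible
# reason the plain-sign conversion stops working.
signs, scale = export_binary(torch.randn(16, 8, 3, 3))
effective_weight = scale * signs
```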
First of all, thank you for sharing the PyTorch implementation; it's wonderful. I've been going over the code and found this line:
`binary_weights = binary_weights_no_grad.detach() - cliped_weights.detach() + cliped_weights` in birealnet.py, and was wondering what the purpose of this is. My best guess is that it's merely to allow gradients to exist without actually changing the values of the binary weights, but some helpful clarification would be wonderful!