microsoft / LQ-Nets

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Question about implementation (not bug) #9

Closed hnanhtuan closed 5 years ago

hnanhtuan commented 5 years ago

Hi,

Can you help explain lines 111-112 and 253 of the code? As I understand it, this code seems to implement the Straight-Through Estimator (STE). However, I cannot understand how the output `y` is assigned a new value, yet that assignment only affects the backward pass. In the forward pass, `y` should still be quantized, right? Additionally, why do you use `tf.stop_gradient(-x_clip)` instead of `tf.stop_gradient(x_clip)` (the minus sign)? What is the difference here?
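For context, this pattern is the usual `stop_gradient` identity trick: in the forward pass the two `x_clip` terms cancel, so the output equals the quantized value, while in the backward pass both `stop_gradient` terms are treated as constants, so the gradient of the output with respect to the input is just the gradient of the clipping step. The minus sign is what makes the forward cancellation work. Below is a minimal sketch of that idea, not the repository's actual code; the names `x_clip` / `x_quant`, the toy 2-bit rounding quantizer, and the `GradientTape` usage are placeholders for illustration only.

```python
import tensorflow as tf

def ste_round(x):
    """Quantize in the forward pass, pass gradients straight through in the backward pass."""
    x_clip = tf.clip_by_value(x, 0.0, 1.0)    # placeholder clipping step
    x_quant = tf.round(x_clip * 4.0) / 4.0    # placeholder 2-bit uniform quantizer
    # Forward:  x_clip + (-x_clip) + x_quant == x_quant  (the quantized value).
    # Backward: both stop_gradient terms are constants, so dy/dx reduces to
    #           d(x_clip)/dx -- the straight-through estimate.
    return x_clip + tf.stop_gradient(-x_clip) + tf.stop_gradient(x_quant)

x = tf.constant([0.1, 0.4, 0.9])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = ste_round(x)
g = tape.gradient(y, x)  # gradient of the clip, not of the non-differentiable round
```

Using `tf.stop_gradient(x_clip)` without the minus sign would make the forward output `2 * x_clip + x_quant` rather than the quantized value, which is why the negated term is needed.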

Thank you very much. Cheers,

hnanhtuan commented 5 years ago

Sorry for bothering you. I understand the code now.