itayhubara / BinaryNet.tf

BNN implementation in tensorflow

Use of hyperbolic tangent for activation function #11

Open younghwanoh opened 5 years ago

younghwanoh commented 5 years ago

Hi,

Thank you for open-sourcing this great idea! I'm exploring your code and running some experiments with variants. The first thing I looked at is the activation function. As written in BNN_cifar10.py, you use the HardTanh function as the activation. I don't see any description of this in your paper (though I'm not 100% sure), but I found that it significantly affects accuracy: keeping ReLU in the BNN, as in the full-precision counterpart, drops top-1 accuracy by about 10%.
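For reference, here is a rough sketch of the two variants I am comparing, written with plain TensorFlow ops rather than your repository's own binarization helpers; the straight-through trick (x + stop_gradient(...)) is just how I approximate the gradient through the non-differentiable sign, not a quote of BNN_cifar10.py:

```python
import tensorflow as tf

def hard_tanh_sign(x):
    # Scheme in BNN_cifar10.py as I understand it:
    # clip to [-1, 1], then binarize with sign.
    clipped = tf.clip_by_value(x, -1.0, 1.0)
    # Straight-through estimator: forward pass uses sign(clipped),
    # backward pass uses the gradient of the clipped value.
    return clipped + tf.stop_gradient(tf.sign(clipped) - clipped)

def relu_sign(x):
    # My variant: keep ReLU as in the full-precision network,
    # then binarize with sign (positive values become 1, zeros stay 0).
    activated = tf.nn.relu(x)
    return activated + tf.stop_gradient(tf.sign(activated) - activated)
```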

[result plot attached]

Do you have any insight into this? I've heard that a hyperbolic tangent activation can be a bad idea when stacking very deep networks, and I'm a bit concerned about vanishing gradients and so on. If you could share some of your experience, e.g. why you chose this particular hyperbolic-tangent-style function, that would be very nice.

Thanks in advance, OYH

itayhubara commented 5 years ago

HardTanh simply clips the values to be between -1 and 1: everything above 1 is set to 1 and everything below -1 to -1, which helps the initial training phase. Since I used BN to normalize the input, I know that most of the input data falls in that range. After that I used the sign function, which actually binarizes the input. If you use ReLU, you simply assign everything above 0 to 1 and the rest becomes zero. You would probably get good results if you clamp the ReLU values above 1 (same idea as ReLU6, only with 1 instead of 6) and use a round function instead of sign.
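Something along these lines (an untested sketch with plain TensorFlow ops, not code from this repo; the straight-through trick is one common way to handle the non-differentiable round):

```python
import tensorflow as tf

def binary_relu1_round(x):
    # Clamp like ReLU6 but to [0, 1], then binarize with round
    # instead of sign, so activations land in {0, 1}.
    clipped = tf.clip_by_value(x, 0.0, 1.0)
    # Straight-through estimator: round in the forward pass,
    # pass the clipped gradient in the backward pass.
    return clipped + tf.stop_gradient(tf.round(clipped) - clipped)
```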
All the best, Itay