kaichi040696 closed this issue 6 years ago.
Thank you for posting your question here.
First of all, I would suggest using T.nnet.relu for the ReLU instead of your custom function. The built-in abs function probably won't work here; Theano has a separate function, T.abs_.
Second, it looks to me like the gradient of functions like floor, ceil or round is zero, which means that after backpropagation the network will learn nothing:
```
In [36]: T.grad(x.sum(), wrt=x).eval({x: np.array([[1,2], [3,4]]).astype(np.float32)})
Out[36]:
array([[1., 1.],
       [1., 1.]], dtype=float32)

In [37]: T.grad(T.floor(x).sum(), wrt=x).eval({x: np.array([[1,2], [3,4]]).astype(np.float32)})
Out[37]:
array([[0., 0.],
       [0., 0.]], dtype=float32)
```
The round function is mostly constant, and at the jump points its derivative is undefined.
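The same effect can be checked without Theano. The sketch below is a plain-Python illustration (it does not use T.grad; the helper numeric_derivative is hypothetical) that estimates derivatives with central finite differences. Away from the jumps, floor and round have zero slope; at a jump the estimate blows up, which is the "undefined" part.

```python
import math


def numeric_derivative(f, x, h=1e-6):
    # Central finite difference: (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)


# Sample points chosen away from the jumps of floor (integers)
# and round (half-integers).
points = [0.3, 1.7, 2.2, -0.6]

for name, f in [("identity", lambda v: v), ("floor", math.floor), ("round", round)]:
    grads = [numeric_derivative(f, p) for p in points]
    print(name, grads)

# At a jump of floor the finite difference explodes instead of converging.
print("floor at 1.0:", numeric_derivative(math.floor, 1.0))
```

Zero slope almost everywhere is exactly what backpropagation sees, so every weight upstream of the rounding receives a zero update.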
Thank you very much for your reply.
Since, as you say, T.floor() can't be used, is there any way I can convert the input value of the activation function from floating point to fixed point?
Thank you very much.
I don't think you would be able to do it. Are there any specific reasons why you want to do it?
I think the main reason why many frameworks work with floats instead of integers is that operations on float numbers still return a float number. With integers, on the other hand, dividing integer x by integer y is likely to give you a float (not to mention probabilities and non-linear functions like log or exp), which is inconvenient since it forces you to "jump" from one type to another. With a single type it's easier to do optimizations in the backend. In fact, Theano has optimizations for float32 and float64 (but not float16), and you can see how much faster it works with float numbers.
Even if you convert a float number to an integer, the operation would be equivalent to a rounding function like round or floor, which again means that your network will learn nothing.
I hope this helps.
I am doing this because I am trying to develop a simulator for a novel neural network architecture that requires less hardware. To do this, I am trying to create custom ReLU, Sigmoid and Softmax functions that limit the input of each activation function from a floating-point number to a fixed-point number. After that, I check whether the custom layer produces acceptable results compared with the original ReLU, Sigmoid and Softmax functions.
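The comparison described here can be prototyped outside NeuPy for the forward pass. The following is a minimal plain-Python sketch under these assumptions: float_limit mirrors the asker's floor-based helper (reimplemented with math.floor), the input values are made up, and only inference is checked, not training.

```python
import math


def float_limit(n, bits=8):
    # Hypothetical helper: keep `bits` fractional bits by flooring (fixed point).
    scale = 2.0 ** bits
    return math.floor(n * scale) / scale


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def max_abs_error(bits, inputs):
    # Compare each activation on quantized inputs against full precision.
    q_inputs = [float_limit(z, bits) for z in inputs]
    err_relu = max(abs(relu(a) - relu(b)) for a, b in zip(inputs, q_inputs))
    err_sig = max(abs(sigmoid(a) - sigmoid(b)) for a, b in zip(inputs, q_inputs))
    err_soft = max(abs(a - b) for a, b in zip(softmax(inputs), softmax(q_inputs)))
    return err_relu, err_sig, err_soft


inputs = [-2.37, -0.51, 0.0, 0.123, 1.99, 3.1415]
for bits in (2, 4, 8):
    print(bits, max_abs_error(bits, inputs))
```

Because flooring moves each input down by less than 2 ** -bits, the ReLU error stays below that step and the sigmoid error below a quarter of it (the sigmoid's maximum slope). This says nothing about learning, though: the zero-gradient problem only appears once the quantization sits inside the training graph.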
If it is not possible to limit the input of the activation function to a fixed-point number, may I ask whether you know which part of the NeuPy code calculates the weighted sum of the inputs? For an MLP, the input value of a hidden layer is the sum of all the outputs of the input-layer neurons multiplied by the weights.
Thank you very much for your help. I have tried many ways to limit the floating-point value to fixed point, and none of them gives the result I expected. I have also spent a lot of time studying the NeuPy code, and I still cannot find the part that calculates the weighted sum of the inputs.
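For reference, the weighted sum described here is, in matrix form, x·W + b for one fully connected layer. A plain-Python sketch (the function name dense_forward and the numbers are hypothetical, not NeuPy code) makes the per-neuron computation explicit:

```python
def dense_forward(inputs, weights, bias):
    # Weighted sum for one fully connected layer:
    # out[j] = sum_i inputs[i] * weights[i][j] + bias[j]
    n_out = len(bias)
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + bias[j]
        for j in range(n_out)
    ]


x = [1.0, 2.0]                 # outputs of the previous layer
w = [[0.5, -1.0, 0.0],         # weights[i][j]: from input i to neuron j
     [0.25, 1.0, 2.0]]
b = [0.1, 0.0, -1.0]
print(dense_forward(x, w, b))
```

In a Theano-based layer the same computation is usually written as a single matrix product of the input batch and the weight matrix, followed by adding the bias.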
> I am doing this because I am trying to develop a simulator for a novel neural network architecture which requires less hardware.
I see. In NeuPy you can still use float16, which gives some reduction in precision (but it could be slower than 32-bit float, since it's not optimized).
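To get a feel for how much precision float16 keeps, here is a small sketch that does not touch NeuPy's float16 setting at all: it round-trips values through IEEE 754 half precision using Python's struct module ('e' format, Python 3.6+). The helper name to_float16 is hypothetical.

```python
import struct


def to_float16(value):
    # Round-trip through IEEE 754 half precision ("e" format, little-endian).
    return struct.unpack("<e", struct.pack("<e", value))[0]


for v in (0.1, 1.0 / 3.0, 1000.1):
    print(v, "->", to_float16(v))
```

Note that float16 is still floating point rather than fixed point: it keeps about 11 significant bits, so the step between neighbouring values grows with magnitude (near 1000 it is already 0.5).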
I think you can still apply some tricks to make it work. There are some papers that address this issue.
I read the first one recently. In the paper they suggest using two weights per layer. One has regular precision, and the other is the same but quantized (low precision). You compute the output with the quantized weights and calculate the gradient with respect to them. With this gradient you update the main weights, which you then quantize again and use for the next iteration. They try to make the weights binary, but I think this trick should also work for higher precisions.
I think you can check these papers for some inspiration. If you want to find more papers on this subject, you can search for key phrases like "low-precision neural networks" or "8-bit precision neural networks".
A completely different approach would be to use an algorithm that is not based on gradient information (maybe an evolutionary algorithm).
In any case, you will probably need to write more custom code, since it's not a trivial problem to solve.
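As an illustration of the gradient-free route, a (1+1) evolution strategy is one of the simplest evolutionary algorithms: mutate the current solution and keep the mutant if its loss is not worse. Since it only compares loss values, a floor inside the forward pass does not stop it. This is a toy sketch under stated assumptions (a single weight, a linear model, 2 fractional bits, made-up data); the helper names are hypothetical.

```python
import math
import random


def float_limit(n, bits=2):
    # Hypothetical fixed-point quantizer: floor to `bits` fractional bits.
    scale = 2.0 ** bits
    return math.floor(n * scale) / scale


def loss(weight, xs, ys, bits=2):
    # Mean squared error of y_hat = quantized(weight) * x.
    q = float_limit(weight, bits)
    return sum((q * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def one_plus_one_es(xs, ys, steps=300, sigma=0.3, seed=0):
    # (1+1) evolution strategy: mutate, keep the child if it is no worse.
    rng = random.Random(seed)
    parent = 0.0
    parent_loss = loss(parent, xs, ys)
    for _ in range(steps):
        child = parent + rng.gauss(0.0, sigma)
        child_loss = loss(child, xs, ys)
        if child_loss <= parent_loss:
            parent, parent_loss = child, child_loss
    return parent, parent_loss


xs = [1.0, 2.0, 3.0]
ys = [1.3 * x for x in xs]  # toy targets from a "true" weight of 1.3
w, final_loss = one_plus_one_es(xs, ys)
print(w, float_limit(w), final_loss)
```

With 2 fractional bits the best reachable quantized weight is 1.25 (the grid point closest to 1.3), and the search should settle there. For real networks with many weights, such methods scale much worse than gradient descent, which is part of why this is not a trivial problem.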
Thank you very much for the papers. May I ask whether you know where the part of the code that calculates the weighted sum of the inputs (the propagation function) is in NeuPy?
> May I ask whether you know where the part of the code that calculates the weighted sum of the inputs (the propagation function) is in NeuPy?
You probably mean these classes:
FYI, I plan to move all the code from Theano to TensorFlow, and I've already moved most of it. If you want to use Theano, make sure that your NeuPy version is 0.6.3.
Thank you very much! This helps me a lot. I changed the NeuPy code in ActivationLayer and limited the input value between lines 57 and 58, which produces good results.
However, if I limit self.weight or the input value after line 59, the training and validation errors stay the same after every epoch. If I limit the input value on line 62, the training error is inf and the validation error is nan. Do you know why this happens? My idea is to limit the value on line 62, which includes the input, weight and bias, or to limit each of them individually.
My changed code:
```python
def output(self, input_value):
    if self.size is not None:
        ########### self changed code ################
        # working
        # print(input_value)
        number_of_bit_limit = 8

        def float_limit(n, b):
            d = 2.0 ** b
            # a = (n*d) - (n*d)%1
            a = T.floor(n * d)
            return a / d

        input_value = float_limit(input_value, number_of_bit_limit)
        ###########################
        input_value = T.dot(input_value, self.weight)
        ########### self changed code ################
        # # not working
        # input_value = float_limit(input_value, number_of_bit_limit)
        ###########################

        if self.bias is not None:
            input_value += self.bias

        ########### self changed code ################
        # # not working
        # input_value = float_limit(input_value, number_of_bit_limit)
        ###########################

    return self.activation_function(input_value)
```
@kaichi040696 I don't think there is any difference compared to the previous solution. The gradient will again be zero for all values because of the rounding operation. The solution that I explained before might look like this (pseudo-code):
```
# Two weights per layer
weight = generate_random_weight()

# In Theano or TensorFlow this operation is supposed to create a new variable and
# assign the new value, to make sure that these are two different, unrelated variables
quantized_weight = float_limit(weight)

# Compute the output of the network and the training error
output = sigmoid(x * quantized_weight)
error = mean((output - y) ** 2)
quantized_weight_gradient = compute_gradient(error, with_respect_to=quantized_weight)

# Update the original weight (not the quantized one from which we got the gradient)
weight -= alpha * quantized_weight_gradient

# Check what the quantized weight would be after the update
quantized_weight = float_limit(weight)

# And repeat the process again...
```
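The pseudo-code above can be turned into a runnable toy in plain Python. This sketch assumes a single weight with no bias, 2 fractional bits, made-up data, and a hand-derived gradient of the sigmoid plus mean squared error in place of Theano or TensorFlow autodiff; float_limit reimplements the thread's floor-based helper with math.floor.

```python
import math


def float_limit(n, bits=2):
    # Fixed-point quantizer as in the thread: floor to `bits` fractional bits.
    scale = 2.0 ** bits
    return math.floor(n * scale) / scale


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, bits=2, alpha=1.0, epochs=1000):
    weight = 0.0  # full-precision weight: updated, but never used in the forward pass
    for _ in range(epochs):
        quantized_weight = float_limit(weight, bits)
        outputs = [sigmoid(x * quantized_weight) for x in xs]
        # d/dq of mean((out - y)^2), treating quantized_weight as an independent variable
        grad = sum(
            2.0 * (out - y) * out * (1.0 - out) * x
            for x, y, out in zip(xs, ys, outputs)
        ) / len(xs)
        weight -= alpha * grad  # update the original weight, not the quantized one
    return weight, float_limit(weight, bits)


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [sigmoid(1.3 * x) for x in xs]  # toy targets from a "true" weight of 1.3
weight, quantized = train(xs, ys)
print(weight, quantized)
```

The key design choice is that the gradient is computed at the quantized value but applied to the full-precision weight, so many small updates can accumulate until they push the weight across the next quantization step. In this toy run the full-precision weight should hover around the 1.5 boundary, with the quantized weight switching between the neighbouring grid points 1.25 and 1.5.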
I am trying to create a custom activation layer based on NeuPy; however, once I apply my custom layer to the network, the training and validation errors stay the same in every epoch. In my custom function, I want to convert the input value from floating point to fixed point for both the ReLU and Softmax functions (same as the code below). Therefore, I created a function called "float_limit", which converts a floating-point value to a fixed-point value. My first idea was to use the int() function inside float_limit. However, it raises a type error, since int() cannot be used on a tensor variable. So I replaced int() with T.floor(), which does the same job as int() for non-negative values (int() truncates toward zero, while floor rounds down). But then the output of the network ends up as a straight line. How can I fix this problem?
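The difference between int() and floor only shows up for negative inputs. A plain-Python sketch (both helper names are hypothetical) quantizing with 2 fractional bits:

```python
import math


def float_limit_floor(n, bits=8):
    # Quantize by flooring, like T.floor: rounds toward negative infinity.
    scale = 2.0 ** bits
    return math.floor(n * scale) / scale


def float_limit_trunc(n, bits=8):
    # Quantize by truncation, like int(): rounds toward zero.
    scale = 2.0 ** bits
    return int(n * scale) / scale


for v in (0.3, -0.3):
    print(v, float_limit_floor(v, 2), float_limit_trunc(v, 2))
```

Either way the result is piecewise constant in its input, so both versions give a zero gradient during backpropagation.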
Thank you very much
This is my code: