itdxer / neupy

NeuPy is a Tensorflow based python library for prototyping and building neural networks
http://neupy.com
MIT License

Custom activation function in Neupy (floating point to fixed point) #209

Closed. kaichi040696 closed this issue 6 years ago

kaichi040696 commented 6 years ago

I am trying to create a custom activation layer based on NeuPy, but once I apply my custom layer to the network, the training and validation errors stay the same at every epoch. For my custom function, I want to convert the input value from a floating point value to a fixed point value for both the ReLU and Softmax functions (as in the code below). To do this, I created a function called "float_limit", which converts a floating point value into a fixed point value. My first idea was to use int() inside float_limit. However, that raises a type error, since int() cannot be applied to a tensor variable. So I replaced int() with T.floor(), which should do the same work. But then the result of the network ends up as a straight line. May I ask how I can fix this problem?
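
For example, with b = 8 the conversion rounds a value down to the nearest multiple of 1/256. A small NumPy illustration of the same idea:

import numpy as np

def float_limit(n, b):
    d = 2.0 ** b
    return np.floor(n * d) / d

print(float_limit(0.123456, 8))   # 0.12109375, i.e. 31 / 256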

Thank you very much

This is my code:

from sklearn import datasets, model_selection
from sklearn.preprocessing import OneHotEncoder
from neupy import environment, algorithms, layers
import numpy as np
from sklearn.model_selection import train_test_split
import theano
import theano.tensor as T

# load data
mnist = datasets.fetch_mldata('MNIST original')
data, target = mnist.data, mnist.target

# normalize and center the input data
data = data / 255.
data = data - data.mean(axis=0)

# one-hot encode the target labels
target_scaler = OneHotEncoder()
target = target_scaler.fit_transform(target.reshape((-1, 1)))
target = target.todense()

# print(target)

# split data for training and testing
environment.reproducible()

x_train, x_test, y_train, y_test = train_test_split(
    data.astype(np.float32),
    target.astype(np.float32),
    train_size=(6. / 7)
)

# Theano is the main backend for the gradient descent based algorithms in NeuPy.
theano.config.floatX = 'float32'

################# float limit #####################
def float_limit(n, b):
    d = 2 ** b
    return T.floor(n * d) / d

# def float_limit(n, b):
#     d = 2 ** b
#     return T.int(n * d) / d
############### custom function #################

############### relu ##################
def relu(x, alpha=0):
    # x = float_limit(x, 8)
    result = 0.5 * (x + abs(x))
    return result

class custom_relu(layers.ActivationLayer):
    def activation_function(self, input_value):
        # result = float_limit(relu(input_value), 8)
        # return result
        return relu(input_value)
#################### softmax ########################
# def softmax(z):
#     z -= np.max(z)
#     sm = (np.exp(z).T / np.sum(np.exp(z),axis=1)).T
#     return sm

class custom_softmax(layers.ActivationLayer):
    def activation_function(self, input_value):
        input_value = float_limit(input_value,8)
        input_value -= T.max(input_value)
        result = (T.exp(input_value).T / T.sum(T.exp(input_value), axis=1)).T
        return result

class custom_softmax_1(layers.ActivationLayer):
    def activation_function(self, input_value):
        input_value = float_limit(input_value,8)
        return T.nnet.softmax(input_value)
########### start the model architecture ############

network = algorithms.Momentum(
    [
        layers.Input(784),
        custom_relu(500),  # custom ReLU
        custom_relu(300),
        # layers.Relu(300),
        custom_softmax_1(10)  # custom_softmax_1 applies float_limit before the original softmax
    ],
    error='categorical_crossentropy',
    step=0.01,
    verbose=True,
    shuffle_data=True,
    momentum=0.99,
    nesterov=True,
)
# print the architecture (input shape, layer type, output shape)
network.architecture()

# train the network
network.train(x_train, y_train, x_test, y_test, epochs=20)
# network.train(x_train, y_train, epochs=20)

# show the accuracy
from sklearn import metrics

y_predicted = network.predict(x_test).argmax(axis=1)
y_test = np.asarray(y_test.argmax(axis=1)).reshape(len(y_test))
print("y_predicted",y_predicted)
print("y_test",y_test)

print(metrics.classification_report(y_test, y_predicted))

score = metrics.accuracy_score(y_test, y_predicted)
print("Validation accuracy: {:.2%}".format(score))

# plot the image
from neupy import plots
plots.error_plot(network)
itdxer commented 6 years ago

Thank you for posting your question here.

First of all, I would suggest using T.nnet.relu for the ReLU instead of your custom function. The built-in abs function probably won't work here; Theano has a separate function, T.abs_.
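
For reference, a version of the custom ReLU layer from the script above that delegates to Theano's built-in ReLU might look like this (a minimal sketch):

import theano.tensor as T
from neupy import layers

class custom_relu(layers.ActivationLayer):
    def activation_function(self, input_value):
        # T.nnet.relu(x) computes max(0, x) and has a well-defined gradient
        return T.nnet.relu(input_value)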

Second, it looks to me that the gradient of functions like floor, ceil, or round is zero, which means that after backpropagation the network will learn nothing (in the snippet below, x is a Theano matrix, e.g. x = T.fmatrix('x')):

In [36]: T.grad(x.sum(), wrt=x).eval({x: np.array([[1,2], [3,4]]).astype(np.float32)})
Out[36]:
array([[1., 1.],
       [1., 1.]], dtype=float32)

In [37]: T.grad(T.floor(x).sum(), wrt=x).eval({x: np.array([[1,2], [3,4]]).astype(np.float32)})
Out[37]:
array([[0., 0.],
       [0., 0.]], dtype=float32)
itdxer commented 6 years ago

That's what the round function looks like: it is piecewise constant, and its derivative is undefined at the jump points.
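
A quick way to visualize this is to plot numpy's round over a dense grid (a small matplotlib sketch):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 1001)
plt.plot(x, np.round(x))   # staircase: flat almost everywhere, jumps at the half-integers
plt.xlabel('x')
plt.ylabel('round(x)')
plt.show()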

kaichi040696 commented 6 years ago

Thank you very much for your reply.

Since, as you say, T.floor() cannot be used, is there any way I can convert the input value of the activation function from floating point to fixed point?

Thank you very much.

itdxer commented 6 years ago

I don't think that you would be able to do it. Are there any specific reasons why you want to do it?

I think the main reason why many frameworks work with floats instead of integers is that operations on float numbers still return float numbers. Whereas with integers, after dividing integer x by integer y (I'm not even talking about probabilities and non-linear functions like log or exp) you are more likely to get a float number, which is inconvenient, since it forces you to "jump" from one type to another. With a single type it's easier to do optimizations in the backend. In fact, Theano has optimizations for float32 and float64 (but not float16), and you can see how much faster it works for float numbers.
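
For instance, even in plain Python dividing two integers already leaves the integer domain:

x, y = 3, 2
print(x / y)         # 1.5
print(type(x / y))   # <class 'float'>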

Even if you convert a float number to an integer, the operation would be the same as applying the round function, which again means that your network will learn nothing.

I hope this helps.

kaichi040696 commented 6 years ago

I am doing this because I am trying to develop a simulator for a novel neural network architecture which requires less hardware. To do this, I am trying to create custom ReLU, Sigmoid and Softmax functions which limit the input of each activation function from a floating point number to a fixed point number, and then check whether the custom layers produce acceptable results compared with the original ReLU, Sigmoid and Softmax functions.

If it is not possible to limit the input of the activation function to a fixed-point number, may I ask whether you know which part of the code in the NeuPy toolbox calculates the sum of weighted inputs? For an MLP, the input value of a hidden layer is the sum of the outputs of the input layer neurons multiplied by the weights.

Thank you very much for your help. I have tried lots of ways to limit the floating-point values to fixed point, and none of them produce the result I expected. I have also spent a lot of time studying the NeuPy code and I still cannot find the part that calculates the sum of weighted inputs.

itdxer commented 6 years ago

I am doing this because I am trying to develop a simulator for a novel neural network architecture which requires less hardware.

I see. In NeuPy you can still use float16, which gives some reduction in precision (but it could be slower than 32-bit floats, since it's not optimized).
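
For example, assuming a Theano version with float16 support, reduced precision can be requested through Theano's floatX flag before the network is built (a sketch, not a NeuPy-specific API):

import theano
import numpy as np

theano.config.floatX = 'float16'        # default float dtype becomes 16-bit
x_train = x_train.astype(np.float16)    # cast the training data to match
y_train = y_train.astype(np.float16)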

I think you can still apply some tricks in order to make it work. There are some papers that address this issue.

The first one I read recently. In the paper they suggest keeping two sets of weights per layer: one with regular precision, and the other the same but quantized (low precision). You compute the output with the quantized weights and calculate the gradient with respect to them. With this gradient you update your main weights, which you can quantize again and use for the next iteration. They make the weights binary, but I think this trick should work for higher precision as well.

I think you can check these papers for some inspiration. If you want to find more papers related to this subject, you can search for key phrases like "low-precision neural networks" or "8-bit precision neural networks".

A completely different approach would be to use an algorithm that is not based on gradient information (maybe an evolutionary algorithm).

In any case, you will probably need to write more custom code, since it's not a trivial problem to solve.

kaichi040696 commented 6 years ago

Thank you very much for the papers. May I ask whether you know where the part of the code that calculates the sum of weighted inputs (the propagation function) is in the NeuPy toolbox?

itdxer commented 6 years ago

May I ask whether you know where the part of the code that calculates the sum of weighted inputs (the propagation function) is in the NeuPy toolbox?

You probably mean these classes:

itdxer commented 6 years ago

FYI, I plan to move all the code from Theano to TensorFlow, and I've already moved most of it. If you want to use Theano, then make sure that your NeuPy version is 0.6.3.

kaichi040696 commented 6 years ago

Thank you very much! This helps me a lot. I changed the NeuPy code in ActivationLayer and limited the input value between lines 57 and 58. This produces a good result.

However, if I limit self.weight or the input value after line 59, both the training and validation errors stay the same after each training epoch. If I limit the input value at line 62, the training error becomes inf and the validation error becomes nan. Do you know why this happens? My idea is to apply the limit at line 62, which includes the input, weight and bias, or to limit each of them individually.

https://github.com/itdxer/neupy/blob/f7303d985098840fc325d1d1406e96d7444aca25/neupy/layers/activations.py#L56

my changed code:

    def output(self, input_value):
        if self.size is not None:
            ########### self changed code ################
            # working: limit the layer input before the weighted sum
            number_of_bit_limit = 8

            def float_limit(n, b):
                d = 2.0 ** b
                # a = (n * d) - (n * d) % 1
                a = T.floor(n * d)
                return a / d

            input_value = float_limit(input_value, number_of_bit_limit)
            ##############################################
            input_value = T.dot(input_value, self.weight)

            ########### self changed code ################
            # not working: limiting after the weighted sum
            # input_value = float_limit(input_value, number_of_bit_limit)
            ##############################################

            if self.bias is not None:
                input_value += self.bias

            ########### self changed code ################
            # not working: limiting after adding the bias
            # input_value = float_limit(input_value, number_of_bit_limit)
            ##############################################

        return self.activation_function(input_value)
itdxer commented 6 years ago

@kaichi040696 I don't think there is any difference compared to the previous solution. The gradient will again be zero everywhere because of the rounding operation. The solution that I explained before might look like this (pseudo-code):

# Two sets of weights per layer
weight = generate_random_weight()
# In Theano or TensorFlow this operation is supposed to create a new variable and
# assign the value to it, so that these are two different, unrelated variables
quantized_weight = float_limit(weight)

# Compute the output from the network and the training error
output = sigmoid(x * quantized_weight)
error = mean((output - y) ** 2)

quantized_weight_gradient = compute_gradient(error, with_respect_to=quantized_weight)
# Update the original weights (not the quantized ones from which we got the gradient)
weight -= alpha * quantized_weight_gradient

# Check what the quantized weights would be after the update
quantized_weight = float_limit(weight)

# And repeat the process again...
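
A runnable NumPy sketch of the same scheme on a toy logistic-regression problem might look like this (the names mirror the pseudo-code above; none of this is NeuPy API):

import numpy as np

rng = np.random.RandomState(0)

def float_limit(w, b=8):
    # snap each weight to a fixed-point grid with 2**b steps per unit
    d = 2.0 ** b
    return np.floor(w * d) / d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: 2D points labelled by the sign of x1 + x2
x = rng.randn(200, 2)
y = (x[:, 0] + x[:, 1] > 0).astype(np.float64)

weight = rng.randn(2) * 0.1   # full-precision "master" weights
alpha = 0.5                   # learning rate

for epoch in range(200):
    quantized_weight = float_limit(weight)      # low-precision copy
    output = sigmoid(x.dot(quantized_weight))   # forward pass uses the quantized copy
    error = np.mean((output - y) ** 2)

    # gradient of the mean squared error w.r.t. the quantized weights
    grad_z = 2.0 * (output - y) * output * (1.0 - output) / len(y)
    quantized_weight_gradient = x.T.dot(grad_z)

    # update the full-precision weights with the gradient from the quantized copy
    weight -= alpha * quantized_weight_gradient

print("final error:", error)
print("full-precision weights:", weight)
print("quantized weights:     ", float_limit(weight))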