This task is an example project that I want to later abstract and reuse in other projects; some aspects cannot be replaced, but a general structure can be created.
1. Implement the mapping output vector -> object
For example:
    Apple:  1 0 0 -> 0
    Orange: 0 1 0 -> 1
    Banana: 0 0 1 -> 2
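A minimal sketch, assuming Python with NumPy and the three example classes above (the helper names are mine):

    import numpy as np

    # Hypothetical class list; the real labels come from the data files.
    CLASSES = ["Apple", "Orange", "Banana"]

    def one_hot(index, count):
        # Output vector with a 1 at the class index,
        # e.g. one_hot(0, 3) -> [1, 0, 0].
        v = np.zeros(count)
        v[index] = 1.0
        return v

    def to_class_index(output_vector):
        # Map a network output back to a class index by taking the
        # position of the largest component, e.g. [0.1, 0.7, 0.2] -> 1.
        return int(np.argmax(output_vector))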
2. Implement File readers
We want to read these file types into a structure like
data_set
    input_count
    output_count
    training_set
        inputs
        outputs
        classes
        count
        bias
    validation_set
        inputs
        outputs
        classes
        count
        bias
    test_set
        inputs
        outputs
        classes
        count
        bias
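A minimal sketch of this structure in Python; the field names follow the outline above, the types are assumptions:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Subset:
        inputs: np.ndarray   # one row per sample
        outputs: np.ndarray  # one-hot target rows
        classes: np.ndarray  # class index per sample
        count: int           # number of samples
        bias: np.ndarray     # column vector of 1s, one row per sample

    @dataclass
    class DataSet:
        input_count: int
        output_count: int
        training_set: Subset
        validation_set: Subset
        test_set: Subset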
3. Implement an activation function and its derivative
Decide on one of:
sigmoid: f(x) = 1/(1+e^(-x))
hyperbolic tangent, rescaled to (0, 1): f(x) = (tanh(x)+1)/2
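For example, a sketch of the sigmoid choice with its derivative (written in terms of the activation value, which is convenient during backpropagation):

    import numpy as np

    def activate(net):
        # Sigmoid: f(x) = 1 / (1 + e^(-x))
        return 1.0 / (1.0 + np.exp(-net))

    def activate_derivative(output):
        # For the sigmoid, f'(x) = f(x) * (1 - f(x)), so the derivative
        # can be computed directly from the output value.
        return output * (1.0 - output)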
4. Implement a feed-forward function
Accepts the weight matrix and the inputs concatenated with the bias vector, and returns the net and output matrices:
net = mul(weights, horzcat(inputs, bias))
output = activate(net)
The bias is a constant column vector of 1s with as many rows as the input matrix. This vector corresponds to the bias nodes. Using it explicitly here is a bit clumsy, but for now the approach reduces the potential for error.
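A sketch under two layout assumptions of mine, namely that inputs has one row per sample and that weights has shape (input_count + 1, output_count); it reuses the activate sketch from step 3:

    import numpy as np

    def feed_forward(weights, inputs, bias):
        # Append the bias column to the inputs: shape (samples, inputs + 1).
        extended = np.hstack([inputs, bias])
        # One net value per sample and output node: shape (samples, outputs).
        net = extended @ weights
        output = activate(net)
        return net, output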
5. Implement a weight initialisation function
This function must take in a maximum weight, a width and a height, and return a matrix of the given width and height, randomly initialised in the range [-max_weight, max_weight].
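A minimal sketch, assuming NumPy's uniform sampling:

    import numpy as np

    def initialise_weights(max_weight, width, height):
        # Uniform random values in [-max_weight, max_weight];
        # size=(height, width) is my row/column assumption.
        return np.random.uniform(-max_weight, max_weight, size=(height, width))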
6. Implement a function that evaluates the network error
The function must take in:
an input matrix,
a weight matrix,
a target output matrix,
a target class matrix,
a bias vector.
The function must return the error e, and the classification error c.
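A sketch of one way to compute these; the task does not fix the definitions, so the mean squared error and the misclassification fraction below are assumptions:

    import numpy as np

    def network_error(inputs, weights, targets, target_classes, bias):
        _, output = feed_forward(weights, inputs, bias)
        # Regression error: mean squared difference to the targets (assumed).
        e = np.mean((output - targets) ** 2)
        # Classification error: fraction of samples whose largest output
        # does not match the target class (assumed).
        predicted = np.argmax(output, axis=1)
        c = np.mean(predicted != target_classes)
        return e, c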
7. Implement a dummy backpropagation function
The function should take in:
an input matrix,
a weight matrix,
a learning rate, and
a bias vector.
The function must return an updated weight matrix. For now, return W as is.
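A sketch of the dummy version; note that I have added a targets parameter beyond the list above, since the real update in step 11 needs it:

    def backpropagation(inputs, targets, weights, learning_rate, bias):
        # Dummy implementation: ignore everything and return W as is.
        # The real calculation is filled in at step 11.
        return weights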
8. Implement the train function
The training function should take in three sets, the training_set, validation_set, and test_set. Implement a way to limit the maximum number of samples that will actually be used for training (you can also do this in the main program described in the next section). This is very helpful for debugging purposes (especially if you plan to later replace the backpropagation algorithm with something a little faster – and more complicated).
The function should return a weight matrix, and error values as floats.
Initialise a value plot_graphs to true. This is a debug flag, so it is appropriate to implement this as a macro if it is supported by the implementation language.
The function should initialise a weight matrix using initialise_weights. For now, use a max_weight of 1/2.
The function should also construct three bias vectors bias_train, bias_validate, and bias_test. Each must contain only 1s, with as many rows as there are inputs in the training, validation, and test sets respectively.
Implement a while loop that stops after 500 iterations. (We will change the while condition later to something else, so do not use a for loop).
Inside the loop, call the backpropagation algorithm. Use the training set inputs, the weights, (for now) a fixed learning rate of 0.1, and bias vector bias_train. Assign the result to weights.
Still inside the loop, call the network error function three times: one time for each of the training, validation, and test sets. Use the weight matrix, and the appropriate bias vector. Wrap these calls in an if-statement that tests for a value plot_graphs. (If your language supports it, you can use conditional compilation on the value of plot_graphs).
Store the errors in six arrays (error_train, classification_error_train, etc.), with the current epoch number as index.
After the loop, plot the six error arrays as a function of epoch number. Wrap this in an if-statement (or conditional compilation statement) that tests for the value plot_graphs.
Call the network error function again, on all three sets as before.
Return the weights, and the six errors.
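Putting the steps above together, a skeleton of the train function might look like this; names such as max_samples are assumptions, and plotting is left as a stub:

    import numpy as np

    def train(training_set, validation_set, test_set,
              max_samples=None, plot_graphs=True):
        # Optionally limit how many training samples are used (debug aid).
        inputs = training_set.inputs[:max_samples]
        targets = training_set.outputs[:max_samples]
        classes = training_set.classes[:max_samples]

        # Shape (input_count + 1, output_count) is an assumption;
        # the extra row is for the bias node.
        weights = initialise_weights(0.5, targets.shape[1], inputs.shape[1] + 1)

        bias_train = np.ones((inputs.shape[0], 1))
        bias_validate = np.ones((validation_set.count, 1))
        bias_test = np.ones((test_set.count, 1))

        error_train, classification_error_train = [], []
        error_validate, classification_error_validate = [], []
        error_test, classification_error_test = [], []

        epoch = 0
        while epoch < 500:  # replaced with a validation check in step 13
            # targets passed as an extra argument; see the step-7 sketch.
            weights = backpropagation(inputs, targets, weights, 0.1, bias_train)
            if plot_graphs:
                e, c = network_error(inputs, weights, targets, classes, bias_train)
                error_train.append(e); classification_error_train.append(c)
                e, c = network_error(validation_set.inputs, weights,
                                     validation_set.outputs,
                                     validation_set.classes, bias_validate)
                error_validate.append(e); classification_error_validate.append(c)
                e, c = network_error(test_set.inputs, weights, test_set.outputs,
                                     test_set.classes, bias_test)
                error_test.append(e); classification_error_test.append(c)
            epoch += 1

        # (Plotting of the six error arrays against epoch number is omitted.)
        # Final evaluation on all three sets; each entry is an (e, c) pair.
        errors = (network_error(inputs, weights, targets, classes, bias_train),
                  network_error(validation_set.inputs, weights,
                                validation_set.outputs, validation_set.classes,
                                bias_validate),
                  network_error(test_set.inputs, weights, test_set.outputs,
                                test_set.classes, bias_test))
        return weights, errors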
9. Implement the main training program
The program should load in the sets (using the load_sets function), and pass these to the training algorithm.
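A minimal main program under these assumptions; load_sets is named in the text, but its signature is assumed here:

    def main():
        # load_sets is assumed to return the data_set structure from step 2.
        data = load_sets()
        weights, errors = train(data.training_set,
                                data.validation_set,
                                data.test_set)
        print(errors)

    if __name__ == "__main__":
        main()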
10. Run the program
The important thing is that everything should run. You should see your error plots; at this stage they should be straight, horizontal lines. Because of the random weight initialisation, we cannot predict where these lines will lie, so do not be alarmed if yours lie at a different level, as long as they are straight and horizontal.
11. Implement the backpropagation function
You have already created the dummy function; now you can put in the actual calculations.
First, select a random sample.
Now, calculate the net matrix and output matrix using the feed-forward function.
Then apply the weight update, and return the matrix.
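For a single-layer network (my assumption, based on the single weight matrix used throughout) with the sigmoid from step 3, the update reduces to the delta rule. A sketch, with targets added as a parameter since the update needs them:

    import numpy as np

    def backpropagation(inputs, targets, weights, learning_rate, bias):
        # Select a random sample, kept as 1-row matrices.
        i = np.random.randint(inputs.shape[0])
        x = np.hstack([inputs[i:i+1], bias[i:i+1]])  # shape (1, inputs + 1)
        t = targets[i:i+1]                           # shape (1, outputs)

        # Feed-forward on the single sample.
        net = x @ weights
        output = activate(net)

        # Delta rule: W <- W + eta * x^T ((t - o) .* f'(net)).
        # For the sigmoid, f'(net) is computed from the output itself.
        delta = (t - output) * activate_derivative(output)
        return weights + learning_rate * (x.T @ delta)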
12. Run the program (AGAIN)
First, set the debug option to train on only one sample.
13. Implement a proper stopping condition
Change the while loop to stop when the validation error drops below a threshold. Note that this threshold usually depends on the problem. There are better stopping conditions that are less sensitive to the problem at hand, but this one will do for now.
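Inside the train function, this could look as follows; the variable names reuse the step-8 sketch, and the threshold value comes from step 14:

    validation_stop_threshold = 0.1
    error_validate_current = float("inf")
    epoch = 0
    while error_validate_current > validation_stop_threshold:
        weights = backpropagation(inputs, targets, weights, 0.1, bias_train)
        error_validate_current, _ = network_error(
            validation_set.inputs, weights, validation_set.outputs,
            validation_set.classes, bias_validate)
        epoch += 1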
14. Implement a statistical analysis
This part is important for you to get an idea of the robustness of the neural net. In practice, a very simple analysis will suffice.
This part needs to run the training 30 times, and then report the mean, standard deviation, and maximum of the
training time,
regression error, and
classification error (on the test sets).
In general, you would like all these values to be “low”. For reference, experiments were run on the iris data set with different learning rates; for each, 30 runs were made, with the other parameters as described earlier (max_weight = 1/2, validation_stop_threshold = 0.1).
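A sketch of the analysis; timing via time.perf_counter and the report format are my choices, and errors[2] refers to the test-set pair returned by the step-8 sketch:

    import time
    import numpy as np

    def analyse(data, runs=30):
        times, regression_errors, classification_errors = [], [], []
        for _ in range(runs):
            start = time.perf_counter()
            weights, errors = train(data.training_set,
                                    data.validation_set,
                                    data.test_set)
            times.append(time.perf_counter() - start)
            # errors[2] is the (e, c) pair on the test set.
            e_test, c_test = errors[2]
            regression_errors.append(e_test)
            classification_errors.append(c_test)
        for name, values in [("training time", times),
                             ("regression error", regression_errors),
                             ("classification error", classification_errors)]:
            v = np.array(values)
            print(f"{name}: mean={v.mean():.4f} std={v.std():.4f} max={v.max():.4f}")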
depends on #10 #11 #12