This task is an example project that I want to later abstract and reuse in other projects; some aspects cannot be replaced, but a general structure can be created.
1. Implement the mapping output vector -> object
For example:
    Apple:  1 0 0 -> 0
    Orange: 0 1 0 -> 1
    Banana: 0 0 1 -> 2
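A minimal sketch, assuming Python with NumPy and the three example classes above (the helper names are mine):

    import numpy as np

    # Hypothetical class list; the real labels come from the data files.
    CLASSES = ["Apple", "Orange", "Banana"]

    def one_hot(index, count):
        # Output vector with a 1 at the class index,
        # e.g. one_hot(0, 3) -> [1, 0, 0].
        v = np.zeros(count)
        v[index] = 1.0
        return v

    def to_class_index(output_vector):
        # Map a network output back to a class index by taking the
        # position of the largest component, e.g. [0.1, 0.7, 0.2] -> 1.
        return int(np.argmax(output_vector))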
2. Implement File readers
We want to read these file types into a structure like
data_set
    input_count
    output_count
    training_set
        inputs
        outputs
        classes
        count
        bias
    validation_set
        inputs
        outputs
        classes
        count
        bias
    test_set
        inputs
        outputs
        classes
        count
        bias
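A minimal sketch of this structure in Python; the field names follow the outline above, the types are assumptions:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Subset:
        inputs: np.ndarray   # one row per sample
        outputs: np.ndarray  # one-hot target rows
        classes: np.ndarray  # class index per sample
        count: int           # number of samples
        bias: np.ndarray     # column vector of 1s, one row per sample

    @dataclass
    class DataSet:
        input_count: int
        output_count: int
        training_set: Subset
        validation_set: Subset
        test_set: Subset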
3. Implement an activation function and its derivative
Decide on one of:
sigmoid: f(x) = 1/(1+e^(-x))
hyperbolic tangent, rescaled to (0, 1): f(x) = (tanh(x)+1)/2
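For example, a sketch of the sigmoid choice with its derivative (written in terms of the activation value, which is convenient during backpropagation):

    import numpy as np

    def activate(net):
        # Sigmoid: f(x) = 1 / (1 + e^(-x))
        return 1.0 / (1.0 + np.exp(-net))

    def activate_derivative(output):
        # For the sigmoid, f'(x) = f(x) * (1 - f(x)), so the derivative
        # can be computed directly from the output value.
        return output * (1.0 - output)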
4. Implement a feed-forward function
Accepts the weight matrix and the inputs concatenated with the bias vector, and returns the net and output matrices:
net = mul(weights, horzcat(inputs, bias))
output = activate(net)
The bias is a constant column vector of 1s with as many rows as the input matrix. This vector corresponds to the bias nodes. Using it explicitly here is a bit clumsy, but for now the approach reduces the potential for error.
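A sketch under two layout assumptions of mine, namely that inputs has one row per sample and that weights has shape (input_count + 1, output_count); it reuses the activate sketch from step 3:

    import numpy as np

    def feed_forward(weights, inputs, bias):
        # Append the bias column to the inputs: shape (samples, inputs + 1).
        extended = np.hstack([inputs, bias])
        # One net value per sample and output node: shape (samples, outputs).
        net = extended @ weights
        output = activate(net)
        return net, output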
5. Implement a weight initialisation function
This function must take in a maximum weight, a width and a height, and return a matrix of the given width and height, randomly initialised in the range [-max_weight, max_weight].
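A minimal sketch, assuming NumPy's uniform sampling:

    import numpy as np

    def initialise_weights(max_weight, width, height):
        # Uniform random values in [-max_weight, max_weight];
        # size=(height, width) is my row/column assumption.
        return np.random.uniform(-max_weight, max_weight, size=(height, width))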
6. Implement a function that evaluates the network error
The function must take in:
an input matrix,
a weight matrix,
a target output matrix,
a target class matrix,
a bias vector.
The function must return the error e, and the classification error c.
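A sketch of one way to compute these; the task does not fix the definitions, so the mean squared error and the misclassification fraction below are assumptions:

    import numpy as np

    def network_error(inputs, weights, targets, target_classes, bias):
        _, output = feed_forward(weights, inputs, bias)
        # Regression error: mean squared difference to the targets (assumed).
        e = np.mean((output - targets) ** 2)
        # Classification error: fraction of samples whose largest output
        # does not match the target class (assumed).
        predicted = np.argmax(output, axis=1)
        c = np.mean(predicted != target_classes)
        return e, c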
7. Implement a dummy backpropagation function
The function should take in:
an input matrix,
a weight matrix,
a learning rate, and
a bias vector.
The function must return an updated weight matrix. For now, return W as is.
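A sketch of the dummy version; note that I have added a targets parameter beyond the list above, since the real update in step 11 needs it:

    def backpropagation(inputs, targets, weights, learning_rate, bias):
        # Dummy implementation: ignore everything and return W as is.
        # The real calculation is filled in at step 11.
        return weights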
8. Implement the train function
The training function should take in three sets, the training_set, validation_set, and test_set. Implement a way to limit the maximum number of samples that will actually be used for training (you can also do this in the main program described in the next section). This is very helpful for debugging purposes (especially if you plan to later replace the backpropagation algorithm with something a little faster – and more complicated).
The function should return a weight matrix, and error values as floats.
Initialise a value plot_graphs to true. This is a debug flag, so it is appropriate to implement this as a macro if it is supported by the implementation language.
The function should initialise a weight matrix using initialise_weights. For now, use a max_weight of 1/2.
The function should also construct three bias vectors bias_train, bias_validate, and bias_test. Each must contain only 1s, with as many rows as there are inputs in the training, validation, and test sets respectively.
Implement a while loop that stops after 500 iterations. (We will change the while condition later to something else, so do not use a for loop).
Inside the loop, call the backpropagation algorithm. Use the training set inputs, the weights, (for now) a fixed learning rate of 0.1, and bias vector bias_train. Assign the result to weights.
Still inside the loop, call the network error function three times: one time for each of the training, validation, and test sets. Use the weight matrix, and the appropriate bias vector. Wrap these calls in an if-statement that tests for a value plot_graphs. (If your language supports it, you can use conditional compilation on the value of plot_graphs).
Store the errors in six arrays (error_train, classification_error_train, etc.), with the current epoch number as index.
After the loop, plot the six error arrays as a function of epoch number. Wrap this in an if-statement (or conditional compilation statement) that tests for the value plot_graphs.
Call the network error function again, on all three sets as before.
Return the weights, and the six errors.
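Putting the steps above together, a skeleton of the train function might look like this; names such as max_samples are assumptions, and plotting is left as a stub:

    import numpy as np

    def train(training_set, validation_set, test_set,
              max_samples=None, plot_graphs=True):
        # Optionally limit how many training samples are used (debug aid).
        inputs = training_set.inputs[:max_samples]
        targets = training_set.outputs[:max_samples]
        classes = training_set.classes[:max_samples]

        # Shape (input_count + 1, output_count) is an assumption;
        # the extra row is for the bias node.
        weights = initialise_weights(0.5, targets.shape[1], inputs.shape[1] + 1)

        bias_train = np.ones((inputs.shape[0], 1))
        bias_validate = np.ones((validation_set.count, 1))
        bias_test = np.ones((test_set.count, 1))

        error_train, classification_error_train = [], []
        error_validate, classification_error_validate = [], []
        error_test, classification_error_test = [], []

        epoch = 0
        while epoch < 500:  # replaced with a validation check in step 13
            # targets passed as an extra argument; see the step-7 sketch.
            weights = backpropagation(inputs, targets, weights, 0.1, bias_train)
            if plot_graphs:
                e, c = network_error(inputs, weights, targets, classes, bias_train)
                error_train.append(e); classification_error_train.append(c)
                e, c = network_error(validation_set.inputs, weights,
                                     validation_set.outputs,
                                     validation_set.classes, bias_validate)
                error_validate.append(e); classification_error_validate.append(c)
                e, c = network_error(test_set.inputs, weights, test_set.outputs,
                                     test_set.classes, bias_test)
                error_test.append(e); classification_error_test.append(c)
            epoch += 1

        # (Plotting of the six error arrays against epoch number is omitted.)
        # Final evaluation on all three sets; each entry is an (e, c) pair.
        errors = (network_error(inputs, weights, targets, classes, bias_train),
                  network_error(validation_set.inputs, weights,
                                validation_set.outputs, validation_set.classes,
                                bias_validate),
                  network_error(test_set.inputs, weights, test_set.outputs,
                                test_set.classes, bias_test))
        return weights, errors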
9. Implement the main training program
The program should load in the sets (using the load_sets function), and pass these to the training algorithm.
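A minimal main program under these assumptions; load_sets is named in the text, but its signature is assumed here:

    def main():
        # load_sets is assumed to return the data_set structure from step 2.
        data = load_sets()
        weights, errors = train(data.training_set,
                                data.validation_set,
                                data.test_set)
        print(errors)

    if __name__ == "__main__":
        main()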
10. Run the program
The important thing is that everything should run. You should see your error plots; at this stage they should be straight, horizontal lines. Because of the random weight initialisation, we cannot predict where these lines will lie, so do not be alarmed if yours lie at a different level, as long as they are straight and horizontal.
11. Implement the backpropagation function
You have already created the dummy function; now you can put in the actual calculations.
First, select a random sample.
Now, calculate the net matrix and output matrix using the feed-forward function.
Then apply the weight update, and return the matrix.
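For a single-layer network (my assumption, based on the single weight matrix used throughout) with the sigmoid from step 3, the update reduces to the delta rule. A sketch, with targets added as a parameter since the update needs them:

    import numpy as np

    def backpropagation(inputs, targets, weights, learning_rate, bias):
        # Select a random sample, kept as 1-row matrices.
        i = np.random.randint(inputs.shape[0])
        x = np.hstack([inputs[i:i+1], bias[i:i+1]])  # shape (1, inputs + 1)
        t = targets[i:i+1]                           # shape (1, outputs)

        # Feed-forward on the single sample.
        net = x @ weights
        output = activate(net)

        # Delta rule: W <- W + eta * x^T ((t - o) .* f'(net)).
        # For the sigmoid, f'(net) is computed from the output itself.
        delta = (t - output) * activate_derivative(output)
        return weights + learning_rate * (x.T @ delta)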
12. Run the program (AGAIN)
First, set the debug option to train on only one sample.
13. Implement a proper stopping condition
Change the while loop to stop when the validation error drops below a threshold. Note that this threshold usually depends on the problem. There are better stopping conditions that are less sensitive to the problem at hand, but this one will do for now.
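Inside the train function, this could look as follows; the variable names reuse the step-8 sketch, and the threshold value comes from step 14:

    validation_stop_threshold = 0.1
    error_validate_current = float("inf")
    epoch = 0
    while error_validate_current > validation_stop_threshold:
        weights = backpropagation(inputs, targets, weights, 0.1, bias_train)
        error_validate_current, _ = network_error(
            validation_set.inputs, weights, validation_set.outputs,
            validation_set.classes, bias_validate)
        epoch += 1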
14. Implement a statistical analysis
This part is important for you to get an idea of the robustness of the neural net. In practice, a very simple analysis will suffice.
This part needs to run the training 30 times, and then report the mean, standard deviation, and maximum of the
training time,
regression error, and
classification error (on the test sets).
In general, you would like all these values to be “low”. For reference, experiments were run on the iris data set with different learning rates; for each, 30 runs were made, with the other parameters as described earlier (max_weight = 1/2, validation_stop_threshold = 0.1).
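A sketch of the analysis; timing via time.perf_counter and the report format are my choices, and errors[2] refers to the test-set pair returned by the step-8 sketch:

    import time
    import numpy as np

    def analyse(data, runs=30):
        times, regression_errors, classification_errors = [], [], []
        for _ in range(runs):
            start = time.perf_counter()
            weights, errors = train(data.training_set,
                                    data.validation_set,
                                    data.test_set)
            times.append(time.perf_counter() - start)
            # errors[2] is the (e, c) pair on the test set.
            e_test, c_test = errors[2]
            regression_errors.append(e_test)
            classification_errors.append(c_test)
        for name, values in [("training time", times),
                             ("regression error", regression_errors),
                             ("classification error", classification_errors)]:
            v = np.array(values)
            print(f"{name}: mean={v.mean():.4f} std={v.std():.4f} max={v.max():.4f}")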
depends on #10 #11 #12