This repository contains the code for the experiments described in the following paper-like document: doc/going_deeper.pdf
The document describes a meta-layer for infinitely deep neural networks. The meta-layer wraps other layers in a way that lets the network itself decide how many sub-layers inside the meta-layer should be used. Each sub-layer has its own weights, so the network also decides how many weights are used. The complete training process can be done with gradient-descent-based methods.
Please read doc/going_deeper.pdf for more details.
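To give a rough intuition for the idea, here is a minimal, purely hypothetical sketch (not the GInftyLayer implementation from this repository, and it uses a fixed maximum depth, which the paper's layer does not): a block stacks several residual sub-layers, each scaled by a trainable, L2-regularized gate, so the optimizer can push unneeded gates towards zero and thereby choose the effective depth.

import tensorflow as tf
from tensorflow import keras

class SoftDepthBlock(keras.layers.Layer):
    """Hypothetical sketch of depth selection via trainable gates.

    Not the GInftyLayer of this repository: here max_depth sub-layers are
    stacked as gated residual updates, and an L2 penalty on the gates lets
    the optimizer switch sub-layers (and their weights) off."""

    def __init__(self, units, max_depth=8, gate_l2=1e-2, **kwargs):
        super().__init__(**kwargs)
        self.sub_layers = [keras.layers.Dense(units, activation='relu')
                           for _ in range(max_depth)]
        self.max_depth = max_depth
        self.gate_l2 = gate_l2

    def build(self, input_shape):
        # One scalar gate per sub-layer; a gate near zero makes the
        # corresponding sub-layer (almost) disappear from the forward pass.
        self.gates = self.add_weight(
            name='gates', shape=(self.max_depth,), initializer='ones',
            regularizer=keras.regularizers.l2(self.gate_l2))
        super().build(input_shape)

    def call(self, x):
        for gate, layer in zip(tf.unstack(self.gates), self.sub_layers):
            x = x + gate * layer(x)  # gated residual update
        return x

The actual GInftyLayer differs in how the depth and the sub-layer weights are regularized (see the paper); the sketch only illustrates why such a construction remains trainable with plain gradient descent.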
The repository contains a small library for using the described meta-layer. The library is based on Keras and is very minimal, so not every network architecture can be built with it. A basic model (the model of the first experiment) may be created like this:
# Create the model
# (Input, Dense, Dropout, Activation are the usual Keras layers; TDModel,
#  GInftyLayer, GammaRegularizedBatchNorm and c_l2 are provided by the
#  library in this repository.)
n_input_units = 8
n_internal_units = 24
w_reg = 1e-2  # example regularization strengths
f_reg = 1e-2
model = TDModel()
model += Input((n_input_units,))
model += Dense(n_internal_units, activation='relu', trainable=False)

# The described meta-layer
model += GInftyLayer(
    # The name
    'd0',
    # f_i(x)
    f_layer=[
        lambda reg: Dense(n_internal_units),
        lambda reg: GammaRegularizedBatchNorm(reg=reg, max_free_gamma=0.),
        lambda reg: Dropout(0.1),
    ],
    # h(x)
    h_step=[
        lambda reg: Activation('relu')
    ],
    # Regularizers
    w_regularizer=(c_l2, w_reg),
    f_regularizer=(c_l2, f_reg)
)
model += Dense(1, activation='sigmoid', trainable=False)

# Build the model
model.init(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
You may run the first experiment to get a better feeling for the interface.
All experiments are documented in more detail in doc/going_deeper.pdf.
The first experiment uses 8 binary inputs and calculates their XOR result. The only trainable weights of the network are contained in a GInftyLayer-layer. Tests are done with 0-8 active inputs for the XOR calculation; inactive inputs are not used for the XOR calculation and just receive random input values.
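For illustration, data for this task could be generated roughly like this (a sketch with hypothetical helper names, not the repository's data pipeline; the training call is also only assumed to follow the usual Keras interface):

import numpy as np

def make_xor_data(n_samples, n_inputs=8, n_active=4, seed=0):
    # The label is the XOR (parity) of the first n_active inputs;
    # the remaining inputs carry random values and do not affect the label.
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, n_inputs)).astype('float32')
    y = (x[:, :n_active].sum(axis=1) % 2).astype('float32')
    return x, y

x_train, y_train = make_xor_data(20000, n_active=4)
# model.fit(x_train, y_train, epochs=50, batch_size=64)  # assuming the usual Keras fit()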
It can be assumed that a network with more active inputs for the XOR computation is more complex and therefore requires more sub-layers in the GInftyLayer-layer. Exactly this is shown by the experiment: the w-value, which essentially encodes the number of used sub-layers, is higher for more active inputs:
The second experiment is conducted on the MNIST dataset. The network architecture contains two convolutional GInftyLayer-layers and one fully connected GInftyLayer-layer; a rough sketch of how such a convolutional block could be assembled with the interface above is shown below.
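This is only an assumed sketch that transfers the pattern of the first example to convolutions; the exact construction (layer sizes, pooling, regularizer settings) is defined by the experiment scripts and in doc/going_deeper.pdf.

# Hypothetical sketch of a convolutional GInftyLayer, following the same
# pattern as the fully connected example above (not the exact experiment code).
model += GInftyLayer(
    'c0',
    f_layer=[
        lambda reg: Conv2D(32, (3, 3), padding='same'),
        lambda reg: GammaRegularizedBatchNorm(reg=reg, max_free_gamma=0.),
        lambda reg: Dropout(0.1),
    ],
    h_step=[
        lambda reg: Activation('relu')
    ],
    w_regularizer=(c_l2, w_reg),
    f_regularizer=(c_l2, f_reg)
)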
The test accuracy reaches up to 99.5 %, and it can be seen that the second convolutional GInftyLayer-layer is the deepest one. The first convolutional and the fully connected GInftyLayer-layer have a very low activation, while the second convolutional layer reaches a depth of 2. The weights are visualized in the following plot: