yaringal / ConcreteDropout

Code for Concrete Dropout as presented in https://arxiv.org/abs/1705.07832
MIT License

Contradiction between eqn. (3) and dropout_regularizer? Weight matrix shape vs input shape. #7

Closed · iafydsttta closed this issue 6 years ago

iafydsttta commented 6 years ago

When reading the paper, eqn. (3) states that the dropout probability should be regularized by multiplying the entropy of p by the dimensionality K of the weight matrix. However, in the code the entropy term is multiplied by `input_dim = np.prod(input_shape[1:])`, which seems to just return the overall dimensionality of the input (and the code in the arxiv pdf used `input_shape[-1]`). For convolution layers, for example, shouldn't we set `K = input_volume_depth * filter_xdim * filter_ydim * output_volume_depth`? I would appreciate it @yaringal if you could help me clear up my confusion.
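
For reference, the per-layer KL term behind eqn. (3) (as I read the paper, up to constants) is:

```latex
\mathrm{KL}\big(q_M(W)\,\|\,p(W)\big) \;\propto\; \frac{l^2(1-p)}{2}\,\lVert M \rVert^2 \;-\; K\,\mathcal{H}(p),
\qquad \mathcal{H}(p) := -p\log p - (1-p)\log(1-p)
```

so the question is which count K should multiply the Bernoulli entropy H(p).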

yaringal commented 6 years ago

The entropy is summed once for each output dim. For fully connected layers that's K terms; for convolutions that's `filter_xdim * filter_ydim * output_volume_depth`.

The first dim of `input_shape` in Keras is the batch size. For fully connected layers `input_shape[-1]` is therefore the dim K (the previous layer's output dim, i.e. this layer's input dim). `np.prod(input_shape[1:])` is K for fully connected layers, and `filter_xdim * filter_ydim * output_volume_depth` for convolutions.
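
A minimal sketch of that counting for the fully connected case (the shapes and variable names here are illustrative, not taken from the repo):

```python
import numpy as np

# In Keras, build() receives input_shape with the batch axis first,
# so input_shape[1:] are the per-example dims.
input_shape = (None, 512)                # (batch, features) for a Dense layer
K_count = int(np.prod(input_shape[1:]))  # = 512; equals input_shape[-1] for Dense

p = 0.1                                  # current dropout probability
neg_entropy = p * np.log(p) + (1. - p) * np.log(1. - p)  # -H(p)
dropout_reg = K_count * neg_entropy      # the -K * H(p) term of eqn. (3)
```

For a wrapped convolution, the same `np.prod(input_shape[1:])` simply multiplies out all non-batch dims of whatever shape the layer receives.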

iafydsttta commented 6 years ago

Ok, thanks for the clarification. So the K in eqn. 3 is essentially K_input (K_l in the paper). Then the code indeed makes sense.