yaringal / ConcreteDropout

Code for Concrete Dropout as presented in https://arxiv.org/abs/1705.07832
MIT License

p higher than 0.5 after training... #4

Open andrisecker opened 6 years ago


Quick and hopefully not stupid question: I'm trying to use the ConcreteDropout class to train a convnet (classifying images into 12 classes). The first strange thing I observed is that the first 3 convolutional layers usually end up with higher dropout probabilities than the dense layers after them (independently of N), but what actually worries me is that sometimes the probabilities are higher than 0.5... see sample output below:

print(np.array([K.eval(layer.p) for layer in model.layers if hasattr(layer, "p")]))
[0.59234613 0.4666404  0.2114246  0.10445894 0.10087071]
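For what it's worth, drifting towards (and hovering around) 0.5 is what the dropout regularizer on its own rewards: in the paper's objective the per-layer dropout term is the negative Bernoulli entropy, p log p + (1 - p) log(1 - p) (scaled by the layer's input dimension and dd), and that term is minimized at p = 0.5. A minimal NumPy sketch of just that term, with the scaling constants left out:

```python
import numpy as np

def dropout_reg(p):
    """Negative Bernoulli entropy term from the Concrete Dropout objective.

    Most negative (i.e. minimized) at p = 0.5, so with a weak weight
    regularizer nothing stops a layer drifting towards 0.5.
    """
    return p * np.log(p) + (1.0 - p) * np.log(1.0 - p)

ps = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
vals = dropout_reg(ps)
# the minimum of the term sits exactly at p = 0.5
assert vals.argmin() == 2
```

So with wd as small as yours there is little counter-pressure from the weight term, and the likelihood term can nudge individual layers slightly past 0.5.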

Full model structure:

import numpy as np
from keras import backend as K
from keras import optimizers
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from keras.callbacks import History
# plus the ConcreteDropout layer from this repo

N = len(train_images)
l = 1e-5  # length scale parameter (tau, the model precision, is 1 for classification)
wd = l**2. / N  # this will be the l2 weight regularizer
dd = 1. / N  # this will regularize dropout (depends only on dataset size)

K.clear_session()
model = Sequential()
model.add(ConcreteDropout(Convolution2D(24, (11, 11), strides=(4, 4),
                                        padding="same", activation="relu",
                                        kernel_initializer="he_uniform", bias_initializer="zeros",
                                        data_format="channels_last"),
                          weight_regularizer=wd, dropout_regularizer=dd,
                          input_shape=(256, 256, 1)))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="valid", data_format="channels_last"))

model.add(ConcreteDropout(Convolution2D(96, (5, 5),
                                        padding="same", activation="relu",
                                        kernel_initializer="he_uniform", bias_initializer="zeros",
                                        data_format="channels_last"),
                          weight_regularizer=wd, dropout_regularizer=dd))
model.add(MaxPooling2D(pool_size=(3, 3), padding="valid", data_format="channels_last"))

model.add(ConcreteDropout(Convolution2D(96, (3, 3),
                                        padding="same", activation="relu",
                                        kernel_initializer="he_uniform", bias_initializer="zeros",
                                        data_format="channels_last"),
                          weight_regularizer=wd, dropout_regularizer=dd))
model.add(MaxPooling2D(pool_size=(3, 3), padding="valid", data_format="channels_last"))

model.add(Flatten())
model.add(ConcreteDropout(Dense(512, activation="relu",
                                kernel_initializer="he_uniform", bias_initializer="zeros"),
                          weight_regularizer=wd, dropout_regularizer=dd))

model.add(ConcreteDropout(Dense(512, activation="relu",
                                kernel_initializer="he_uniform", bias_initializer="zeros"),
                          weight_regularizer=wd, dropout_regularizer=dd))

model.add(Dense(12, activation="softmax",
                kernel_initializer="he_uniform", bias_initializer="zeros"))

opt = optimizers.SGD(lr=0.005, momentum=0.9, nesterov=True)

model.compile(loss="categorical_crossentropy",
              optimizer=opt,
              metrics=["categorical_accuracy"])

history = History()
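For scale, it may be worth printing wd and dd side by side: with these settings the dropout regularizer is many orders of magnitude stronger than the weight regularizer. A quick standalone check (N = 10000 is a made-up dataset size, not yours):

```python
N = 10_000           # hypothetical dataset size
l = 1e-5             # length scale as above
wd = l ** 2. / N     # ~1e-14: essentially no pressure from the weight term
dd = 1. / N          # 1e-4: ten orders of magnitude stronger
ratio = dd / wd
print(wd, dd, ratio)
```

With that imbalance, the entropy term dominates the regularization and the learned probabilities are largely shaped by the data-fit term.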

@yaringal @joeyearsley
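Separate from the regularizer question, but worth keeping in mind once p has converged: if the layer applies the concrete relaxation unconditionally (as the Keras snippet in this repo appears to), predictions stay stochastic at test time and should be averaged over several forward passes. A self-contained toy sketch of that averaging, where the stochastic `predict_fn` is a stand-in for `model.predict` (an assumption about your setup, not code from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_fn(x):
    """Stand-in for a stochastic model.predict: softmax probabilities
    over 12 classes with dropout-like noise added to the logits."""
    logits = np.tile(np.linspace(0.0, 1.0, 12), (len(x), 1))
    logits += rng.normal(scale=0.3, size=logits.shape)  # dropout-style noise
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x_test = np.zeros((5, 256, 256, 1))
T = 20  # number of MC samples
probs = np.stack([predict_fn(x_test) for _ in range(T)])   # (T, 5, 12)
mean_probs = probs.mean(axis=0)                            # MC-averaged prediction
# predictive entropy as a simple per-image uncertainty score
entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
```

The MC average is what should be reported as the class probabilities; a single forward pass will jump around between runs exactly because the learned p values are large.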