eric-tramel opened 8 years ago
I forgot to reference this issue in commit d3c2caa9469e188161396c40af7b7d8d883a7b9d! I have created a separate branch to work on implementing this feature.
For my first attempt at implementing this feature, I added an optional parameter to `rbm.jl/fit()` to allow the user to specify the dropout rate:

```julia
function fit(rbm::RBM, X::Mat{Float64};
             persistent=true, lr=0.1, n_iter=10, batch_size=100, n_gibbs=1, dorate=0.0)
```
From here, I tried to take the approach I quoted earlier (§8.2 of Srivastava et al., 2014) and apply a different dropout pattern for each training sample in the mini-batch. I accomplish this within `rbm.jl/gibbs()`:

```julia
function gibbs(rbm::RBM, vis::Mat{Float64}; n_times=1, dorate=0.0)
    # one mask column per sample: true = hidden unit dropped for that sample
    suppressedUnits = rand(size(rbm.hbias, 1), size(vis, 2)) .< dorate
    ...
```

(Note that the mask's second dimension comes from the data argument `vis`, not from a field of `rbm`, since `RBM` carries no `vis` field.)
I then modify `rbm.jl/sample_visibles()` (and the corresponding `rbm.jl/vis_means()`) to take this logical array specifying the suppressed/dropped hidden units and to zero out the dropped hidden units before calculating the matrix-matrix product between `rbm.W` and the hidden activations:

```julia
function vis_means(rbm::RBM, hid::Mat{Float64}, suppressedUnits::Mat{Bool})
    hid[suppressedUnits] = 0.0  # suppress dropped hidden units
    p = rbm.W' * hid .+ rbm.vbias
    return logistic(p)
end
```
This should, in total, accomplish the dropout.
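To convince myself the masking does what I want, here is a toy check (hypothetical sizes, plain matrices rather than the package's `RBM` type): zeroing the masked entries of `hid` removes those hidden units from the product `W' * hid`, which is exactly the quantity `vis_means()` passes through the logistic.

```julia
# Toy check of the masking step, outside the package.
W    = randn(3, 4)        # 3 hidden units, 4 visible units
hid  = rand(3, 2)         # hidden activations for a mini-batch of 2 samples
mask = rand(3, 2) .< 0.5  # true = unit dropped for that sample
hid_dropped = hid .* .!mask   # same effect as hid[mask] = 0.0
p = W' * hid_dropped          # dropped units contribute nothing to p
```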
Now, what isn't clear to me is whether the dropout pattern should change from epoch to epoch. The paper seems to indicate that the pattern should change from mini-batch to mini-batch, but it doesn't specify anything about epochs. I am assuming that the pattern is redrawn at every mini-batch computation, however. If anyone has references to other RBM dropout implementations, they might help clear up this issue.
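Under that assumption, the training loop would look roughly like this (a sketch with hypothetical names, not the actual `fit()` internals): the epoch loop iterates the mini-batches, and a fresh mask is drawn for every mini-batch, so the pattern necessarily changes within an epoch as well.

```julia
# Sketch: one fresh dropout mask per mini-batch (hypothetical helper, not the
# package API). Returns the number of mini-batches processed, just so the
# loop structure is observable.
function train_sketch(W, X, dorate; n_epochs=10, batch_size=100)
    n_hid = size(W, 1)
    n_batches = 0
    for epoch in 1:n_epochs
        for start in 1:batch_size:size(X, 2)
            batch = X[:, start:min(start + batch_size - 1, size(X, 2))]
            # fresh mask for this mini-batch: one column per training sample
            mask = rand(n_hid, size(batch, 2)) .< dorate
            # ... run the Gibbs chain and gradient update with `mask` applied ...
            n_batches += 1
        end
    end
    return n_batches
end
```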
Okay, it had some issues, some bugs I had introduced, but now it is building and passing tests! I'll need to write a dropout test to ensure that everything is really working correctly.
Okay! It works! The issues I was having with the keywords not being recognised were due to the workspace not being cleared before running the mnistexample_dropout.jl script. After clearing the workspace, it ran fine. What remains is to run a comparison to show that this implementation of dropout really gives some advantage over no dropout.
Thanks @alaa-saade!
So, it seems there is still something to be desired in the dropout performance. Currently there does not seem to be much difference between its pseudo-likelihood and the one obtained without dropout, as shown in the following figure:
I'm going to restructure where the dropout is enforced; perhaps I'm not doing it in the right manner. Referring to this Lua/Torch7 implementation, it seems we also need to suppress these units in the gradient update.
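Concretely, that would mean applying the same mask to the hidden statistics of both the positive (data) and negative (model) phases before forming the CD update, so a dropped unit receives no gradient for that mini-batch. A sketch with hypothetical variable names (not the package's actual update code):

```julia
# Sketch of a mask-aware CD-1 weight update (hypothetical names).
# h_pos, h_neg: hidden statistics (n_hid × n); v_pos, v_neg: visible (n_vis × n).
function masked_cd_update(W, h_pos, v_pos, h_neg, v_neg, mask; lr=0.1)
    h_pos = h_pos .* .!mask   # suppress dropped units in the data phase
    h_neg = h_neg .* .!mask   # suppress dropped units in the model phase
    n = size(v_pos, 2)        # mini-batch size
    dW = (h_pos * v_pos' .- h_neg * v_neg') ./ n
    return W .+ lr .* dW      # dropped units contribute zero to dW
end
```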
Interesting. But is it known that the effect of dropout can be seen on the pseudo-likelihood?
@krzakala: I don't truly know whether the effect can be seen in the PL or not. You could very well be right on this point. I'm working on a demo now which also reports the estimated features (W). I'll also include a histogram of the hidden activations, as was done in (Srivastava et al., 2014), to show the discrepancy between the approaches.
Goal
One of the latest/best regularisation techniques for training RBMs is dropout. Unfortunately, the original Boltzmann.jl package does not implement this technique, so we should undertake this ourselves.
Technique
During the training phase of the RBM, each hidden node is present with only probability $p$. Training is performed on this reduced model, and the resulting trained models are then combined. The pertinent section from (Srivastava et al., 2014) reads,
Srivastava et al., "Dropout: A simple way to prevent neural networks from overfitting," JMLR, vol. 15, 2014, pp. 1929–1958.
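The "combining" of the thinned models is usually done with the paper's weight-scaling approximation: at test time the full network is used, with the weights out of the hidden layer scaled by the retention probability $p$, so each unit's expected contribution matches training. A sketch (hypothetical names; note that with the thread's `dorate` convention, $p = 1 -$ `dorate`):

```julia
# Sketch of test-time weight scaling for the visible means (hypothetical
# names, plain matrices). Training drops hidden units with prob. dorate,
# so at test time we scale W by the retention probability p = 1 - dorate.
logistic(x) = 1.0 ./ (1.0 .+ exp.(-x))

function vis_means_test(W, vbias, hid, dorate)
    p = 1.0 - dorate                       # retention probability
    return logistic((p .* W)' * hid .+ vbias)
end
```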