sphinxteam / Boltzmann.jl

Restricted Boltzmann Machines in Julia

Fully support Gaussian visible units #10

Open eric-tramel opened 8 years ago

eric-tramel commented 8 years ago

Currently, the package has only basic support for Gaussian visible units. Full support, including user-specified variances, should be included in the package. Additionally, a full series of tests should be conducted to make sure that the training procedure for this style of RBM is indeed producing the expected results.

eric-tramel commented 8 years ago

I've been tinkering with this a bit, and it's taken some time to pin down the learning procedure exactly. In the end, it boils down to just sampling the visible units from a Gaussian distribution during the Monte-Carlo step. However, the literature seems to contain some nuance in how the variance of these distributions is set. I'll try to document these differences here if at all possible.
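For reference, here is a minimal sketch of that Monte-Carlo step for Gaussian visible units; the names `W`, `vbias`, and `sigma` and the `(n_hid, n_vis)` weight layout are illustrative assumptions, not the package's actual fields:

```julia
# Hypothetical sketch: one Gibbs half-step for Gaussian visible units.
# Given hidden states `h`, each visible unit is drawn from a Normal whose
# mean is its effective field and whose standard deviation is `sigma[i]`.
function sample_gaussian_visibles(W, vbias, sigma, h)
    mu = W' * h .+ vbias                      # effective field (W assumed n_hid x n_vis)
    return mu .+ sigma .* randn(length(mu))   # v_i ~ N(mu_i, sigma_i^2)
end
```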

Some approaches say that one should take the original dataset and normalize everything so that the visible units are always sampled from unit-variance Gaussians; throughout training, this variance is then left fixed. Other approaches say that one should instead start with unit-variance Gaussians but use the contrastive-divergence learning to also update the variance of these units. I believe the latter was promoted by the teams at Aalto. I need to go back and pin down the references on these techniques.
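For the second strategy, here is a rough sketch of what a CD-style variance update could look like under one common parameterization of the Gaussian-Bernoulli energy; the parameterization and all names below are assumptions for illustration, not necessarily what the Aalto papers or this package use:

```julia
# Hypothetical sketch, assuming W is (n_hid, n_vis) and the energy
#   E(v, h) = sum_i (v_i - b_i)^2 / (2 sigma_i^2)
#             - sum_{j,i} h_j * W[j,i] * v_i / sigma_i  - sum_j c_j h_j .
# This returns -dE/dsigma, the term CD averages over the data ("positive")
# and model ("negative") phases.
function sigma_grad_term(W, vbias, sigma, v, h)
    (v .- vbias).^2 ./ sigma.^3 .- (v ./ sigma.^2) .* (W' * h)
end

# CD-1 style update direction for sigma: positive phase minus negative phase.
sigma_update(W, vbias, sigma, v0, h0, vk, hk) =
    sigma_grad_term(W, vbias, sigma, v0, h0) .- sigma_grad_term(W, vbias, sigma, vk, hk)
```

In practice, one common trick is to reparameterize with `z_i = log(sigma_i^2)` and take the gradient in `z` so the variances stay positive during learning.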

I had done some experiments with an implementation of this Gaussian-visible-unit contrastive-divergence learning, but I wasn't very convinced by the results. I'll have to go back and re-attempt it.

marylou-gabrie commented 8 years ago

Were you using Boltzmann.jl to conduct your experiments, @eric-tramel?

If I am not mistaken, in the current version of the package the function vis_mean doesn't have a specific method for Gaussian units, although it should. In that case the mean is directly the effective field and not its sigmoid. Is that right?
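To make the distinction concrete, a minimal sketch (the function names here are illustrative, not the package's actual methods, and W is assumed to be stored as (n_hid, n_vis)):

```julia
# Gaussian visibles: the conditional mean is the effective field itself.
gaussian_vis_mean(W, vbias, hid) = W' * hid .+ vbias

# Bernoulli visibles: the mean passes the effective field through a sigmoid.
bernoulli_vis_mean(W, vbias, hid) = 1 ./ (1 .+ exp.(-(W' * hid .+ vbias)))
```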

eric-tramel commented 8 years ago

@marylou-gabrie, I had made a branch off of master and was making the sampling function detect the distribution used for the visible units, whether a Normal or a Bernoulli. Based on that, it called the proper MC sampling function for the layer.

But yes, of course one shouldn't be using the sigmoid directly on that layer, but only when sampling the hidden layer.
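A rough, self-contained sketch of that dispatch idea (the `VisLayer` type and its fields are inventions for illustration, not the package's actual types), with the sigmoid confined to the Bernoulli case:

```julia
using Distributions: Bernoulli, Normal

# Hypothetical sketch: dispatch the visible-layer sampling on the unit distribution.
struct VisLayer{D}              # D: distribution of the visible units
    W::Matrix{Float64}          # assumed (n_hid, n_vis)
    vbias::Vector{Float64}
    sigma::Vector{Float64}      # only meaningful for Gaussian visibles
end

logsig(x) = 1 ./ (1 .+ exp.(-x))

# Bernoulli visibles: sigmoid probabilities, then coin flips.
sample_visibles(l::VisLayer{Bernoulli}, h) =
    float.(rand(length(l.vbias)) .< logsig(l.W' * h .+ l.vbias))

# Gaussian visibles: no sigmoid; add noise around the linear effective field.
sample_visibles(l::VisLayer{Normal}, h) =
    (l.W' * h .+ l.vbias) .+ l.sigma .* randn(length(l.vbias))
```

With this kind of setup, supporting another visible-unit distribution only requires adding one more method rather than branching inside the sampler.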