maddin79 / darch

Create deep architectures in the R programming language
GNU General Public License v3.0

Model real value features in DARCH #7

Closed rz1988 closed 8 years ago

rz1988 commented 8 years ago

Ruslan's MATLAB code, which the darch package is largely based on, deals primarily with binary inputs (the MNIST data). This is not convenient for problems in other areas where we face continuous, real-valued input variables. What functionality exists in the darch package to help model real-valued input?

On my end, I have tried scaling the input vector to the range 0 to 1, as Hinton suggested. I have also transformed the scaled input using its empirical CDF, so that the resulting values are not adversely affected by outliers. However, this has not yielded good results.
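For reference, here is a minimal sketch of those two preprocessing steps (the matrix name `x` is a placeholder):

```r
# Min-max scale each column of a numeric matrix x to [0, 1]
# (assumes no constant columns, which would divide by zero)
scaled <- apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))

# Empirical CDF transform: replace each value by its quantile rank,
# which caps the influence of outliers
cdfScaled <- apply(x, 2, function(col) ecdf(col)(col))
```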

Other options I have found include modeling the first-layer RBM as one with a continuous visible layer, where the visible units follow either a Gaussian or a truncated exponential distribution. But I don't think these options are available in the current darch package.
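For illustration, a rough sketch of how visible-unit sampling differs between the binary and the Gaussian case (function and variable names are my own; this is not part of the darch 0.10 API):

```r
# W: visible x hidden weight matrix, b: visible bias vector, h: 1 x hidden sample

# Binary visible units: sigmoid activation probability, then a Bernoulli sample
sampleBinaryVisible <- function(h, W, b) {
  p <- 1 / (1 + exp(-(as.vector(h %*% t(W)) + b)))
  rbinom(length(p), size = 1, prob = p)
}

# Gaussian visible units: linear mean plus unit-variance Gaussian noise
sampleGaussianVisible <- function(h, W, b) {
  mu <- as.vector(h %*% t(W)) + b
  rnorm(length(mu), mean = mu, sd = 1)
}
```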

The question is: what can I do with the current darch package to model continuous input vectors?

saviola777 commented 8 years ago

Could you provide a concrete example, including your darch parameters (and the version you are using) and the dataset? Generally, darch should be able to model real-valued input (after all, the MNIST dataset consists of real values between 0 and 1 as well), although it works better if the values are scaled when you are using the sigmoid or tanh activation function. I find it more likely that the reason for your problems is another parameter (like the learning rate).
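For example, a minimal call of the shape I would expect to work for scaled real-valued input (placeholder data and illustrative values; parameter names as in darch 0.10):

```r
# scaledInput: numeric matrix scaled to [0, 1]; target: output matrix.
# The values here are illustrative starting points, not recommendations.
model <- darch(scaledInput, target,
  scale = F,                            # input is already scaled
  layers = c(ncol(scaledInput), 50, 1), # one small hidden layer to start
  rbm.numEpochs = 0,                    # no pre-training while tuning
  darch.numEpochs = 100,
  darch.learnRateWeights = .1,
  darch.learnRateBiases = .1
)
```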

rz1988 commented 8 years ago

Thank you, Saviola. I am using darch 0.10.0.

Due to confidentiality issues, I am not able to share my datasets. However, I have copied my settings below.

```r
pretrainedDarch <- darch(inputRandomized, target,
  normalizeWeights = T, # to avoid numerical problems with linear activation function
  scale = F, # better convergence in most cases
  layers = c(32, 10, 100, 100, 100, 1), # you can play around with this
  rbm.numEpochs = 500,
  rbm.batchSize = 800,
  rbm.trainOutputLayer = F,
  rbm.learnRateWeights = .005,
  rbm.learnRateBiasVisible = .1,
  rbm.learnRateBiasHidden = .1,
  rbm.weightCost = .0002,
  rbm.initialMomentum = .5,
  rbm.finalMomentum = .9,
  rbm.momentumSwitch = 100,
  rbm.visibleUnitFunction = sigmUnitFunc,
  rbm.hiddenUnitFunction = sigmUnitFuncSwitch,
  rbm.updateFunction = rbmUpdate,
  rbm.errorFunction = mseError,
  rbm.genWeightFunction = generateWeights,
  darch.batchSize = 2, # higher is faster, lower converges faster (more weight updates)
  darch.learnRateWeights = .001, # needs to be very low for linear activation function
  darch.learnRateBiases = .001,
  darch.isClass = T, # this has to be false for regression tasks
  darch.isBin = T, # as does this
  darch.bootstrap = F, # set to true if you want some measure of how well the network performs on unseen examples
  darch.numEpochs = 0,
  darch.momentumSwitch = 25,
  darch.layerFunctionDefault = sigmoidUnitDerivative,
  darch.layerFunctions = list("", "", "", "", "5" = linearUnitDerivative)
)
```

What parameters do you recommend for the sigmoid activation function?

saviola777 commented 8 years ago

You have rbm.numEpochs = 500 and darch.numEpochs = 0. Are you sure this is what you want? This runs layer-wise unsupervised pre-training for 500 epochs per layer and no fine-tuning at all (also, the last layer is not trained at all, since rbm.trainOutputLayer = F). Try switching the numbers and start with fine-tuning only, i.e. rbm.numEpochs = 0 and darch.numEpochs = 500. If that does not immediately provide better results, increase the darch learn rates (darch.learnRateWeights and darch.learnRateBiases) to numbers closer to 1. I assume you are aiming for binary classification, since you only have one neuron in the output layer?
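Applied to your call above, the suggested first experiment would look like this (the learn rate values are illustrative; move them closer to 1 if results stay poor):

```r
pretrainedDarch <- darch(inputRandomized, target,
  # ... all other arguments as in the previous comment ...
  rbm.numEpochs = 0,           # disable layer-wise pre-training for now
  darch.numEpochs = 500,       # fine-tune instead
  darch.learnRateWeights = .5, # raise toward 1 if convergence is slow
  darch.learnRateBiases = .5
)
```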

saviola777 commented 8 years ago

Closing after a month of inactivity; this can be re-opened when the necessary feedback is available.