maddin79 / darch

Create deep architectures in the R programming language
GNU General Public License v3.0
71 stars 31 forks

Own errorFunction #27

Closed alex7tula closed 6 years ago

alex7tula commented 6 years ago

Hi. I use my own errorFunction. When I train a model, darch does not select the best model according to my function, but the best one according to "Classification error". The best model by my error function (named k_er) is at epoch 8, yet darch selects the model with the best "Classification error" of 28.81%. How can I make the selection use my own error function?

See listing below:

INFO [2018-01-16 15:21:47] The current log level is: INFO INFO [2018-01-16 15:21:48] Start initial caret pre-processing. INFO [2018-01-16 15:21:48] Converting non-numeric columns in data (if any)... INFO [2018-01-16 15:21:48] Converting non-numeric columns in targets (if any)... INFO [2018-01-16 15:21:48] Dependent factor "NA" converted to 2 new variables (1-of-n coding) INFO [2018-01-16 15:21:49] Using CPU matrix multiplication. WARN [2018-01-16 15:21:49] Changing number of neurons in the output layer from 1 to 2 based on dataset. INFO [2018-01-16 15:21:49] Creating and configuring new DArch instance INFO [2018-01-16 15:21:49] Constructing a network with 4 layers (88, 50, 20, 2 neurons). INFO [2018-01-16 15:21:49] Generating RBMs. INFO [2018-01-16 15:21:49] Constructing new RBM instance with 88 visible and 50 hidden units. INFO [2018-01-16 15:21:49] Constructing new RBM instance with 50 visible and 20 hidden units. INFO [2018-01-16 15:21:49] Constructing new RBM instance with 20 visible and 2 hidden units. INFO [2018-01-16 15:21:49] DArch instance ready for training, here is a summary of its configuration: INFO [2018-01-16 15:21:49] Global parameters: INFO [2018-01-16 15:21:49] Layers parameter was c(88, 50, 20, 1), resulted in network with 4 layers and 88, 50, 20, 2 neurons INFO [2018-01-16 15:21:49] The weights for the layers were generated with "generateWeightsGlorotUniform" INFO [2018-01-16 15:21:49] Additionally, the following parameters were used for weight generation: INFO [2018-01-16 15:21:49] [weights] Parameter weights.max is 0.1 INFO [2018-01-16 15:21:49] [weights] Parameter weights.min is -0.1 INFO [2018-01-16 15:21:49] [weights] Parameter weights.mean is 0 INFO [2018-01-16 15:21:49] [weights] Parameter weights.sd is 0.01 INFO [2018-01-16 15:21:49] Weight normalization is enabled using a maxnorm bound of 1 INFO [2018-01-16 15:21:49] Bootstrapping is disabled INFO [2018-01-16 15:21:49] Train data are shuffled before each epoch INFO [2018-01-16 15:21:49] Autosaving is disabled INFO [2018-01-16 15:21:49] Using CPU for matrix multiplication INFO [2018-01-16 15:21:49] Pre-processing parameters: INFO [2018-01-16 15:21:49] [preProc] Parameter preProc.factorToNumeric is FALSE INFO [2018-01-16 15:21:49] [preProc] Parameter preProc.factorToNumeric.targets is FALSE INFO [2018-01-16 15:21:49] [preProc] Parameter preProc.fullRank is TRUE INFO [2018-01-16 15:21:49] [preProc] Parameter preProc.fullRank.targets is FALSE INFO [2018-01-16 15:21:49] [preProc] Parameter preProc.orderedToFactor.targets is TRUE INFO [2018-01-16 15:21:49] [preProc] Parameter preProc.targets is FALSE INFO [2018-01-16 15:21:49] Caret pre-processing is disabled INFO [2018-01-16 15:21:49] Pre-training parameters: INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.consecutive is FALSE INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.numEpochs is 10 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.batchSize is 50 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.allData is TRUE INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.lastLayer is 0 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.learnRate is 1 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.unitFunction is "tanhUnitRbm" INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.errorFunction is "mseError" INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.finalMomentum is 0.9 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.initialMomentum is 0.5 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.learnRateScale is 1 INFO [2018-01-16 15:21:49] 
[preTrain] Parameter rbm.momentumRampLength is 1 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.numCD is 1 INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.updateFunction is "rbmUpdate" INFO [2018-01-16 15:21:49] [preTrain] Parameter rbm.weightDecay is 2e-04 INFO [2018-01-16 15:21:49] The selected RBMs have been trained for 0 epochs INFO [2018-01-16 15:21:49] Fine-tuning parameters: INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.batchSize is 50 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.numEpochs is 10 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.weightDecay is 2e-04 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dither is FALSE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.fineTuneFunction is "backpropagation" INFO [2018-01-16 15:21:49] [backprop] Using backpropagation for fine-tuning INFO [2018-01-16 15:21:49] [backprop] Parameter bp.learnRate is c(1, 1, 1) INFO [2018-01-16 15:21:49] [backprop] Parameter bp.learnRateScale is 1 INFO [2018-01-16 15:21:49] [backprop] See ?backpropagation for documentation INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.maxout.poolSize is 2 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.errorFunction is "non-darch function" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.maxout.unitFunction is "tanhUnit" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.trainLayers1 is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.trainLayers2 is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.trainLayers3 is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.unitFunction1 is "tanhUnit" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.unitFunction2 is "maxoutUnit" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.unitFunction3 is "linearUnit" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout1 is 0.1 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout2 is 0.2 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout3 is 0.1 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.weightUpdateFunction1 is "weightDecayWeightUpdate" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.weightUpdateFunction2 is "maxoutWeightUpdate" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.weightUpdateFunction3 is "weightDecayWeightUpdate" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout is 0 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout.dropConnect is FALSE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.dropout.momentMatching is 0 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.elu.alpha is 1 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.finalMomentum is 0.9 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.initialMomentum is 0.5 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.isClass is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.momentumRampLength is 1 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.nesterovMomentum is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.returnBestModel is TRUE INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558 INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.stopClassErr is -Inf INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.stopErr is -Inf INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.stopValidClassErr is -Inf INFO [2018-01-16 15:21:49] [fineTune] Parameter 
darch.stopValidErr is -Inf INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE, TRUE) INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.unitFunction is "sigmoidUnit" INFO [2018-01-16 15:21:49] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate" INFO [2018-01-16 15:21:49] The network has been fine-tuned for 0 epochs INFO [2018-01-16 15:21:49] Training set consists of 8640 samples. INFO [2018-01-16 15:21:49] Validation set consists of 2880 samples INFO [2018-01-16 15:21:49] Start deep architecture fine-tuning for 10 epochs INFO [2018-01-16 15:21:49] Number of Batches: 173 (batch size 50) INFO [2018-01-16 15:21:49] Epoch: 1 of 10 INFO [2018-01-16 15:21:50] Classification error on Train set: 34.54% (2984/8640) INFO [2018-01-16 15:21:50] Train set k_er: 200000.000 INFO [2018-01-16 15:21:50] Classification error on Validation set: 63.78% (1837/2880) INFO [2018-01-16 15:21:50] Validation set k_er: 200000.000 INFO [2018-01-16 15:21:50] Finished epoch 1 of 10 after 1.04 secs (8372 patterns/sec) INFO [2018-01-16 15:21:50] Epoch: 2 of 10 INFO [2018-01-16 15:21:51] Classification error on Train set: 34.54% (2984/8640) INFO [2018-01-16 15:21:51] Train set k_er: 200000.000 INFO [2018-01-16 15:21:51] Classification error on Validation set: 63.78% (1837/2880) INFO [2018-01-16 15:21:51] Validation set k_er: 200000.000 INFO [2018-01-16 15:21:51] Finished epoch 2 of 10 after 1.15 secs (7513 patterns/sec) INFO [2018-01-16 15:21:51] Epoch: 3 of 10 INFO [2018-01-16 15:21:52] Classification error on Train set: 34.54% (2984/8640) INFO [2018-01-16 15:21:52] Train set k_er: 200000.000 INFO [2018-01-16 15:21:52] Classification error on Validation set: 63.78% (1837/2880) INFO [2018-01-16 15:21:52] Validation set k_er: 200000.000 INFO [2018-01-16 15:21:52] Finished epoch 3 of 10 after 0.814 secs (10654 patterns/sec) INFO [2018-01-16 15:21:52] Epoch: 4 of 10 INFO [2018-01-16 15:21:54] Classification error on Train set: 34.49% (2980/8640) INFO [2018-01-16 15:21:54] Train set k_er: 100000000.000 INFO [2018-01-16 15:21:54] Classification error on Validation set: 63.78% (1837/2880) INFO [2018-01-16 15:21:54] Validation set k_er: 200000.000 INFO [2018-01-16 15:21:54] Finished epoch 4 of 10 after 1.86 secs (4640 patterns/sec) INFO [2018-01-16 15:21:54] Epoch: 5 of 10 INFO [2018-01-16 15:21:56] Classification error on Train set: 31.4% (2713/8640) INFO [2018-01-16 15:21:56] Train set k_er: 100000000.000 INFO [2018-01-16 15:21:57] Classification error on Validation set: 61.49% (1771/2880) INFO [2018-01-16 15:21:57] Validation set k_er: 100000000.000 INFO [2018-01-16 15:21:57] Finished epoch 5 of 10 after 2.7 secs (3199 patterns/sec) INFO [2018-01-16 15:21:57] Epoch: 6 of 10 INFO [2018-01-16 15:21:59] Classification error on Train set: 33.01% (2852/8640) INFO [2018-01-16 15:21:59] Train set k_er: 0.006 INFO [2018-01-16 15:21:59] Classification error on Validation set: 63.65% (1833/2880) INFO [2018-01-16 15:21:59] Validation set k_er: 0.052 INFO [2018-01-16 15:21:59] Finished epoch 6 of 10 after 2.33 secs (3721 patterns/sec) INFO [2018-01-16 15:21:59] Epoch: 7 of 10 INFO [2018-01-16 15:22:01] Classification error on Train set: 32.38% (2798/8640) INFO [2018-01-16 15:22:01] Train set k_er: 0.004 INFO [2018-01-16 15:22:02] Classification error on Validation set: 63.82% (1838/2880) INFO [2018-01-16 15:22:02] Validation set k_er: 0.014 INFO [2018-01-16 15:22:02] Finished epoch 7 of 10 after 2.47 secs (3497 patterns/sec) INFO [2018-01-16 15:22:02] Epoch: 8 of 10 INFO 
[2018-01-16 15:22:04] Classification error on Train set: 31.91% (2757/8640) INFO [2018-01-16 15:22:04] Train set k_er: 0.003 INFO [2018-01-16 15:22:04] Classification error on Validation set: 63.16% (1819/2880) INFO [2018-01-16 15:22:04] Validation set k_er: 0.007 INFO [2018-01-16 15:22:04] Finished epoch 8 of 10 after 2.67 secs (3242 patterns/sec) INFO [2018-01-16 15:22:04] Epoch: 9 of 10 INFO [2018-01-16 15:22:06] Classification error on Train set: 30.41% (2627/8640) INFO [2018-01-16 15:22:06] Train set k_er: 100000000.000 INFO [2018-01-16 15:22:07] Classification error on Validation set: 60.21% (1734/2880) INFO [2018-01-16 15:22:07] Validation set k_er: 100000000.000 INFO [2018-01-16 15:22:07] Finished epoch 9 of 10 after 2.37 secs (3658 patterns/sec) INFO [2018-01-16 15:22:07] Epoch: 10 of 10 INFO [2018-01-16 15:22:09] Classification error on Train set: 28.81% (2489/8640) INFO [2018-01-16 15:22:09] Train set k_er: 100000000.000 INFO [2018-01-16 15:22:09] Classification error on Validation set: 57.88% (1667/2880) INFO [2018-01-16 15:22:09] Validation set k_er: 100000000.000 INFO [2018-01-16 15:22:09] Finished epoch 10 of 10 after 2.37 secs (3664 patterns/sec) INFO [2018-01-16 15:22:11] Classification error on Train set (best model): 28.81% (2489/8640) INFO [2018-01-16 15:22:11] Train set (best model) k_er: 100000000.000 INFO [2018-01-16 15:22:11] Classification error on Validation set (best model): 57.88% (1667/2880) INFO [2018-01-16 15:22:11] Validation set (best model) k_er: 100000000.000 INFO [2018-01-16 15:22:11] Best model was found after epoch 10 INFO [2018-01-16 15:22:11] Final 0.632 validation k_er: 100000000.000 INFO [2018-01-16 15:22:11] Final 0.632 validation classification error: 47.19% INFO [2018-01-16 15:22:11] Fine-tuning finished after 21.83 secs

alex7tula commented 6 years ago

I see that sorting by a custom error function works for regression models. But I prefer classification models, which give me more to work with. How can I get the same sorting there?

saviola777 commented 6 years ago

Here is the current code used for comparing two models:

error[["class"]] < errorBest[["class"]] ||
  (error[["raw"]] <= errorBest[["raw"]] &&
   error[["class"]] == errorBest[["class"]])

So a model is considered better than another model primarily considering the classification error. The raw error will only be taken into consideration when the classification error is the same. It is currently not possible to change this for classification problems.
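Purely for illustration, here is that condition wrapped as a standalone helper. This is a minimal sketch, not the package's internal code; error and errorBest are assumed to be lists with a "class" entry (classification error) and a "raw" entry (error-function value), as in the snippet above.

# Illustrative sketch only, not darch's internal implementation.
# error/errorBest: lists with "class" (classification error) and
# "raw" (error function value) entries.
isBetterModel <- function(error, errorBest) {
  error[["class"]] < errorBest[["class"]] ||
    (error[["raw"]] <= errorBest[["raw"]] &&
     error[["class"]] == errorBest[["class"]])
}

# A lower classification error always wins; the raw error only breaks ties:
isBetterModel(list(class = 28.81, raw = 1e8), list(class = 31.91, raw = 0.003))  # TRUE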

I will consider adding a parameter for this in the future. I do wonder about the use case though: shouldn't you be interested primarily in the classification performance when dealing with a classification problem? That's what I thought when implementing this feature, at least.

alex7tula commented 6 years ago

I need to select the best model by my own functions; the standard sorting is not suitable for my model. I guess the changes are not very time consuming, and I would be very grateful if you could make them. With this change, the logic would also become consistent with regression models (where it already works).

alex7tula commented 6 years ago

If the user sets his own error function, it means he wants to evaluate the model by that function and not by any other one. That seems logical and right to me.

saviola777 commented 6 years ago

Perhaps, but to me it seems that if the model your error function prefers performs worse on the classification problem it is being trained for, there is some mismatch between the error function and the task at hand. Why make it a classification problem if you don't care about the classification results?

alex7tula commented 6 years ago

If I retrain models with the code below, will I get the right sorting by my function and correct re-training?

params$rbm.numEpochs <- nLearns
params$darch.numEpochs <- 1
....

# External training loop: keep the model with the lowest error according to myClassError.
err <- 1000000000000; best_epoch <- 1
model <- NULL
for (i in 1:nLearns) {
  model_tmp <- darch(darch = model, paramsList = params,
                     x = MatrixLearnX, y = factor(MatrixLearnY))
  err_tmp <- myClassError(MatrixLearnY, predict(model_tmp, newdata = MatrixLearnX))
  print(paste0("Epoch: ", i, ", Error: ", err_tmp[[2]]))
  if (err_tmp[[2]] < err) {
    model <- model_tmp
    params$rbm.numEpochs <- 0   # pre-train only before the first round
    err <- err_tmp[[2]]
    best_epoch <- i
    print(paste0("Best error: ", err_tmp[[2]]))
  }
}

saviola777 commented 6 years ago

Yes, something like that should work. Maybe I could add support for callback functions where something like this could be done easier.
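Purely as a hypothetical sketch of what such a callback hook might look like (no such API exists in darch; the parameter name darch.epochCallback and everything else here is invented for illustration, reusing the objects from the loop above):

# Hypothetical only -- darch does not provide this hook.
# Imagined contract: a function called after every fine-tuning epoch with the
# current DArch instance; its return value would be used for model selection.
myEpochCallback <- function(darchInstance, epoch) {
  pred <- predict(darchInstance, newdata = MatrixLearnX)
  myClassError(MatrixLearnY, pred)[[2]]   # lower is better
}

# NN <- darch(x = MatrixLearnX, y = factor(MatrixLearnY), paramsList = params,
#             darch.epochCallback = myEpochCallback)   # imagined parameter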

saviola777 commented 6 years ago

You can now use the parameter darch.returnBestModel.classificationError = FALSE to sort models by the error function value. Let me know if this works for you.
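An untested sketch of how this could be combined with a custom error function. The body of k_er below is a placeholder (not the reporter's real function) and assumes the convention of darch's built-in error functions, i.e. a function of (original, estimate) returning a list of a display name and a numeric value; trainData and trainTargets are placeholders as well.

# Placeholder error function, following the assumed darch convention
# f(original, estimate) -> list(name, value).
k_er <- function(original, estimate) {
  list("k_er", mean((original - estimate)^2))
}

model <- darch(x = trainData, y = trainTargets,
  darch.errorFunction = k_er,
  # select the best epoch by the error function value instead of the
  # classification error:
  darch.returnBestModel.classificationError = FALSE)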

alex7tula commented 6 years ago

No, darch still sorts by classification error. In the sample below, epoch 9 has a better k_er, but darch selects epoch 10 as the best.

INFO [2018-01-17 12:55:17] The current log level is: INFO INFO [2018-01-17 12:55:18] Start initial caret pre-processing. INFO [2018-01-17 12:55:18] Converting non-numeric columns in data (if any)... INFO [2018-01-17 12:55:18] Converting non-numeric columns in targets (if any)... INFO [2018-01-17 12:55:18] Dependent factor "NA" converted to 2 new variables (1-of-n coding) INFO [2018-01-17 12:55:19] Using CPU matrix multiplication. WARN [2018-01-17 12:55:19] Changing number of neurons in the output layer from 1 to 2 based on dataset. INFO [2018-01-17 12:55:19] Creating and configuring new DArch instance INFO [2018-01-17 12:55:19] Constructing a network with 4 layers (88, 50, 20, 2 neurons). INFO [2018-01-17 12:55:19] Generating RBMs. INFO [2018-01-17 12:55:19] Constructing new RBM instance with 88 visible and 50 hidden units. INFO [2018-01-17 12:55:19] Constructing new RBM instance with 50 visible and 20 hidden units. INFO [2018-01-17 12:55:19] Constructing new RBM instance with 20 visible and 2 hidden units. INFO [2018-01-17 12:55:19] DArch instance ready for training, here is a summary of its configuration: INFO [2018-01-17 12:55:19] Global parameters: INFO [2018-01-17 12:55:19] Layers parameter was c(88, 50, 20, 1), resulted in network with 4 layers and 88, 50, 20, 2 neurons INFO [2018-01-17 12:55:19] The weights for the layers were generated with "generateWeightsGlorotUniform" INFO [2018-01-17 12:55:19] Additionally, the following parameters were used for weight generation: INFO [2018-01-17 12:55:19] [weights] Parameter weights.max is 0.1 INFO [2018-01-17 12:55:19] [weights] Parameter weights.min is -0.1 INFO [2018-01-17 12:55:19] [weights] Parameter weights.mean is 0 INFO [2018-01-17 12:55:19] [weights] Parameter weights.sd is 0.01 INFO [2018-01-17 12:55:19] Weight normalization is enabled using a maxnorm bound of 1 INFO [2018-01-17 12:55:19] Bootstrapping is disabled INFO [2018-01-17 12:55:19] Train data are shuffled before each epoch INFO [2018-01-17 12:55:19] Autosaving is disabled INFO [2018-01-17 12:55:19] Using CPU for matrix multiplication INFO [2018-01-17 12:55:19] Pre-processing parameters: INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.factorToNumeric is FALSE INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.factorToNumeric.targets is FALSE INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.fullRank is TRUE INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.fullRank.targets is FALSE INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.orderedToFactor.targets is TRUE INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.targets is FALSE INFO [2018-01-17 12:55:19] Caret pre-processing is disabled INFO [2018-01-17 12:55:19] Pre-training parameters: INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.consecutive is FALSE INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.numEpochs is 10 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.batchSize is 50 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.allData is TRUE INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.lastLayer is 0 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.learnRate is 1 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.unitFunction is "tanhUnitRbm" INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.errorFunction is "mseError" INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.finalMomentum is 0.9 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.initialMomentum is 0.5 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.learnRateScale is 1 INFO [2018-01-17 12:55:19] 
[preTrain] Parameter rbm.momentumRampLength is 1 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.numCD is 1 INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.updateFunction is "rbmUpdate" INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.weightDecay is 2e-04 INFO [2018-01-17 12:55:19] The selected RBMs have been trained for 0 epochs INFO [2018-01-17 12:55:19] Fine-tuning parameters: INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.batchSize is 50 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.numEpochs is 10 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightDecay is 2e-04 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dither is FALSE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.fineTuneFunction is "backpropagation" INFO [2018-01-17 12:55:19] [backprop] Using backpropagation for fine-tuning INFO [2018-01-17 12:55:19] [backprop] Parameter bp.learnRate is c(1, 1, 1) INFO [2018-01-17 12:55:19] [backprop] Parameter bp.learnRateScale is 1 INFO [2018-01-17 12:55:19] [backprop] See ?backpropagation for documentation INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.maxout.poolSize is 2 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.errorFunction is "non-darch function" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.returnBestModel.classificationError is FALSE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.maxout.unitFunction is "tanhUnit" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers1 is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers2 is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers3 is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction1 is "tanhUnit" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction2 is "maxoutUnit" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction3 is "linearUnit" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout1 is 0.1 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout2 is 0.2 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout3 is 0.1 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction1 is "weightDecayWeightUpdate" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction2 is "maxoutWeightUpdate" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction3 is "weightDecayWeightUpdate" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout is 0 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout.dropConnect is FALSE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout.momentMatching is 0 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.elu.alpha is 1 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.finalMomentum is 0.9 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.initialMomentum is 0.5 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.isClass is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.momentumRampLength is 1 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.nesterovMomentum is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.returnBestModel is TRUE INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558 INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopClassErr is -Inf INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopErr is -Inf INFO [2018-01-17 12:55:19] 
[fineTune] Parameter darch.stopValidClassErr is -Inf INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopValidErr is -Inf INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE, TRUE) INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction is "sigmoidUnit" INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate" INFO [2018-01-17 12:55:19] The network has been fine-tuned for 0 epochs INFO [2018-01-17 12:55:19] Training set consists of 11520 samples. INFO [2018-01-17 12:55:19] Start deep architecture fine-tuning for 10 epochs INFO [2018-01-17 12:55:19] Number of Batches: 231 (batch size 50) INFO [2018-01-17 12:55:19] Epoch: 1 of 10 INFO [2018-01-17 12:55:21] Classification error on Train set: 41.85% (4821/11520) INFO [2018-01-17 12:55:21] Train set k_er: 200000.000 INFO [2018-01-17 12:55:21] Finished epoch 1 of 10 after 1.86 secs (6220 patterns/sec) INFO [2018-01-17 12:55:21] Epoch: 2 of 10 INFO [2018-01-17 12:55:22] Classification error on Train set: 41.85% (4821/11520) INFO [2018-01-17 12:55:22] Train set k_er: 200000.000 INFO [2018-01-17 12:55:22] Finished epoch 2 of 10 after 1.07 secs (10827 patterns/sec) INFO [2018-01-17 12:55:22] Epoch: 3 of 10 INFO [2018-01-17 12:55:25] Classification error on Train set: 28.83% (3321/11520) INFO [2018-01-17 12:55:25] Train set k_er: 0.051 INFO [2018-01-17 12:55:25] Finished epoch 3 of 10 after 3.03 secs (3806 patterns/sec) INFO [2018-01-17 12:55:25] Epoch: 4 of 10 INFO [2018-01-17 12:55:28] Classification error on Train set: 32.21% (3711/11520) INFO [2018-01-17 12:55:28] Train set k_er: 0.073 INFO [2018-01-17 12:55:28] Finished epoch 4 of 10 after 2.99 secs (3859 patterns/sec) INFO [2018-01-17 12:55:28] Epoch: 5 of 10 INFO [2018-01-17 12:55:31] Classification error on Train set: 30.16% (3475/11520) INFO [2018-01-17 12:55:31] Train set k_er: 0.058 INFO [2018-01-17 12:55:31] Finished epoch 5 of 10 after 2.97 secs (3879 patterns/sec) INFO [2018-01-17 12:55:31] Epoch: 6 of 10 INFO [2018-01-17 12:55:35] Classification error on Train set: 29.36% (3382/11520) INFO [2018-01-17 12:55:35] Train set k_er: 0.055 INFO [2018-01-17 12:55:35] Finished epoch 6 of 10 after 3.42 secs (3368 patterns/sec) INFO [2018-01-17 12:55:35] Epoch: 7 of 10 INFO [2018-01-17 12:55:38] Classification error on Train set: 30.89% (3558/11520) INFO [2018-01-17 12:55:38] Train set k_er: 0.066 INFO [2018-01-17 12:55:38] Finished epoch 7 of 10 after 2.9 secs (3979 patterns/sec) INFO [2018-01-17 12:55:38] Epoch: 8 of 10 INFO [2018-01-17 12:55:41] Classification error on Train set: 28.01% (3227/11520) INFO [2018-01-17 12:55:41] Train set k_er: 0.051 INFO [2018-01-17 12:55:41] Finished epoch 8 of 10 after 3.35 secs (3444 patterns/sec) INFO [2018-01-17 12:55:41] Epoch: 9 of 10 INFO [2018-01-17 12:55:44] Classification error on Train set: 25.48% (2935/11520) INFO [2018-01-17 12:55:44] Train set k_er: 0.043 INFO [2018-01-17 12:55:44] Finished epoch 9 of 10 after 3.19 secs (3618 patterns/sec) INFO [2018-01-17 12:55:44] Epoch: 10 of 10 INFO [2018-01-17 12:55:47] Classification error on Train set: 25.47% (2934/11520) INFO [2018-01-17 12:55:47] Train set k_er: 0.044 INFO [2018-01-17 12:55:47] Finished epoch 10 of 10 after 2.67 secs (4323 patterns/sec) INFO [2018-01-17 12:55:49] Classification error on Train set (best model): 25.47% (2934/11520) INFO [2018-01-17 12:55:49] Train set (best model) k_er: 0.044 INFO [2018-01-17 12:55:49] Best model was found after epoch 10 INFO [2018-01-17 12:55:49] Fine-tuning 
finished after 29.57 secs

alex7tula commented 6 years ago

Oh, you updated the package. I was testing the old version... I will update it on my PC and try again.

saviola777 commented 6 years ago

Yeah, it should work if you first install devtools (install.packages("devtools")) and then run install_github("maddin79/darch"). I'll try to fix some other stuff and then create a release tag 0.12.1, but I don't think I'll release it to CRAN anytime soon.
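The install steps from this comment as runnable R, for reference:

install.packages("devtools")                 # once, if devtools is missing
devtools::install_github("maddin79/darch")   # development version from GitHub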

alex7tula commented 6 years ago

Darch now stops working after 2, 3, or 9 epochs. It happens with darch.errorFunction = my2ClassError and darch.returnBestModel.classificationError = FALSE, and also with

darch.errorFunction = my2ClassError,

#   darch.returnBestModel.classificationError = FALSE,

INFO [2018-01-17 14:32:04] Start deep architecture fine-tuning for 10 epochs INFO [2018-01-17 14:32:04] Number of Batches: 231 (batch size 50) INFO [2018-01-17 14:32:04] Epoch: 1 of 10 INFO [2018-01-17 14:32:05] Classification error on Train set: 41.85% (4821/11520) INFO [2018-01-17 14:32:05] Train set Cross Entropy error: 1.353 INFO [2018-01-17 14:32:05] Finished epoch 1 of 10 after 1.08 secs (10686 patterns/sec) INFO [2018-01-17 14:32:05] Epoch: 2 of 10 INFO [2018-01-17 14:32:06] Classification error on Train set: 41.85% (4821/11520) INFO [2018-01-17 14:32:06] Train set Cross Entropy error: 1.344

saviola777 commented 6 years ago

Guess there's some other bug in there somewhere, sorry about that. I will have to do some testing; I haven't worked on this code in more than a year.

saviola777 commented 6 years ago

Embarrassing typo, that's what happens when you don't test your code. Please try it with the latest commit (just re-install with the command above).

alex7tula commented 6 years ago

Now it sorts by my function. Thanks!

alex7tula commented 6 years ago

One additional suggestion: can you add early stopping when no better model has been found for X epochs? E.g. I might start training with 1000 epochs; if no better result appears within 100 epochs after the last improvement, the process stops. If the best result was found at epoch 243, training would then stop at epoch 343. It would save time.
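Until such an option exists, the external loop from earlier in this thread can approximate it. A minimal sketch, reusing the same hypothetical objects as above (params with darch.numEpochs = 1, MatrixLearnX, MatrixLearnY, myClassError):

patience <- 100                    # stop after this many epochs without improvement
max_epochs <- 1000
best_err <- Inf; best_epoch <- 0
model <- NULL

for (i in 1:max_epochs) {
  model_tmp <- darch(darch = model, paramsList = params,
                     x = MatrixLearnX, y = factor(MatrixLearnY))
  err_i <- myClassError(MatrixLearnY, predict(model_tmp, newdata = MatrixLearnX))[[2]]
  if (err_i < best_err) {
    best_err <- err_i; best_epoch <- i; model <- model_tmp
  }
  if (i - best_epoch >= patience) break   # no improvement for `patience` epochs
}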

saviola777 commented 6 years ago

Yeah, that's a good idea, I will add that.

alex7tula commented 6 years ago

Another suggestion: when I set rbm.numEpochs = 10 for pre-training, I don't see any message about pre-training in the output, so I don't know whether it ran or not. Can you add some message or statistics about the results of the pre-training?

saviola777 commented 6 years ago

There should be log output if pre-training happens. No output means no pre-training, though I'm not sure why that is the case if you set rbm.numEpochs to 10. I'll need to investigate.

alex7tula commented 6 years ago

Oh, I found it: INFO [2018-01-16 15:21:49] Generating RBMs. INFO [2018-01-16 15:21:49] Constructing new RBM instance with 88 visible and 50 hidden units. INFO [2018-01-16 15:21:49] Constructing new RBM instance with 50 visible and 20 hidden units. INFO [2018-01-16 15:21:49] Constructing new RBM instance with 20 visible and 2 hidden units. Sorry.

saviola777 commented 6 years ago

No, that just tells you that the RBMs were created, not that they were trained; for some reason the pre-training is skipped in your example. I will have to test it on the weekend. There should be output like "Starting pre-training for 10 epochs".

alex7tula commented 6 years ago

I found why the RBM pre-training does not run: I put the rbm.* parameters into paramsList instead of passing them directly to darch(...). I use this code:

params <- list(
  layers = Ln,
  darch.numEpochs = nLearns,
......................
  rbm.numEpochs = 10,
  rbm.consecutive = FALSE,
  rbm.batchSize = 50,
  rbm.allData = TRUE,
  rbm.lastLayer = 0,
  rbm.learnRate = 1,
  rbm.unitFunction = "tanhUnitRbm"
)

NN <- darch(darch = NULL, paramsList = params,
            x = MatrixLearnX, y = factor(MatrixLearnY))

alex7tula commented 6 years ago

Is it possible to pass the rbm.* parameters via paramsList? Or do I have to put each of them directly into one long darch() call, like darch(darch = NULL, rbm.numEpochs = 10, rbm.consecutive = F, rbm.batchSize = 50, rbm.allData = TRUE, rbm.lastLayer = 0, rbm.learnRate = 1, rbm.unitFunction = "tanhUnitRbm", ...)?
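For comparison, the two call styles in question (same hypothetical objects as above). At the time of this comment only the second form triggered pre-training, which is the bug discussed next:

# Via paramsList (as in the code above):
NN <- darch(darch = NULL, paramsList = params,
            x = MatrixLearnX, y = factor(MatrixLearnY))

# rbm.* parameters passed directly as arguments:
NN <- darch(darch = NULL,
            rbm.numEpochs = 10, rbm.consecutive = FALSE, rbm.batchSize = 50,
            rbm.allData = TRUE, rbm.lastLayer = 0, rbm.learnRate = 1,
            rbm.unitFunction = "tanhUnitRbm",
            x = MatrixLearnX, y = factor(MatrixLearnY))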

alex7tula commented 6 years ago

In darch.R you access rbm.numEpochs directly (rbm.numEpochs > 0), while elsewhere you read it from params[["rbm.numEpochs"]]. Maybe that is the problem?

  if (rbm.numEpochs > 0 && darch@epochs == 0)
  {
    darch <- preTrainDArch(darch, dataSet, dataSetValid = dataSetValid,
      numEpochs = params[["rbm.numEpochs"]], numCD = params[["rbm.numCD"]],
      lastLayer = params[["rbm.lastLayer"]],
      isClass = params[["darch.isClass"]],
      consecutive = params[["rbm.consecutive"]], ...)
  }
  else if (rbm.numEpochs > 0 && darch@epochs != 0)
  {
    futile.logger::flog.warn(paste("Skipping pre-training on trained DArch",
      "instance, please create a new instance to enable pre-training."))
  }
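A minimal illustration of the change being suggested here, i.e. taking the epoch count from params instead of relying on the direct argument (illustrative only; the actual fix applied upstream may look different):

# Illustrative sketch, not the actual upstream fix.
rbm.numEpochs <- params[["rbm.numEpochs"]]

if (rbm.numEpochs > 0 && darch@epochs == 0)
{
  darch <- preTrainDArch(darch, dataSet, dataSetValid = dataSetValid,
    numEpochs = rbm.numEpochs, numCD = params[["rbm.numCD"]],
    lastLayer = params[["rbm.lastLayer"]],
    isClass = params[["darch.isClass"]],
    consecutive = params[["rbm.consecutive"]], ...)
}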
saviola777 commented 6 years ago

Ah, I see. I will see how I can fix that when I find the time.

saviola777 commented 6 years ago

I've fixed the problem you described; all but the parameters described in the parameter documentation will now be taken from paramsList as expected.

alex7tula commented 6 years ago

Thanks! Everything works now.