Closed alex7tula closed 6 years ago
I see that sorting by a custom error function works for regression models, but I would prefer to use this feature with classification models as well. How can I sort by my own error function there?
Here is the current code used for comparing two models:
error[["class"]] < errorBest[["class"]] ||
  (error[["raw"]] <= errorBest[["raw"]] &&
   error[["class"]] == errorBest[["class"]])
So a model is considered better than another primarily based on its classification error; the raw error is only taken into account when the classification errors are equal. It is currently not possible to change this for classification problems.
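Written out as a small helper, the rule looks like this (isBetterModel is just an illustrative name, not an actual darch function):

# Illustration only: isBetterModel is a hypothetical name, not part of darch.
# 'class' holds the classification error, 'raw' the value of the error function.
isBetterModel <- function(error, errorBest)
{
  error[["class"]] < errorBest[["class"]] ||
    (error[["raw"]] <= errorBest[["raw"]] &&
     error[["class"]] == errorBest[["class"]])
}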
I will consider adding a parameter for this in the future. I do wonder about the use case, though: shouldn't you be interested primarily in the classification performance when dealing with a classification problem? That's what I thought when implementing this feature, at least.
I need to select the best model with my own function; the standard sorting is not suitable for my model. I guess the changes are not very time-consuming, and I would be very grateful if you could make them. With this change the logic would be consistent with regression models, where it already works.
If the user sets his own error function, it means he wants to evaluate the model with that function and not with any other. That seems logical and right to me.
Perhaps, but to me it seems that if the model your error function prefers performs worse on the classification problem it is being trained for, there is some mismatch between the error function and the task at hand. Why make it a classification problem if you don't care about the classification results?
If I retrain models with the code below, will I get the correct sorting by my function and correct re-training?
params$rbm.numEpochs <- nLearns
params$darch.numEpochs <- 1
....
err <- 1e12; best_epoch <- 1
model <- NULL
for (i in 1:nLearns) {
  # continue training the current best model for one more epoch
  model_tmp <- darch(darch = model, paramsList = params,
                     x = MatrixLearnX, y = factor(MatrixLearnY))
  # evaluate with the custom error function
  err_tmp <- myClassError(MatrixLearnY, predict(model_tmp, newdata = MatrixLearnX))
  print(paste0("Epoch: ", i, ", Error: ", err_tmp[[2]]))
  if (err_tmp[[2]] < err) {
    model <- model_tmp
    params$rbm.numEpochs <- 0  # no pre-training on subsequent passes
    err <- err_tmp[[2]]
    best_epoch <- i
    print(paste0("Best error: ", err_tmp[[2]]))
  }
}
Yes, something like that should work. Maybe I could add support for callback functions, so that something like this could be done more easily.
You can now use the parameter darch.returnBestModel.classificationError = FALSE to sort models by the error function value. Let me know if this works for you.
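If it helps, a call along these lines should exercise it (a sketch only; the data matrices and myClassError are placeholders from the earlier snippets, and the layer sizes are taken from the log below):

# Sketch: pass the custom error function and ask darch to return the model
# that is best according to that function rather than the classification error.
# MatrixLearnX, MatrixLearnY and myClassError are placeholders from this thread.
NN <- darch(x = MatrixLearnX,
            y = factor(MatrixLearnY),
            layers = c(88, 50, 20, 2),
            darch.numEpochs = 10,
            darch.isClass = TRUE,
            darch.errorFunction = myClassError,
            darch.returnBestModel = TRUE,
            darch.returnBestModel.classificationError = FALSE)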
No, darch still sorts by the classification error. In the sample below I get a better k_er in epoch 9, but darch selects epoch 10 as the best.
INFO [2018-01-17 12:55:17] The current log level is: INFO
INFO [2018-01-17 12:55:18] Start initial caret pre-processing.
INFO [2018-01-17 12:55:18] Converting non-numeric columns in data (if any)...
INFO [2018-01-17 12:55:18] Converting non-numeric columns in targets (if any)...
INFO [2018-01-17 12:55:18] Dependent factor "NA" converted to 2 new variables (1-of-n coding)
INFO [2018-01-17 12:55:19] Using CPU matrix multiplication.
WARN [2018-01-17 12:55:19] Changing number of neurons in the output layer from 1 to 2 based on dataset.
INFO [2018-01-17 12:55:19] Creating and configuring new DArch instance
INFO [2018-01-17 12:55:19] Constructing a network with 4 layers (88, 50, 20, 2 neurons).
INFO [2018-01-17 12:55:19] Generating RBMs.
INFO [2018-01-17 12:55:19] Constructing new RBM instance with 88 visible and 50 hidden units.
INFO [2018-01-17 12:55:19] Constructing new RBM instance with 50 visible and 20 hidden units.
INFO [2018-01-17 12:55:19] Constructing new RBM instance with 20 visible and 2 hidden units.
INFO [2018-01-17 12:55:19] DArch instance ready for training, here is a summary of its configuration:
INFO [2018-01-17 12:55:19] Global parameters:
INFO [2018-01-17 12:55:19] Layers parameter was c(88, 50, 20, 1), resulted in network with 4 layers and 88, 50, 20, 2 neurons
INFO [2018-01-17 12:55:19] The weights for the layers were generated with "generateWeightsGlorotUniform"
INFO [2018-01-17 12:55:19] Additionally, the following parameters were used for weight generation:
INFO [2018-01-17 12:55:19] [weights] Parameter weights.max is 0.1
INFO [2018-01-17 12:55:19] [weights] Parameter weights.min is -0.1
INFO [2018-01-17 12:55:19] [weights] Parameter weights.mean is 0
INFO [2018-01-17 12:55:19] [weights] Parameter weights.sd is 0.01
INFO [2018-01-17 12:55:19] Weight normalization is enabled using a maxnorm bound of 1
INFO [2018-01-17 12:55:19] Bootstrapping is disabled
INFO [2018-01-17 12:55:19] Train data are shuffled before each epoch
INFO [2018-01-17 12:55:19] Autosaving is disabled
INFO [2018-01-17 12:55:19] Using CPU for matrix multiplication
INFO [2018-01-17 12:55:19] Pre-processing parameters:
INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.factorToNumeric is FALSE
INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.factorToNumeric.targets is FALSE
INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.fullRank is TRUE
INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.fullRank.targets is FALSE
INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.orderedToFactor.targets is TRUE
INFO [2018-01-17 12:55:19] [preProc] Parameter preProc.targets is FALSE
INFO [2018-01-17 12:55:19] Caret pre-processing is disabled
INFO [2018-01-17 12:55:19] Pre-training parameters:
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.consecutive is FALSE
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.numEpochs is 10
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.batchSize is 50
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.allData is TRUE
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.lastLayer is 0
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.learnRate is 1
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.unitFunction is "tanhUnitRbm"
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.errorFunction is "mseError"
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.finalMomentum is 0.9
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.initialMomentum is 0.5
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.learnRateScale is 1
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.momentumRampLength is 1
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.numCD is 1
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.updateFunction is "rbmUpdate"
INFO [2018-01-17 12:55:19] [preTrain] Parameter rbm.weightDecay is 2e-04
INFO [2018-01-17 12:55:19] The selected RBMs have been trained for 0 epochs
INFO [2018-01-17 12:55:19] Fine-tuning parameters:
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.batchSize is 50
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.numEpochs is 10
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightDecay is 2e-04
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dither is FALSE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.fineTuneFunction is "backpropagation"
INFO [2018-01-17 12:55:19] [backprop] Using backpropagation for fine-tuning
INFO [2018-01-17 12:55:19] [backprop] Parameter bp.learnRate is c(1, 1, 1)
INFO [2018-01-17 12:55:19] [backprop] Parameter bp.learnRateScale is 1
INFO [2018-01-17 12:55:19] [backprop] See ?backpropagation for documentation
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.maxout.poolSize is 2
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.errorFunction is "non-darch function"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.returnBestModel.classificationError is FALSE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.maxout.unitFunction is "tanhUnit"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers1 is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers2 is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers3 is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction1 is "tanhUnit"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction2 is "maxoutUnit"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction3 is "linearUnit"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout1 is 0.1
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout2 is 0.2
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout3 is 0.1
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction1 is "weightDecayWeightUpdate"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction2 is "maxoutWeightUpdate"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction3 is "weightDecayWeightUpdate"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout is 0
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout.dropConnect is FALSE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.dropout.momentMatching is 0
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.elu.alpha is 1
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.finalMomentum is 0.9
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.initialMomentum is 0.5
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.isClass is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.momentumRampLength is 1
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.nesterovMomentum is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.returnBestModel is TRUE
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopClassErr is -Inf
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopErr is -Inf
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopValidClassErr is -Inf
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.stopValidErr is -Inf
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE, TRUE)
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.unitFunction is "sigmoidUnit"
INFO [2018-01-17 12:55:19] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate"
INFO [2018-01-17 12:55:19] The network has been fine-tuned for 0 epochs
INFO [2018-01-17 12:55:19] Training set consists of 11520 samples.
INFO [2018-01-17 12:55:19] Start deep architecture fine-tuning for 10 epochs
INFO [2018-01-17 12:55:19] Number of Batches: 231 (batch size 50)
INFO [2018-01-17 12:55:19] Epoch: 1 of 10
INFO [2018-01-17 12:55:21] Classification error on Train set: 41.85% (4821/11520)
INFO [2018-01-17 12:55:21] Train set k_er: 200000.000
INFO [2018-01-17 12:55:21] Finished epoch 1 of 10 after 1.86 secs (6220 patterns/sec)
INFO [2018-01-17 12:55:21] Epoch: 2 of 10
INFO [2018-01-17 12:55:22] Classification error on Train set: 41.85% (4821/11520)
INFO [2018-01-17 12:55:22] Train set k_er: 200000.000
INFO [2018-01-17 12:55:22] Finished epoch 2 of 10 after 1.07 secs (10827 patterns/sec)
INFO [2018-01-17 12:55:22] Epoch: 3 of 10
INFO [2018-01-17 12:55:25] Classification error on Train set: 28.83% (3321/11520)
INFO [2018-01-17 12:55:25] Train set k_er: 0.051
INFO [2018-01-17 12:55:25] Finished epoch 3 of 10 after 3.03 secs (3806 patterns/sec)
INFO [2018-01-17 12:55:25] Epoch: 4 of 10
INFO [2018-01-17 12:55:28] Classification error on Train set: 32.21% (3711/11520)
INFO [2018-01-17 12:55:28] Train set k_er: 0.073
INFO [2018-01-17 12:55:28] Finished epoch 4 of 10 after 2.99 secs (3859 patterns/sec)
INFO [2018-01-17 12:55:28] Epoch: 5 of 10
INFO [2018-01-17 12:55:31] Classification error on Train set: 30.16% (3475/11520)
INFO [2018-01-17 12:55:31] Train set k_er: 0.058
INFO [2018-01-17 12:55:31] Finished epoch 5 of 10 after 2.97 secs (3879 patterns/sec)
INFO [2018-01-17 12:55:31] Epoch: 6 of 10
INFO [2018-01-17 12:55:35] Classification error on Train set: 29.36% (3382/11520)
INFO [2018-01-17 12:55:35] Train set k_er: 0.055
INFO [2018-01-17 12:55:35] Finished epoch 6 of 10 after 3.42 secs (3368 patterns/sec)
INFO [2018-01-17 12:55:35] Epoch: 7 of 10
INFO [2018-01-17 12:55:38] Classification error on Train set: 30.89% (3558/11520)
INFO [2018-01-17 12:55:38] Train set k_er: 0.066
INFO [2018-01-17 12:55:38] Finished epoch 7 of 10 after 2.9 secs (3979 patterns/sec)
INFO [2018-01-17 12:55:38] Epoch: 8 of 10
INFO [2018-01-17 12:55:41] Classification error on Train set: 28.01% (3227/11520)
INFO [2018-01-17 12:55:41] Train set k_er: 0.051
INFO [2018-01-17 12:55:41] Finished epoch 8 of 10 after 3.35 secs (3444 patterns/sec)
INFO [2018-01-17 12:55:41] Epoch: 9 of 10
INFO [2018-01-17 12:55:44] Classification error on Train set: 25.48% (2935/11520)
INFO [2018-01-17 12:55:44] Train set k_er: 0.043
INFO [2018-01-17 12:55:44] Finished epoch 9 of 10 after 3.19 secs (3618 patterns/sec)
INFO [2018-01-17 12:55:44] Epoch: 10 of 10
INFO [2018-01-17 12:55:47] Classification error on Train set: 25.47% (2934/11520)
INFO [2018-01-17 12:55:47] Train set k_er: 0.044
INFO [2018-01-17 12:55:47] Finished epoch 10 of 10 after 2.67 secs (4323 patterns/sec)
INFO [2018-01-17 12:55:49] Classification error on Train set (best model): 25.47% (2934/11520)
INFO [2018-01-17 12:55:49] Train set (best model) k_er: 0.044
INFO [2018-01-17 12:55:49] Best model was found after epoch 10
INFO [2018-01-17 12:55:49] Fine-tuning finished after 29.57 secs
Oh, you updated the package. I was testing on the old version... I will try to update it on my PC.
Yeah, it should work if you first install devtools (install.packages("devtools")) and then run install_github("maddin79/darch"). I'll try to fix some other stuff and then create a release tag 0.12.1, but I don't think I'll release it to CRAN anytime soon.
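In full, the install sequence would be (assuming devtools is not installed yet):

# Install devtools from CRAN, then the development version of darch from GitHub
install.packages("devtools")
devtools::install_github("maddin79/darch")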
darch stops training after 2, 3, or 9 epochs. It happens both with darch.errorFunction = my2ClassError and darch.returnBestModel.classificationError = FALSE, and with that parameter commented out:
# darch.returnBestModel.classificationError = FALSE,
INFO [2018-01-17 14:32:04] Start deep architecture fine-tuning for 10 epochs
INFO [2018-01-17 14:32:04] Number of Batches: 231 (batch size 50)
INFO [2018-01-17 14:32:04] Epoch: 1 of 10
INFO [2018-01-17 14:32:05] Classification error on Train set: 41.85% (4821/11520)
INFO [2018-01-17 14:32:05] Train set Cross Entropy error: 1.353
INFO [2018-01-17 14:32:05] Finished epoch 1 of 10 after 1.08 secs (10686 patterns/sec)
INFO [2018-01-17 14:32:05] Epoch: 2 of 10
INFO [2018-01-17 14:32:06] Classification error on Train set: 41.85% (4821/11520)
INFO [2018-01-17 14:32:06] Train set Cross Entropy error: 1.344
Guess there's some other bug in there somewhere, sorry about that. I will have to do some testing, haven't worked on the code for more than a year prior to this.
Embarrassing typo; that happens when you don't test code. Please try it with the latest commit (just re-install with the command above).
Now it sorts by my function. Thanks!
One additional suggestion: can you add early stopping when no better model has been found for X epochs? For example, I could start training with 1000 epochs; if no better result appears within 100 epochs after the last improvement, the process stops. If the best result was found at epoch 243, training would stop at epoch 343. It would save time.
Yeah, that's a good idea, I will add that.
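In the meantime, a patience check can be bolted onto the manual loop posted earlier in this thread (a sketch only; patience is an arbitrary name, and err, model, best_epoch, params and the data matrices are initialised exactly as in that snippet):

patience <- 100  # stop if no better model appears for this many epochs
for (i in 1:nLearns) {
  model_tmp <- darch(darch = model, paramsList = params,
                     x = MatrixLearnX, y = factor(MatrixLearnY))
  err_tmp <- myClassError(MatrixLearnY, predict(model_tmp, newdata = MatrixLearnX))
  if (err_tmp[[2]] < err) {  # new best model found
    model <- model_tmp
    err <- err_tmp[[2]]
    best_epoch <- i
  }
  if (i - best_epoch >= patience) break  # early stop: no recent improvement
}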
Another suggestion: when I set rbm.numEpochs = 10 for pre-training, I don't see any message about pre-training in the output, so I don't know whether it works or not. Can you add some message or statistics about the results of pre-training?
There should be log output if pre-training happens. No output means no pre-training, though I'm not sure why that is the case if you set rbm.numEpochs to 10. I'll need to investigate.
Oh, I found this:
INFO [2018-01-16 15:21:49] Generating RBMs.
INFO [2018-01-16 15:21:49] Constructing new RBM instance with 88 visible and 50 hidden units.
INFO [2018-01-16 15:21:49] Constructing new RBM instance with 50 visible and 20 hidden units.
INFO [2018-01-16 15:21:49] Constructing new RBM instance with 20 visible and 2 hidden units.
Sorry.
No, that just tells you that the RBMs are created, not that they were trained. For some reason pre-training is skipped in your example; I will have to test it on the weekend. There should be output like "Starting pre-training for 10 epochs".
I found out why the RBM pre-training does not run: I pass the rbm.* parameters via paramsList instead of directly to darch(...). I use this code:
params <- list(
  layers = Ln,
  darch.numEpochs = nLearns,
  ......................
  # RBM pre-training settings
  rbm.numEpochs = 10,
  rbm.consecutive = FALSE,
  rbm.batchSize = 50,
  rbm.allData = TRUE,
  rbm.lastLayer = 0,
  rbm.learnRate = 1,
  rbm.unitFunction = "tanhUnitRbm"
)
NN <- darch(darch = NULL, paramsList = params,
            x = MatrixLearnX, y = factor(MatrixLearnY))
Is it possible to read the rbm.* parameters via paramsList? Or do I have to insert each one directly into one long darch() call, like darch(darch = NULL, rbm.numEpochs = 10, rbm.consecutive = F, rbm.batchSize = 50, rbm.allData = TRUE, rbm.lastLayer = 0, rbm.learnRate = 1, rbm.unitFunction = "tanhUnitRbm", ...)?
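For readability, that direct-call alternative laid out over multiple lines would be (with the data arguments as in the snippet above; the remaining darch.* parameters are omitted):

# Passing the rbm.* settings directly to darch() instead of via paramsList
# (remaining darch.* parameters omitted here)
NN <- darch(darch = NULL,
            x = MatrixLearnX, y = factor(MatrixLearnY),
            rbm.numEpochs = 10,
            rbm.consecutive = F,
            rbm.batchSize = 50,
            rbm.allData = TRUE,
            rbm.lastLayer = 0,
            rbm.learnRate = 1,
            rbm.unitFunction = "tanhUnitRbm")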
In darch.R you access rbm.numEpochs directly in the rbm.numEpochs > 0 check, while in other places you read it from params[["rbm.numEpochs"]]. Maybe the problem is there?
if (rbm.numEpochs > 0 && darch@epochs == 0)
{
  darch <- preTrainDArch(darch, dataSet, dataSetValid = dataSetValid,
    numEpochs = params[["rbm.numEpochs"]], numCD = params[["rbm.numCD"]],
    lastLayer = params[["rbm.lastLayer"]],
    isClass = params[["darch.isClass"]],
    consecutive = params[["rbm.consecutive"]], ...)
}
else if (rbm.numEpochs > 0 && darch@epochs != 0)
{
  futile.logger::flog.warn(paste("Skipping pre-training on trained DArch",
    "instance, please create a new instance to enable pre-training."))
}
Ah, I see. I will see how I can fix that when I find the time.
I've fixed the problem you described; all but the parameters described in the parameter documentation will now be taken from paramsList as expected.
Thanks! Now all this works.
Hi. I use my own error function. When I train a model, darch selects not the model that is best according to my function, but the one that is best by classification error. The best model by my error function (named k_er) is at epoch #8, yet darch selects the best by classification error, 28.81%. How can I make the selection use my own error function?
See listing below: