Hello,
Have you tried disabling weight normalization? And have you tried playing around with the network structure, or the activation function? It's hard to say without further information, but most often it's a problem with the parameters.
Please paste the beginning of the log output for the training process (where the parameters are listed etc.).
Thanks a lot for the quick response.
I've tried disabling weight normalization, but I don't really know which parameters to adjust or how. Here is the beginning of the output:
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch, bootstrap = T,
                    darch.numEpochs = 30, darch.batchSize = 2,
                    normalizeWeights = F,
                    darch.errorFunction = rmseError,
                    darch.stopValidClassErr = 0.15,
                    darch.returnBestModel.validationErrorFactor = 1)
INFO [2017-01-28 02:07:23] The current log level is: INFO
INFO [2017-01-28 02:07:23] Start initial caret pre-processing.
INFO [2017-01-28 02:07:23] Converting non-numeric columns in data (if any)...
INFO [2017-01-28 02:07:23] Converting non-numeric columns in targets (if any)...
INFO [2017-01-28 02:07:23] The current log level is: INFO
INFO [2017-01-28 02:07:23] Using CPU matrix multiplication.
WARN [2017-01-28 02:07:23] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons.
INFO [2017-01-28 02:07:23] Bootstrapping is started with 5600 samples, bootstrapping results in 3550 training (3550 unique) and 2050 validation samples for this run.
INFO [2017-01-28 02:07:23] Creating and configuring new DArch instance
INFO [2017-01-28 02:07:23] Constructing a network with 3 layers (4, 10, 1 neurons).
INFO [2017-01-28 02:07:23] Generating RBMs.
INFO [2017-01-28 02:07:23] Constructing new RBM instance with 4 visible and 10 hidden units.
INFO [2017-01-28 02:07:23] Constructing new RBM instance with 10 visible and 1 hidden units.
INFO [2017-01-28 02:07:23] DArch instance ready for training, here is a summary of its configuration:
INFO [2017-01-28 02:07:23] Global parameters:
INFO [2017-01-28 02:07:23] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 1 neurons
INFO [2017-01-28 02:07:23] The weights for the layers were generated with "generateWeightsGlorotUniform"
INFO [2017-01-28 02:07:23] Additionally, the following parameters were used for weight generation:
INFO [2017-01-28 02:07:23] [weights] Parameter weights.max is 0.1
INFO [2017-01-28 02:07:23] [weights] Parameter weights.min is -0.1
INFO [2017-01-28 02:07:23] [weights] Parameter weights.mean is 0
INFO [2017-01-28 02:07:23] [weights] Parameter weights.sd is 0.01
INFO [2017-01-28 02:07:23] Weight normalization is disabled
INFO [2017-01-28 02:07:23] Bootstrapping is enabled with the following parameters:
INFO [2017-01-28 02:07:23] [bootstrap] Parameter bootstrap.unique is TRUE
INFO [2017-01-28 02:07:23] [bootstrap] Parameter bootstrap.num is 0
INFO [2017-01-28 02:07:23] Train data are shuffled before each epoch
INFO [2017-01-28 02:07:23] Autosaving is disabled
INFO [2017-01-28 02:07:23] Using CPU for matrix multiplication
INFO [2017-01-28 02:07:23] Pre-processing parameters:
INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.factorToNumeric is FALSE
INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.factorToNumeric.targets is FALSE
INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.fullRank is TRUE
INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.fullRank.targets is FALSE
INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.orderedToFactor.targets is TRUE
INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.targets is FALSE
INFO [2017-01-28 02:07:23] Caret pre-processing is disabled
INFO [2017-01-28 02:07:23] Pre-training parameters:
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.allData is FALSE
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.batchSize is 1
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.consecutive is TRUE
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.errorFunction is "mseError"
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.finalMomentum is 0.9
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.initialMomentum is 0.5
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.lastLayer is 0
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.learnRate is 1
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.learnRateScale is 1
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.momentumRampLength is 1
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.numCD is 1
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.numEpochs is 0
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm"
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.updateFunction is "rbmUpdate"
INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.weightDecay is 2e-04
INFO [2017-01-28 02:07:23] The selected RBMs have been trained for 0 epochs
INFO [2017-01-28 02:07:23] Fine-tuning parameters:
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.batchSize is 2
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dither is FALSE
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout is 0
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout.dropConnect is FALSE
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout.momentMatching is 0
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.elu.alpha is 1
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.errorFunction is "rmseError"
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.finalMomentum is 0.9
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.fineTuneFunction is "backpropagation"
INFO [2017-01-28 02:07:24] [backprop] Using backpropagation for fine-tuning
INFO [2017-01-28 02:07:24] [backprop] Parameter bp.learnRate is c(1, 1)
INFO [2017-01-28 02:07:24] [backprop] Parameter bp.learnRateScale is 1
INFO [2017-01-28 02:07:24] [backprop] See ?backpropagation for documentation
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.initialMomentum is 0.5
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.isClass is TRUE
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.maxout.poolSize is 2
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit"
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.momentumRampLength is 1
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.nesterovMomentum is TRUE
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.numEpochs is 30
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.returnBestModel is TRUE
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 1
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopClassErr is -Inf
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopErr is -Inf
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopValidClassErr is 0.15
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopValidErr is -Inf
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE)
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.unitFunction is "sigmoidUnit"
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.weightDecay is 0
INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate"
INFO [2017-01-28 02:07:24] The network has been fine-tuned for 0 epochs
INFO [2017-01-28 02:07:24] Training set consists of 3550 samples.
INFO [2017-01-28 02:07:24] Validation set consists of 2050 samples
INFO [2017-01-28 02:07:24] Start deep architecture fine-tuning for 30 epochs
INFO [2017-01-28 02:07:24] Number of Batches: 1775 (batch size 2)
INFO [2017-01-28 02:07:24] Epoch: 1 of 30
INFO [2017-01-28 02:07:25] Classification error on Train set: 18.82% (668/3550)
INFO [2017-01-28 02:07:25] Train set RMSE: 0.434
INFO [2017-01-28 02:07:25] Classification error on Validation set: 17.95% (368/2050)
INFO [2017-01-28 02:07:25] Validation set RMSE: 0.424
INFO [2017-01-28 02:07:25] Finished epoch 1 of 30 after 0.927 secs (3828 patterns/sec)
INFO [2017-01-28 02:07:25] Epoch: 2 of 30
INFO [2017-01-28 02:07:26] Classification error on Train set: 18.82% (668/3550)
INFO [2017-01-28 02:07:26] Train set RMSE: 0.434
INFO [2017-01-28 02:07:26] Classification error on Validation set: 17.95% (368/2050)
INFO [2017-01-28 02:07:26] Validation set RMSE: 0.424
INFO [2017-01-28 02:07:26] Finished epoch 2 of 30 after 0.903 secs (3930 patterns/sec)
INFO [2017-01-28 02:07:26] Epoch: 3 of 30
INFO [2017-01-28 02:07:26] Classification error on Train set: 18.82% (668/3550)
INFO [2017-01-28 02:07:26] Train set RMSE: 0.434
INFO [2017-01-28 02:07:26] Classification error on Validation set: 17.95% (368/2050)
INFO [2017-01-28 02:07:27] Validation set RMSE: 0.424
INFO [2017-01-28 02:07:27] Finished epoch 3 of 30 after 0.9 secs (3943 patterns/sec)
INFO [2017-01-28 02:07:27] Epoch: 4 of 30
INFO [2017-01-28 02:07:27] Classification error on Train set: 18.82% (668/3550)
INFO [2017-01-28 02:07:27] Train set RMSE: 0.434
INFO [2017-01-28 02:07:27] Classification error on Validation set: 17.95% (368/2050)
INFO [2017-01-28 02:07:27] Validation set RMSE: 0.424
INFO [2017-01-28 02:07:27] Finished epoch 4 of 30 after 0.904 secs (3926 patterns/sec)
When using normalized data (everything except the identification variable and the desired output, return_customer), the algorithm needs about 10 seconds per epoch... but the result is still the same. :( I'd appreciate any help. Cheers, richi
What does the output variable look like? You have to consider that you are using the sigmoid unit function for the output. A classification with just one output variable is usually problematic; try to convert your output variable to a factor, so that it results in two output neurons (1 0 for the first class and 0 1 for the second class), and use softmax on the output layer. You might also want to consider playing around with the preProc.* parameters (e.g., preProc.targets = T), as well as increasing the batch size.
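For illustration, a minimal sketch of that suggestion (untested; the batch size is just an example value, and darch.unitFunction takes one unit function per layer, as seen later in this thread):
# ensure the target is a factor so darch creates one output neuron per class
train_darch$return_customer <- as.factor(train_darch$return_customer)
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    darch.numEpochs = 30,
                    darch.batchSize = 10, # example value, larger than the current 2
                    darch.unitFunction = c("sigmoidUnit", "softmaxUnit")) # sigmoid hidden layer, softmax output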
Sorry that I can't give you a definite solution, maybe if you give me more information about the dataset (value ranges etc.), I could generate a similar dataset and test with it.
Thanks again. I have converted it into two columns now, with 1 becoming 1 and 1, and 0 becoming 0 and 0. That doesn't make a difference, does it? I'm back to a 100% classification error now. Here is the structure of the dataset:
'data.frame': 64789 obs. of 37 variables:
 $ ID                      : int 1 2 3 4 5 6 7 8 9 10 ...
 $ title                   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ newsletter              : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 2 ...
 $ delivery                : Factor w/ 2 levels "option 0 ","option 1": 1 2 1 1 2 1 2 2 1 1 ...
 $ coupon                  : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 2 1 ...
 $ advertising_code        : Factor w/ 2 levels "no","yes": 1 1 1 1 2 2 1 1 1 1 ...
 $ goods_value             : int 2 2 1 3 4 3 4 4 4 4 ...
 $ giftwrapping            : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ referrer                : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 2 2 1 1 ...
 $ cost_shipping           : Factor w/ 2 levels "no","yes": 1 1 2 1 1 2 1 1 1 1 ...
 $ weight                  : num 0.737 0.368 0.047 0 0.843 ...
 $ return_customer         : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ weight_classes          : num 3 2 1 1 3 4 2 1 4 4 ...
 $ already_existing_account: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
 $ success_rate            : num 1 0 1 1 1 1 1 1 1 1 ...
 $ lor_classes             : Factor w/ 5 levels "Instant purchase",..: 1 1 1 1 1 1 1 1 2 1 ...
 $ waiting_time_class      : Factor w/ 6 levels "canceled by customer",..: 3 6 4 2 2 3 3 2 3 3 ...
 $ book                    : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 2 2 1 ...
 $ paperback               : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 1 1 1 1 ...
 $ schoolbook              : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
 $ ebook                   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ audiobook               : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ audiobook_download      : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
 $ film                    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ musical                 : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ hardware                : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ imported                : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 1 1 1 2 ...
 $ other                   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ used                    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ woe.postcode_invoice    : num 0.0721 -0.2512 0.0943 -0.0814 -0.1449 ...
 $ woe.form_of_address     : num 0.0496 0.0853 0.0496 -0.4913 0.0496 ...
 $ woe.payment             : num 0.0708 -0.2935 0.093 0.093 -0.2935 ...
 $ woe.size_of_order       : num -0.0852 0.5821 0.1384 -0.0852 -0.0852 ...
 $ woe.month_of_order      : num 0.1951 0.1951 0.0266 0.0631 0.0266 ...
 $ woe.model               : num 0.00296 0.10121 0.10121 0.10121 0.10121 ...
 $ woe.email_domain        : num 0.006664 0.026065 0.000671 -0.108222 -0.064984 ...
 $ woe.weekday             : num -0.00303 0.00069 0.00069 -0.00303 0.03798 ...
*For the darch training, I'm using a smaller dataset which also includes the variable $return_customer2 (the same as $return_customer).
I have normalized the variables, and for simplicity (and because I think the woe variables wouldn't make sense for prediction and so forth) I'm using only newsletter, success_rate, goods_value and already_existing_account. These variables are also known to be important from other predictions.
Once again, my last output:
data$return_customer
[1] no no no no no no no no no no no no no no no no no no no no no no no no no no .....
Levels: no yes
View(data)
srt(data)
Error: could not find function "srt"
str(data)
'data.frame': 64789 obs. of 37 variables: (same structure as shown above)
data_darch <- data_normal[c(7001:14000, 55000:56575), ]
data_normal[, c(2:11, 13:28)] <- lapply(data_normal[, c(2:11, 13:28)], scale) # normalizing without ID and return_customer
data_darch <- data_normal[c(7001:14000, 55000:56575), ]
as.factor(data_normal$return_customer)
.... Levels: 1 2
# variable change
data_darch$return_customer <- ifelse(data_darch$return_customer == 1, 0,
                                     ifelse(data_darch$return_customer == 2, 1, data_darch$return_customer))
subset <- data_darch$return_customer
data_darch$return_customer2 <- subset
# data partitioning
# actual data split
# 1) split the data set into a) data where return_customer is known and b) data where return_customer is not known
# 2) split the known data into train and test sets
# 3) use the unknown data as a validation set
data_darch_known <- data_darch[!is.na(data_darch$return_customer), ]
idx.train2 <- createDataPartition(y = data_darch_known$return_customer, p = 0.8, list = FALSE) # draw a random, stratified sample including p percent of the data
train_darch <- data_darch_known[idx.train2, ] # training set
test_darch <- data_darch_known[-idx.train2, ] # test set (drop all observations with train indices)
validation_darch <- data_darch[is.na(data_darch$return_customer), ]
library(darch)
# model training
darchmodel <- darch(cbind(return_customer, return_customer2) ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch, bootstrap = T,
                    darch.numEpochs = 5, darch.batchSize = 10,
                    normalizeWeights = T,
                    darch.errorFunction = rmseError,
                    darch.stopValidClassErr = 0.25,
                    darch.returnBestModel.validationErrorFactor = 1,
                    preProc.params = T,
                    darch.unitFunction = c("sigmoidUnit", "softmaxUnit"))
INFO [2017-01-28 14:06:12] The current log level is: INFO
INFO [2017-01-28 14:06:12] Start initial caret pre-processing.
INFO [2017-01-28 14:06:12] Converting non-numeric columns in data (if any)...
INFO [2017-01-28 14:06:12] Converting non-numeric columns in targets (if any)...
INFO [2017-01-28 14:06:12] The current log level is: INFO
INFO [2017-01-28 14:06:12] Using CPU matrix multiplication.
WARN [2017-01-28 14:06:12] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons.
INFO [2017-01-28 14:06:12] Bootstrapping is started with 5600 samples, bootstrapping results in 3507 training (3507 unique) and 2093 validation samples for this run.
INFO [2017-01-28 14:06:12] Creating and configuring new DArch instance
INFO [2017-01-28 14:06:12] Constructing a network with 3 layers (4, 10, 2 neurons).
INFO [2017-01-28 14:06:12] Generating RBMs.
INFO [2017-01-28 14:06:12] Constructing new RBM instance with 4 visible and 10 hidden units.
INFO [2017-01-28 14:06:12] Constructing new RBM instance with 10 visible and 2 hidden units.
INFO [2017-01-28 14:06:12] DArch instance ready for training, here is a summary of its configuration:
INFO [2017-01-28 14:06:12] Global parameters:
INFO [2017-01-28 14:06:12] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 2 neurons
INFO [2017-01-28 14:06:12] The weights for the layers were generated with "generateWeightsGlorotUniform"
INFO [2017-01-28 14:06:12] Additionally, the following parameters were used for weight generation:
INFO [2017-01-28 14:06:12] [weights] Parameter weights.max is 0.1
INFO [2017-01-28 14:06:12] [weights] Parameter weights.min is -0.1
INFO [2017-01-28 14:06:12] [weights] Parameter weights.mean is 0
INFO [2017-01-28 14:06:12] [weights] Parameter weights.sd is 0.01
INFO [2017-01-28 14:06:12] Weight normalization is enabled using a maxnorm bound of 15
INFO [2017-01-28 14:06:12] Bootstrapping is enabled with the following parameters:
INFO [2017-01-28 14:06:12] [bootstrap] Parameter bootstrap.unique is TRUE
INFO [2017-01-28 14:06:12] [bootstrap] Parameter bootstrap.num is 0
INFO [2017-01-28 14:06:12] Train data are shuffled before each epoch
INFO [2017-01-28 14:06:12] Autosaving is disabled
INFO [2017-01-28 14:06:12] Using CPU for matrix multiplication
INFO [2017-01-28 14:06:12] Pre-processing parameters:
INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.factorToNumeric is FALSE
INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.factorToNumeric.targets is FALSE
INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.fullRank is TRUE
INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.fullRank.targets is FALSE
INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.orderedToFactor.targets is TRUE
INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.targets is FALSE
INFO [2017-01-28 14:06:12] Caret pre-processing is disabled
INFO [2017-01-28 14:06:12] Pre-training parameters:
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.allData is FALSE
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.batchSize is 1
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.consecutive is TRUE
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.errorFunction is "mseError"
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.finalMomentum is 0.9
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.initialMomentum is 0.5
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.lastLayer is 0
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.learnRate is 1
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.learnRateScale is 1
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.momentumRampLength is 1
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.numCD is 1
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.numEpochs is 0
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm"
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.updateFunction is "rbmUpdate"
INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.weightDecay is 2e-04
INFO [2017-01-28 14:06:12] The selected RBMs have been trained for 0 epochs
INFO [2017-01-28 14:06:12] Fine-tuning parameters:
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.batchSize is 10
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dither is FALSE
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout is 0
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout.dropConnect is FALSE
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout.momentMatching is 0
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.elu.alpha is 1
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.errorFunction is "rmseError"
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.finalMomentum is 0.9
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.fineTuneFunction is "backpropagation"
INFO [2017-01-28 14:06:12] [backprop] Using backpropagation for fine-tuning
INFO [2017-01-28 14:06:12] [backprop] Parameter bp.learnRate is c(1, 1)
INFO [2017-01-28 14:06:12] [backprop] Parameter bp.learnRateScale is 1
INFO [2017-01-28 14:06:12] [backprop] See ?backpropagation for documentation
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.initialMomentum is 0.5
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.isClass is TRUE
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.maxout.poolSize is 2
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit"
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.momentumRampLength is 1
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.nesterovMomentum is TRUE
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.numEpochs is 5
INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.returnBestModel is TRUE
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 1
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopClassErr is -Inf
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopErr is -Inf
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopValidClassErr is 0.25
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopValidErr is -Inf
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE)
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.unitFunction is c("sigmoidUnit", "softmaxUnit")
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.weightDecay is 0
INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate"
INFO [2017-01-28 14:06:13] The network has been fine-tuned for 0 epochs
INFO [2017-01-28 14:06:13] Training set consists of 3507 samples.
INFO [2017-01-28 14:06:13] Validation set consists of 2093 samples
INFO [2017-01-28 14:06:13] Start deep architecture fine-tuning for 5 epochs
INFO [2017-01-28 14:06:13] Number of Batches: 351 (batch size 10)
INFO [2017-01-28 14:06:13] Epoch: 1 of 5
INFO [2017-01-28 14:06:13] Classification error on Train set: 100% (3507/3507)
INFO [2017-01-28 14:06:13] Train set RMSE: 1.000
INFO [2017-01-28 14:06:13] Classification error on Validation set: 100% (2093/2093)
INFO [2017-01-28 14:06:13] Validation set RMSE: 1.000
INFO [2017-01-28 14:06:13] Finished epoch 1 of 5 after 0.236 secs (14860 patterns/sec)
INFO [2017-01-28 14:06:13] Epoch: 2 of 5
INFO [2017-01-28 14:06:13] Classification error on Train set: 100% (3507/3507)
INFO [2017-01-28 14:06:13] Train set RMSE: 1.000
INFO [2017-01-28 14:06:13] Classification error on Validation set: 100% (2093/2093)
INFO [2017-01-28 14:06:13] Validation set RMSE: 1.000
INFO [2017-01-28 14:06:13] Finished epoch 2 of 5 after 0.24 secs (14810 patterns/sec)
INFO [2017-01-28 14:06:13] Epoch: 3 of 5
INFO [2017-01-28 14:06:13] Classification error on Train set: 100% (3507/3507)
INFO [2017-01-28 14:06:13] Train set RMSE: 1.000
INFO [2017-01-28 14:06:13] Classification error on Validation set: 100% (2093/2093)
INFO [2017-01-28 14:06:13] Validation set RMSE: 1.000
INFO [2017-01-28 14:06:13] Finished epoch 3 of 5 after 0.253 secs (14784 patterns/sec)
INFO [2017-01-28 14:06:13] Epoch: 4 of 5
INFO [2017-01-28 14:06:14] Classification error on Train set: 100% (3507/3507)
INFO [2017-01-28 14:06:14] Train set RMSE: 1.000
INFO [2017-01-28 14:06:14] Classification error on Validation set: 100% (2093/2093)
INFO [2017-01-28 14:06:14] Validation set RMSE: 1.000
INFO [2017-01-28 14:06:14] Finished epoch 4 of 5 after 0.237 secs (15840 patterns/sec)
INFO [2017-01-28 14:06:14] Epoch: 5 of 5
INFO [2017-01-28 14:06:14] Classification error on Train set: 100% (3507/3507)
INFO [2017-01-28 14:06:14] Train set RMSE: 1.000
INFO [2017-01-28 14:06:14] Classification error on Validation set: 100% (2093/2093)
INFO [2017-01-28 14:06:14] Validation set RMSE: 1.000
INFO [2017-01-28 14:06:14] Finished epoch 5 of 5 after 0.239 secs (14674 patterns/sec)
INFO [2017-01-28 14:06:14] Classification error on Train set (best model): 100% (3507/3507)
INFO [2017-01-28 14:06:14] Train set (best model) RMSE: 1.000
INFO [2017-01-28 14:06:14] Classification error on Validation set (best model): 100% (2093/2093)
INFO [2017-01-28 14:06:14] Validation set (best model) RMSE: 1.000
INFO [2017-01-28 14:06:14] Best model was found after epoch 5
INFO [2017-01-28 14:06:14] Final 1.000 validation RMSE: 1.000
INFO [2017-01-28 14:06:14] Final 1.000 validation classification error: 100.00%
INFO [2017-01-28 14:06:14] Fine-tuning finished after 1.251 secs
# model prediction
yhat_darch <- predict(darchmodel, newdata = test_darch, type = "bin") # type = "prob"[,2] also possible!?
# evaluation
numIncorrect <- sum(yhat_darch != test_darch[, return_customer])
Error in `[.data.frame`(test_darch, , return_customer) : object 'return_customer' not found
cat(paste0("Incorrect classifications on all examples: ", numIncorrect, " (",
    round(numIncorrect/nrow(test_darch)*100, 2), "%)\n"))
Incorrect classifications on all examples: 1400 (100%)
Alright, I don't know why darch doesn't generate a network with two output neurons for your dataset; try to use the parameter preProc.fullRank.targets = T. You don't need to convert the output variable yourself.
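A minimal sketch of that (untested; everything apart from preProc.fullRank.targets is carried over from the earlier calls in this thread):
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    preProc.fullRank.targets = T, # let darch handle the target coding itself
                    darch.numEpochs = 5,
                    darch.batchSize = 10,
                    darch.unitFunction = c("sigmoidUnit", "softmaxUnit"))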
Ahh okay, that's easier, thanks... I kept playing with the variables, but I still get 100% error, like anti-overfitting... how is this even possible? If you have any idea which parameters might be to blame, I'm happy for any suggestions. Here is my last output:
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account, data = train_darch,
Pre-processing:
INFO [2017-01-28 18:36:50] Converting non-numeric columns in targets (if any)...
INFO [2017-01-28 18:36:50] Converting 1 columns (y) to numeric
INFO [2017-01-28 18:36:50] The current log level is: INFO
INFO [2017-01-28 18:36:50] Using CPU matrix multiplication.
WARN [2017-01-28 18:36:50] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons.
INFO [2017-01-28 18:36:50] Creating and configuring new DArch instance
INFO [2017-01-28 18:36:50] Constructing a network with 3 layers (4, 10, 1 neurons).
INFO [2017-01-28 18:36:50] Generating RBMs.
INFO [2017-01-28 18:36:50] Constructing new RBM instance with 4 visible and 10 hidden units.
INFO [2017-01-28 18:36:50] Constructing new RBM instance with 10 visible and 1 hidden units.
INFO [2017-01-28 18:36:50] DArch instance ready for training, here is a summary of its configuration:
INFO [2017-01-28 18:36:50] Global parameters:
INFO [2017-01-28 18:36:50] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 1 neurons
INFO [2017-01-28 18:36:50] The weights for the layers were generated with "generateWeightsGlorotUniform"
INFO [2017-01-28 18:36:50] Additionally, the following parameters were used for weight generation:
INFO [2017-01-28 18:36:50] [weights] Parameter weights.max is 0.1
INFO [2017-01-28 18:36:50] [weights] Parameter weights.min is -0.1
INFO [2017-01-28 18:36:50] [weights] Parameter weights.mean is 0
INFO [2017-01-28 18:36:50] [weights] Parameter weights.sd is 0.01
INFO [2017-01-28 18:36:51] Weight normalization is disabled
INFO [2017-01-28 18:36:51] Bootstrapping is disabled
INFO [2017-01-28 18:36:51] Train data are shuffled before each epoch
INFO [2017-01-28 18:36:51] Autosaving is disabled
INFO [2017-01-28 18:36:51] Using CPU for matrix multiplication
INFO [2017-01-28 18:36:51] Pre-processing parameters:
INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.factorToNumeric is TRUE
INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.factorToNumeric.targets is TRUE
INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.fullRank is TRUE
INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.fullRank.targets is TRUE
INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.orderedToFactor.targets is TRUE
INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.targets is TRUE
INFO [2017-01-28 18:36:51] Caret pre-processing is disabled
INFO [2017-01-28 18:36:51] Pre-training parameters:
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.allData is FALSE
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.batchSize is 50
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.consecutive is TRUE
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.errorFunction is "rmseError"
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.finalMomentum is 0.9
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.initialMomentum is 0.5
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.lastLayer is 0
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.learnRate is 1
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.learnRateScale is 1
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.momentumRampLength is 1
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.numCD is 1
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.numEpochs is 4
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm"
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.updateFunction is "rbmUpdate"
INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.weightDecay is 2e-04
INFO [2017-01-28 18:36:51] The selected RBMs have been trained for 0 epochs
INFO [2017-01-28 18:36:51] Fine-tuning parameters:
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.batchSize is 1
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dither is FALSE
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout is 0
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout.dropConnect is FALSE
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout.momentMatching is 0
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.elu.alpha is 1
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.errorFunction is "rmseError"
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.finalMomentum is 0.9
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.fineTuneFunction is "rpropagation"
INFO [2017-01-28 18:36:51] [RPROP] Using rpropagation for fine-tuning
INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.method is "iRprop+"
INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.decFact is 0.5
INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.incFact is 1.2
INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.initDelta is 0.0125
INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.minDelta is 1e-06
INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.maxDelta is 50
INFO [2017-01-28 18:36:51] [RPROP] See ?rpropagation for documentation
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.initialMomentum is 0.9
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.isClass is TRUE
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.maxout.poolSize is 2
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit"
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.momentumRampLength is 1
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.nesterovMomentum is TRUE
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.numEpochs is 5
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.returnBestModel is TRUE
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopClassErr is -Inf
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopErr is -Inf
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopValidClassErr is -Inf
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopValidErr is -Inf
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE)
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.unitFunction is "sigmoidUnit"
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.weightDecay is 0
INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate"
INFO [2017-01-28 18:36:51] The network has been fine-tuned for 0 epochs
INFO [2017-01-28 18:36:51] Starting pre-training for 4 epochs
INFO [2017-01-28 18:36:51] Training set consists of 5601 samples
INFO [2017-01-28 18:36:51] The first 2 RBMs are going to be trained
INFO [2017-01-28 18:36:51] Starting the training of the rbm with 4 visible and 10 hidden units.
INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 1 RMSE error: 0.0834969430272899
INFO [2017-01-28 18:36:51] Finished epoch 1 after 0.1306019 secs
INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 2 RMSE error: 0.0919481965031578
INFO [2017-01-28 18:36:51] Finished epoch 2 after 0.1141999 secs
INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 3 RMSE error: 0.0741237990805017
INFO [2017-01-28 18:36:51] Finished epoch 3 after 0.1388011 secs
INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 4 RMSE error: 0.0830296626661737
INFO [2017-01-28 18:36:51] Finished epoch 4 after 0.1130061 secs
INFO [2017-01-28 18:36:51] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:36:51] Train set RMSE: 1.344
INFO [2017-01-28 18:36:51] Starting the training of the rbm with 10 visible and 1 hidden units.
INFO [2017-01-28 18:36:51] [RBM 10x1] Epoch 1 RMSE error: 0.00551611469749816
INFO [2017-01-28 18:36:51] Finished epoch 1 after 0.105006 secs
INFO [2017-01-28 18:36:52] [RBM 10x1] Epoch 2 RMSE error: 0.000434584963198201
INFO [2017-01-28 18:36:52] Finished epoch 2 after 0.1258001 secs
INFO [2017-01-28 18:36:52] [RBM 10x1] Epoch 3 RMSE error: 0.000347141463655062
INFO [2017-01-28 18:36:52] Finished epoch 3 after 0.1101999 secs
INFO [2017-01-28 18:36:52] [RBM 10x1] Epoch 4 RMSE error: 0.000291437799738362
INFO [2017-01-28 18:36:52] Finished epoch 4 after 0.1102011 secs
INFO [2017-01-28 18:36:52] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:36:52] Train set RMSE: 1.414
INFO [2017-01-28 18:36:52] Pre-training finished after 1.05222 secs
INFO [2017-01-28 18:36:52] Training set consists of 5601 samples.
INFO [2017-01-28 18:36:52] Start deep architecture fine-tuning for 5 epochs
INFO [2017-01-28 18:36:52] Number of Batches: 5601 (batch size 1)
INFO [2017-01-28 18:36:52] Epoch: 1 of 5
INFO [2017-01-28 18:36:55] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:36:55] Train set RMSE: 1.000
INFO [2017-01-28 18:36:55] Finished epoch 1 of 5 after 3.06 secs (1830 patterns/sec)
INFO [2017-01-28 18:36:55] Epoch: 2 of 5
INFO [2017-01-28 18:36:58] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:36:58] Train set RMSE: 1.000
INFO [2017-01-28 18:36:58] Finished epoch 2 of 5 after 3.08 secs (1817 patterns/sec)
INFO [2017-01-28 18:36:58] Epoch: 3 of 5
INFO [2017-01-28 18:37:01] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:37:01] Train set RMSE: 1.000
INFO [2017-01-28 18:37:01] Finished epoch 3 of 5 after 3.13 secs (1800 patterns/sec)
INFO [2017-01-28 18:37:01] Epoch: 4 of 5
INFO [2017-01-28 18:37:05] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:37:05] Train set RMSE: 1.000
INFO [2017-01-28 18:37:05] Finished epoch 4 of 5 after 3.11 secs (1802 patterns/sec)
INFO [2017-01-28 18:37:05] Epoch: 5 of 5
INFO [2017-01-28 18:37:08] Classification error on Train set: 100% (5601/5601)
INFO [2017-01-28 18:37:08] Train set RMSE: 1.000
INFO [2017-01-28 18:37:08] Finished epoch 5 of 5 after 3.08 secs (1821 patterns/sec)
INFO [2017-01-28 18:37:08] Classification error on Train set (best model): 100% (5601/5601)
INFO [2017-01-28 18:37:08] Train set (best model) RMSE: 1.000
INFO [2017-01-28 18:37:08] Best model was found after epoch 5
INFO [2017-01-28 18:37:08] Fine-tuning finished after 15.8 secs
Warning messages:
1: In `[<-`(`*tmp*`, i, value = <S4 object of class "RBM">) :
  implicit list embedding of S4 objects is deprecated
2: In `[<-`(`*tmp*`, i, value = <S4 object of class "RBM">) :
  implicit list embedding of S4 objects is deprecated
# model prediction
yhat_darch <- predict(darchmodel, newdata = test_darch, type = "class") # type = "prob"[,2] also possible!?
# evaluation
numIncorrect <- sum(yhat_darch != test_darch[, 17])
cat(paste0("Incorrect classifications on all examples: ", numIncorrect, " (",
    round(numIncorrect/nrow(test_darch)*100, 2), "%)\n"))
Incorrect classifications on all examples: 1399 (100%)
cheers
There seems to be a problem with your dataset (it may be a darch bug as well, I'm not sure): it does not recognize the factors (normally it should convert factors to several columns with only 0 and 1 values) and converts them to numeric instead, which doesn't give good results. I'm not sure how to fix this, I'm not an expert on R data structures. A 100% incorrect classification result makes no sense for binary classification, as that's the same as having everything correct (just inverted), so there's a problem with the output of the network. Which dataset did you use in the very beginning, where the error went down to 18%? Because there it does not show the conversion to numeric.
I will look into the darch code next week and try to reproduce the problem. In the meantime, here are some suggestions for your parameters:
- rbm.batchSize and rbm.numEpochs: drop these, I don't think pre-training helps you here.
- darch.batchSize = 50 instead.
- bp.learnRate = .1.
Ah, I missed preProc.factorToNumeric = T and preProc.factorToNumeric.targets = T; please set those to F, and preProc.fullRank.targets = T.
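Put together, the suggested call might look like this (a sketch only; the values are the ones suggested above, the epoch count is carried over from your first call):
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    darch.numEpochs = 30,
                    darch.batchSize = 50, # larger batches, as suggested above
                    bp.learnRate = .1, # smaller backpropagation learning rate
                    preProc.factorToNumeric = F, # keep factors as factors
                    preProc.factorToNumeric.targets = F,
                    preProc.fullRank.targets = T)
# rbm.batchSize / rbm.numEpochs are left out; with rbm.numEpochs at its default of 0, no pre-training is run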
Hey, thanks for the help, and sorry for my late response. I'm back to the 18% prediction error now, but I can't really figure out why. However, I know 18% is the ratio of returning customers, so I'm pretty sure the model predicts everyone as a non-returning customer. If you have another idea, feel free to let me know; if not, never mind and thanks a lot for trying! :+1:
Here's my output:
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account, data = train_darch, # subject to change!
Pre-processing:
INFO [2017-02-01 11:06:28] Converting non-numeric columns in targets (if any)...
INFO [2017-02-01 11:06:28] Dependent factor "return_customer" converted to 2 new variables (1-of-n coding)
INFO [2017-02-01 11:06:28] The current log level is: INFO
INFO [2017-02-01 11:06:28] Using CPU matrix multiplication.
WARN [2017-02-01 11:06:28] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons.
INFO [2017-02-01 11:06:28] Creating and configuring new DArch instance
INFO [2017-02-01 11:06:28] Constructing a network with 3 layers (4, 10, 1 neurons).
INFO [2017-02-01 11:06:28] Generating RBMs.
INFO [2017-02-01 11:06:28] Constructing new RBM instance with 4 visible and 10 hidden units.
INFO [2017-02-01 11:06:28] Constructing new RBM instance with 10 visible and 1 hidden units.
INFO [2017-02-01 11:06:28] DArch instance ready for training, here is a summary of its configuration:
INFO [2017-02-01 11:06:28] Global parameters:
INFO [2017-02-01 11:06:28] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 1 neurons
INFO [2017-02-01 11:06:28] The weights for the layers were generated with "generateWeightsGlorotUniform"
INFO [2017-02-01 11:06:28] Additionally, the following parameters were used for weight generation:
INFO [2017-02-01 11:06:28] [weights] Parameter weights.max is 0.1
INFO [2017-02-01 11:06:28] [weights] Parameter weights.min is -0.1
INFO [2017-02-01 11:06:28] [weights] Parameter weights.mean is 0
INFO [2017-02-01 11:06:28] [weights] Parameter weights.sd is 0.01
INFO [2017-02-01 11:06:28] Weight normalization is disabled
INFO [2017-02-01 11:06:28] Bootstrapping is disabled
INFO [2017-02-01 11:06:28] Train data are shuffled before each epoch
INFO [2017-02-01 11:06:28] Autosaving is disabled
INFO [2017-02-01 11:06:28] Using CPU for matrix multiplication
INFO [2017-02-01 11:06:28] Pre-processing parameters:
INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.factorToNumeric is FALSE
INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.factorToNumeric.targets is FALSE
INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.fullRank is TRUE
INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.fullRank.targets is TRUE
INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.orderedToFactor.targets is TRUE
INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.targets is TRUE
INFO [2017-02-01 11:06:28] Caret pre-processing is disabled
INFO [2017-02-01 11:06:28] Pre-training parameters:
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.allData is FALSE
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.batchSize is 10
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.consecutive is TRUE
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.errorFunction is "rmseError"
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.finalMomentum is 0.9
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.initialMomentum is 0.5
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.lastLayer is 0
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.learnRate is 1
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.learnRateScale is 1
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.momentumRampLength is 1
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.numCD is 1
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.numEpochs is 4
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm"
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.updateFunction is "rbmUpdate"
INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.weightDecay is 2e-04
INFO [2017-02-01 11:06:28] The selected RBMs have been trained for 0 epochs
INFO [2017-02-01 11:06:28] Fine-tuning parameters:
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.batchSize is 1
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dither is FALSE
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout is 0
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout.dropConnect is FALSE
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout.momentMatching is 0
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.elu.alpha is 1
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.errorFunction is "rmseError"
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.finalMomentum is 0.9
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.fineTuneFunction is "rpropagation"
INFO [2017-02-01 11:06:28] [RPROP] Using rpropagation for fine-tuning
INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.method is "iRprop+"
INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.decFact is 0.5
INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.incFact is 1.2
INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.initDelta is 0.0125
INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.minDelta is 1e-06
INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.maxDelta is 50
INFO [2017-02-01 11:06:28] [RPROP] See ?rpropagation for documentation
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.initialMomentum is 0.9
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.isClass is TRUE
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.maxout.poolSize is 2
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit"
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.momentumRampLength is 1
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.nesterovMomentum is TRUE
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.numEpochs is 5
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.returnBestModel is TRUE
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopClassErr is -Inf
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopErr is -Inf
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopValidClassErr is -Inf
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopValidErr is -Inf
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE)
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.unitFunction is "sigmoidUnit"
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.weightDecay is 0
INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate"
INFO [2017-02-01 11:06:28] The network has been fine-tuned for 0 epochs
INFO [2017-02-01 11:06:28] Starting pre-training for 4 epochs
INFO [2017-02-01 11:06:28] Training set consists of 5601 samples
INFO [2017-02-01 11:06:28] The first 2 RBMs are going to be trained
INFO [2017-02-01 11:06:28] Starting the training of the rbm with 4 visible and 10 hidden units.
INFO [2017-02-01 11:06:28] [RBM 4x10] Epoch 1 RMSE error: 0.323003204574307
INFO [2017-02-01 11:06:28] Finished epoch 1 after 0.4614081 secs
INFO [2017-02-01 11:06:29] [RBM 4x10] Epoch 2 RMSE error: 0.319917878891257
INFO [2017-02-01 11:06:29] Finished epoch 2 after 0.4004111 secs
INFO [2017-02-01 11:06:29] [RBM 4x10] Epoch 3 RMSE error: 0.317499747165885
INFO [2017-02-01 11:06:29] Finished epoch 3 after 0.5437009 secs
INFO [2017-02-01 11:06:30] [RBM 4x10] Epoch 4 RMSE error: 0.318977340216636
INFO [2017-02-01 11:06:30] Finished epoch 4 after 0.4203022 secs
INFO [2017-02-01 11:06:30] Classification error on Train set: 81.54% (4567/5601)
INFO [2017-02-01 11:06:30] Train set RMSE: 0.713
INFO [2017-02-01 11:06:30] Starting the training of the rbm with 10 visible and 1 hidden units.
INFO [2017-02-01 11:06:30] [RBM 10x1] Epoch 1 RMSE error: 0.00658042195367504
INFO [2017-02-01 11:06:30] Finished epoch 1 after 0.405803 secs
INFO [2017-02-01 11:06:31] [RBM 10x1] Epoch 2 RMSE error: 0.000693930106950838
INFO [2017-02-01 11:06:31] Finished epoch 2 after 0.4087031 secs
INFO [2017-02-01 11:06:31] [RBM 10x1] Epoch 3 RMSE error: 0.000539636404332261
INFO [2017-02-01 11:06:31] Finished epoch 3 after 0.392302 secs
INFO [2017-02-01 11:06:31] [RBM 10x1] Epoch 4 RMSE error: 0.000471148988971575
INFO [2017-02-01 11:06:31] Finished epoch 4 after 0.394202 secs
INFO [2017-02-01 11:06:31] Classification error on Train set: 81.54% (4567/5601)
INFO [2017-02-01 11:06:31] Train set RMSE: 0.903
INFO [2017-02-01 11:06:31] Pre-training finished after 3.505233 secs
INFO [2017-02-01 11:06:31] Training set consists of 5601 samples.
INFO [2017-02-01 11:06:31] Start deep architecture fine-tuning for 5 epochs
INFO [2017-02-01 11:06:32] Number of Batches: 5601 (batch size 1)
INFO [2017-02-01 11:06:32] Epoch: 1 of 5
INFO [2017-02-01 11:06:35] Classification error on Train set: 18.46% (1034/5601)
INFO [2017-02-01 11:06:35] Train set RMSE: 0.430
INFO [2017-02-01 11:06:35] Finished epoch 1 of 5 after 2.98 secs (1879 patterns/sec)
INFO [2017-02-01 11:06:35] Epoch: 2 of 5
INFO [2017-02-01 11:06:38] Classification error on Train set: 18.46% (1034/5601)
INFO [2017-02-01 11:06:38] Train set RMSE: 0.430
INFO [2017-02-01 11:06:38] Finished epoch 2 of 5 after 2.97 secs (1884 patterns/sec)
INFO [2017-02-01 11:06:38] Epoch: 3 of 5
INFO [2017-02-01 11:06:41] Classification error on Train set: 18.46% (1034/5601)
INFO [2017-02-01 11:06:41] Train set RMSE: 0.430
INFO [2017-02-01 11:06:41] Finished epoch 3 of 5 after 3.16 secs (1770 patterns/sec)
INFO [2017-02-01 11:06:41] Epoch: 4 of 5
INFO [2017-02-01 11:06:44] Classification error on Train set: 18.46% (1034/5601)
INFO [2017-02-01 11:06:44] Train set RMSE: 0.430
INFO [2017-02-01 11:06:44] Finished epoch 4 of 5 after 3.3 secs (1697 patterns/sec)
INFO [2017-02-01 11:06:44] Epoch: 5 of 5
INFO [2017-02-01 11:06:48] Classification error on Train set: 18.46% (1034/5601)
INFO [2017-02-01 11:06:48] Train set RMSE: 0.430
INFO [2017-02-01 11:06:48] Finished epoch 5 of 5 after 3.77 secs (1489 patterns/sec)
INFO [2017-02-01 11:06:48] Classification error on Train set (best model): 18.46% (1034/5601)
INFO [2017-02-01 11:06:48] Train set (best model) RMSE: 0.430
INFO [2017-02-01 11:06:48] Best model was found after epoch 5
INFO [2017-02-01 11:06:48] Fine-tuning finished after 16.35 secs
Warning messages:
1: In `[<-`(`*tmp*`, i, value = <S4 object of class "RBM">) :
  implicit list embedding of S4 objects is deprecated
2: In `[<-`(`*tmp*`, i, value = <S4 object of class "RBM">) :
  implicit list embedding of S4 objects is deprecated
I understand this less and less: it still creates a network with 4 input neurons whereas it should be 6 (since two input variables were converted to two new variables each), and the same for the output, which should have 2 neurons. Maybe it's a problem with the caret pre-processing, which automatically merges binary variables into one column... Hm, alright, let's try some more:
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    preProc.params = list(method = c("center", "scale")), # let caret center and scale the inputs
                    darch.numEpochs = 5,
                    layers = c(0, 50, 10, 0), # the 0s are filled in from the data; two hidden layers of 50 and 10 neurons
                    bp.learnRate = 0.1,
                    darch.batchSize = 10,
                    retainData = T # keep the training data in the model so predict() works without newdata
)
Now, after training, please run
predict(darchmodel, outputlayer = 1) # returns the values after darch pre-processing
predict(darchmodel) # returns the raw network output
For the first call, please just paste a couple of the rows here or check if anything is strange about the data (not normalized etc.); for the second call, I presume that all output values are the same (either 0 or 1), so please check which one it is and maybe paste a couple of the rows here.
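A quick way to check the second point (a sketch, assuming retainData = T as in the call above) is to tabulate the rounded raw output:
head(predict(darchmodel, outputlayer = 1)) # inspect the pre-processed inputs
table(round(predict(darchmodel))) # a single value here would confirm a collapsed output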
The output is 1 in all the returning customer cases, so approx. 18%.
Output of the prediction (last lines):
[498,] 0.8158758 0.1851527
[499,] 0.8168210 0.1820696
[500,] 0.8178002 0.1835608
My guess would be that the prediction is not weighted enough and only has slightly different values, so it just sticks to the most likely option, which is non-returning... The data set is imbalanced 82:18%. Does this matter?
I guess the imbalance is the problem here. It just ignores the 18% completely. The simplest thing would be replicating samples until you have about 50/50, or just selecting all returning customer cases and about as many of the non-returning customer cases and use that for the training set, and all data for validation.
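A sketch of the second option (downsampling; this assumes return_customer is still the no/yes factor from the str() output above, and the seed is arbitrary):
returners      <- train_darch[train_darch$return_customer == "yes", ]
non_returners  <- train_darch[train_darch$return_customer == "no", ]
set.seed(42) # for reproducible sampling
non_sample     <- non_returners[sample(nrow(non_returners), nrow(returners)), ]
train_balanced <- rbind(returners, non_sample) # roughly 50/50 classes
train_balanced <- train_balanced[sample(nrow(train_balanced)), ] # shuffle the rows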
I have a roughly 0.45:0.55 set now, but it's still not predicting... the training set error was 33% and afterwards 66%. Output:
INFO [2017-02-03 00:31:58] Result of preProcess for targets: Created from 4802 samples and 1 variables
Pre-processing:
INFO [2017-02-03 00:31:58] Converting non-numeric columns in targets (if any)...
INFO [2017-02-03 00:31:58] Dependent factor "return_customer" converted to 2 new variables (1-of-n coding)
INFO [2017-02-03 00:31:58] The current log level is: INFO
INFO [2017-02-03 00:31:58] Using CPU matrix multiplication.
WARN [2017-02-03 00:31:58] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons.
....
INFO [2017-02-03 00:31:58] Creating and configuring new DArch instance
INFO [2017-02-03 00:31:58] Constructing a network with 3 layers (13, 10, 1 neurons).
INFO [2017-02-03 00:31:58] Generating RBMs.
.....
INFO [2017-02-03 00:31:58] The network has been fine-tuned for 0 epochs
INFO [2017-02-03 00:31:58] Starting pre-training for 6 epochs
INFO [2017-02-03 00:31:58] Training set consists of 4802 samples
INFO [2017-02-03 00:31:58] The first 2 RBMs are going to be trained
INFO [2017-02-03 00:31:58] Starting the training of the rbm with 13 visible and 10 hidden units.
INFO [2017-02-03 00:31:59] [RBM 13x10] Epoch 1 RMSE error: 0.50781337462991
INFO [2017-02-03 00:31:59] Finished epoch 1 after 0.4488058 secs
INFO [2017-02-03 00:31:59] [RBM 13x10] Epoch 2 RMSE error: 0.504939270606509
INFO [2017-02-03 00:31:59] Finished epoch 2 after 0.3986089 secs
......
INFO [2017-02-03 00:32:18] Train set RMSE: 0.816
INFO [2017-02-03 00:32:18] Finished epoch 5 of 5 after 3.02 secs (1601 patterns/sec)
INFO [2017-02-03 00:32:18] Classification error on Train set (best model): 66.66% (3201/4802)
INFO [2017-02-03 00:32:18] Train set (best model) RMSE: 0.816
INFO [2017-02-03 00:32:18] Best model was found after epoch 5
INFO [2017-02-03 00:32:18] Fine-tuning finished after 14.86 secs
Warning messages:
1: In `[<-`(`*tmp*`, i, value = <S4 object of class "RBM">) :
  implicit list embedding of S4 objects is deprecated
2: In `[<-`(`*tmp*`, i, value = <S4 object of class "RBM">) :
  implicit list embedding of S4 objects is deprecated
Please try it with the parameters from my comment, and try to increase the number of fine-tuning epochs – expecting convergence after 5 epochs is not realistic.
Thanks a lot, it's running now! :) It doesn't get much better than a classification error of around 30%, even with 300 epochs... do you reckon improvements are possible? Also, I would like to compare it not only by right and wrong: there are several classification errors possible (from a matrix with true positive, true negative, false positive and false negative), and they result in different values for the errors... is this possible to implement?
Cheers, and thanks so much for your help already :)
As for the classification error, it depends on the training/validation error. As long as these are pretty equal, you should be able to improve the overall result by increasing the network size / tuning parameters. At some point you'll run into over-fitting, where your training error will improve, but your validation error will get worse.
As for the TP/TN/FP/FN matrix, I'm not sure how to best integrate it directly into the training (caret offers this matrix in its output, but it's not used for training). The easiest thing would probably be a conversion of the data set, maybe there are standards for this. Do you know of any implementations where I could look at an approach for solving this?
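For the evaluation side (not the training itself), caret's confusionMatrix already produces that matrix; a sketch assuming yhat_darch holds predicted class labels and test_darch the true ones:
library(caret)
# build the TP/TN/FP/FN breakdown from predictions vs. actual labels
cm <- confusionMatrix(factor(yhat_darch, levels = levels(test_darch$return_customer)),
                      test_darch$return_customer,
                      positive = "yes")
cm$table   # the 2x2 TP/FP/FN/TN table
cm$byClass # sensitivity, specificity, etc.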
Closing due to inactivity, feel free to re-open / continue the discussion if needed.
Dear Martin, I am trying to predict whether the outcome of a variable is 0 or 1. However, when trying to implement the darch package, it sets every observation to 1, regardless of its input variables. The dataset is normalized and huge, with 40,000 observations by over 30 variables. Is it even possible and useful to use the darch prediction, or is it just a parameter I have to adjust differently? Best regards; here is my code:
# model training
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch, bootstrap = T,
                    darch.numEpochs = 30, darch.batchSize = 2,
                    normalizeWeights = T,
                    darch.errorFunction = rmseError,
                    darch.stopValidClassErr = 0.15,
                    darch.returnBestModel.validationErrorFactor = 1)
# model prediction
yhat_darch <- predict(darchmodel, newdata = test_darch, type = "class")
# evaluation
numIncorrect <- sum(yhat_darch != test_darch[, 17])
cat(paste0("Incorrect classifications on all examples: ", numIncorrect, " (", round(numIncorrect/nrow(test_darch)*100, 2), "%)\n"))
I also tried it with type = "bin", normalizeWeights = T, darch.errorFunction = rmseError, darch.stopValidClassErr = 0.15, darch.returnBestModel.validationErrorFactor = 1.
I'm clueless. Cheers