maddin79 / darch

Create deep architectures in the R programming language
GNU General Public License v3.0

classification error #24

Closed · richigneven closed this 7 years ago

richigneven commented 7 years ago

Dear Martin, I am trying to predict whether the outcome variable is 0 or 1. However, when I implement the darch package, it sets every observation to 1, regardless of its input variables. The dataset is normalized and large, with 40,000 observations and over 30 variables. Is it even possible and useful to use darch for this prediction, or is there just a parameter I have to adjust differently? Best regards. Here is my code:

model training

darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    bootstrap = T,
                    darch.numEpochs = 30,
                    darch.batchSize = 2,
                    normalizeWeights = T,
                    darch.errorFunction = rmseError,
                    darch.stopValidClassErr = 0.15,
                    darch.returnBestModel.validationErrorFactor = 1)

model prediction

yhat_darch <- predict(darchmodel, newdata = test_darch, type = "class")

evaluation

numIncorrect <- sum(yhat_darch != test_darch[,17])
cat(paste0("Incorrect classifications on all examples: ", numIncorrect, " (",
           round(numIncorrect/nrow(test_darch)*100, 2), "%)\n"))

I also tried it with type = "bin", normalizeWeights = T, darch.errorFunction = rmseError, darch.stopValidClassErr = 0.15, and darch.returnBestModel.validationErrorFactor = 1.

I'm clueless. Cheers

saviola777 commented 7 years ago

Hello,

Have you tried disabling weight normalization? And have you tried playing around with the network structure or the activation function? It's hard to say without further information, but most often it's a problem with the parameters.
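A minimal sketch of how these knobs are passed to darch(), using the parameter names that appear later in this thread; the layer sizes and the tanhUnit activation are illustrative assumptions, not a tested recommendation:

darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    normalizeWeights = F,             # weight normalization off
                    layers = c(0, 20, 0),             # 0 = inferred from data; one hidden layer of 20
                    darch.unitFunction = "tanhUnit")  # alternative activation function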

Please paste the beginning of the log output for the training process (where the parameters are listed etc.).

richigneven commented 7 years ago

Thanks a lot for the quick response.

I've tried disabling the weight normalization, but I don't really know which parameters to adjust or how. Here is the beginning of the output:

model training

darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    bootstrap = T,
                    darch.numEpochs = 30,
                    darch.batchSize = 2,
                    normalizeWeights = F,
                    darch.errorFunction = rmseError,
                    darch.stopValidClassErr = 0.15,
                    darch.returnBestModel.validationErrorFactor = 1)

INFO [2017-01-28 02:07:23] The current log level is: INFO INFO [2017-01-28 02:07:23] Start initial caret pre-processing. INFO [2017-01-28 02:07:23] Converting non-numeric columns in data (if any)... INFO [2017-01-28 02:07:23] Converting non-numeric columns in targets (if any)... INFO [2017-01-28 02:07:23] The current log level is: INFO INFO [2017-01-28 02:07:23] Using CPU matrix multiplication. WARN [2017-01-28 02:07:23] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons. INFO [2017-01-28 02:07:23] Bootstrapping is started with 5600 samples, bootstrapping results in 3550 training (3550 unique) and 2050 validation samples for this run. INFO [2017-01-28 02:07:23] Creating and configuring new DArch instance INFO [2017-01-28 02:07:23] Constructing a network with 3 layers (4, 10, 1 neurons). INFO [2017-01-28 02:07:23] Generating RBMs. INFO [2017-01-28 02:07:23] Constructing new RBM instance with 4 visible and 10 hidden units. INFO [2017-01-28 02:07:23] Constructing new RBM instance with 10 visible and 1 hidden units. INFO [2017-01-28 02:07:23] DArch instance ready for training, here is a summary of its configuration: INFO [2017-01-28 02:07:23] Global parameters: INFO [2017-01-28 02:07:23] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 1 neurons INFO [2017-01-28 02:07:23] The weights for the layers were generated with "generateWeightsGlorotUniform" INFO [2017-01-28 02:07:23] Additionally, the following parameters were used for weight generation: INFO [2017-01-28 02:07:23] [weights] Parameter weights.max is 0.1 INFO [2017-01-28 02:07:23] [weights] Parameter weights.min is -0.1 INFO [2017-01-28 02:07:23] [weights] Parameter weights.mean is 0 INFO [2017-01-28 02:07:23] [weights] Parameter weights.sd is 0.01 INFO [2017-01-28 02:07:23] Weight normalization is disabled INFO [2017-01-28 02:07:23] Bootstrapping is enabled with the following parameters: INFO [2017-01-28 02:07:23] [bootstrap] Parameter bootstrap.unique is TRUE INFO [2017-01-28 02:07:23] [bootstrap] Parameter bootstrap.num is 0 INFO [2017-01-28 02:07:23] Train data are shuffled before each epoch INFO [2017-01-28 02:07:23] Autosaving is disabled INFO [2017-01-28 02:07:23] Using CPU for matrix multiplication INFO [2017-01-28 02:07:23] Pre-processing parameters: INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.factorToNumeric is FALSE INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.factorToNumeric.targets is FALSE INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.fullRank is TRUE INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.fullRank.targets is FALSE INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.orderedToFactor.targets is TRUE INFO [2017-01-28 02:07:23] [preProc] Parameter preProc.targets is FALSE INFO [2017-01-28 02:07:23] Caret pre-processing is disabled INFO [2017-01-28 02:07:23] Pre-training parameters: INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.allData is FALSE INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.batchSize is 1 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.consecutive is TRUE INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.errorFunction is "mseError" INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.finalMomentum is 0.9 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.initialMomentum is 0.5 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.lastLayer is 0 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.learnRate is 1 INFO
[2017-01-28 02:07:23] [preTrain] Parameter rbm.learnRateScale is 1 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.momentumRampLength is 1 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.numCD is 1 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.numEpochs is 0 INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm" INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.updateFunction is "rbmUpdate" INFO [2017-01-28 02:07:23] [preTrain] Parameter rbm.weightDecay is 2e-04 INFO [2017-01-28 02:07:23] The selected RBMs have been trained for 0 epochs INFO [2017-01-28 02:07:23] Fine-tuning parameters: INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.batchSize is 2 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dither is FALSE INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout is 0 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout.dropConnect is FALSE INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout.momentMatching is 0 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.elu.alpha is 1 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.errorFunction is "rmseError" INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.finalMomentum is 0.9 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.fineTuneFunction is "backpropagation" INFO [2017-01-28 02:07:24] [backprop] Using backpropagation for fine-tuning INFO [2017-01-28 02:07:24] [backprop] Parameter bp.learnRate is c(1, 1) INFO [2017-01-28 02:07:24] [backprop] Parameter bp.learnRateScale is 1 INFO [2017-01-28 02:07:24] [backprop] See ?backpropagation for documentation INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.initialMomentum is 0.5 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.isClass is TRUE INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.maxout.poolSize is 2 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit" INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.momentumRampLength is 1 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.nesterovMomentum is TRUE INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.numEpochs is 30 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.returnBestModel is TRUE INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 1 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopClassErr is -Inf INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopErr is -Inf INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopValidClassErr is 0.15 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.stopValidErr is -Inf INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE) INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.unitFunction is "sigmoidUnit" INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.weightDecay is 0 INFO [2017-01-28 02:07:24] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate" INFO [2017-01-28 02:07:24] The network has been fine-tuned for 0 epochs INFO [2017-01-28 02:07:24] Training set consists of 3550 samples. 
INFO [2017-01-28 02:07:24] Validation set consists of 2050 samples INFO [2017-01-28 02:07:24] Start deep architecture fine-tuning for 30 epochs INFO [2017-01-28 02:07:24] Number of Batches: 1775 (batch size 2) INFO [2017-01-28 02:07:24] Epoch: 1 of 30 INFO [2017-01-28 02:07:25] Classification error on Train set: 18.82% (668/3550) INFO [2017-01-28 02:07:25] Train set RMSE: 0.434 INFO [2017-01-28 02:07:25] Classification error on Validation set: 17.95% (368/2050) INFO [2017-01-28 02:07:25] Validation set RMSE: 0.424 INFO [2017-01-28 02:07:25] Finished epoch 1 of 30 after 0.927 secs (3828 patterns/sec) INFO [2017-01-28 02:07:25] Epoch: 2 of 30 INFO [2017-01-28 02:07:26] Classification error on Train set: 18.82% (668/3550) INFO [2017-01-28 02:07:26] Train set RMSE: 0.434 INFO [2017-01-28 02:07:26] Classification error on Validation set: 17.95% (368/2050) INFO [2017-01-28 02:07:26] Validation set RMSE: 0.424 INFO [2017-01-28 02:07:26] Finished epoch 2 of 30 after 0.903 secs (3930 patterns/sec) INFO [2017-01-28 02:07:26] Epoch: 3 of 30 INFO [2017-01-28 02:07:26] Classification error on Train set: 18.82% (668/3550) INFO [2017-01-28 02:07:26] Train set RMSE: 0.434 INFO [2017-01-28 02:07:26] Classification error on Validation set: 17.95% (368/2050) INFO [2017-01-28 02:07:27] Validation set RMSE: 0.424 INFO [2017-01-28 02:07:27] Finished epoch 3 of 30 after 0.9 secs (3943 patterns/sec) INFO [2017-01-28 02:07:27] Epoch: 4 of 30 INFO [2017-01-28 02:07:27] Classification error on Train set: 18.82% (668/3550) INFO [2017-01-28 02:07:27] Train set RMSE: 0.434 INFO [2017-01-28 02:07:27] Classification error on Validation set: 17.95% (368/2050) INFO [2017-01-28 02:07:27] Validation set RMSE: 0.424 INFO [2017-01-28 02:07:27] Finished epoch 4 of 30 after 0.904 secs (3926 patterns/sec)

When using normalized data (except for the identification variable and the desired output, return_customer), the algorithm needs about 10 seconds per epoch, but the result is still the same. :( I appreciate any help. Cheers, Richi

saviola777 commented 7 years ago

What does the output variable look like? You have to consider that you are using the sigmoid unit function for the output. A classification with just one output variable is usually problematic; try converting your output variable to a factor, so that it results in two output neurons (1 0 for the first class and 0 1 for the second class), and use softmax on the output layer. You might also want to consider playing around with the preProc.* parameters (e.g., preProc.targets = T), as well as increasing the batch size.
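A minimal sketch of this suggestion, assuming the train_darch data and column names from above; a factor target triggers darch's 1-of-n coding, which is what produces the two output neurons:

# Keep the target as a factor so darch creates one output neuron per class,
# and put softmax on the output layer (sigmoid stays on the hidden layer).
train_darch$return_customer <- as.factor(train_darch$return_customer)
darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    darch.batchSize = 10,                                  # larger batches, as suggested
                    darch.unitFunction = c("sigmoidUnit", "softmaxUnit"))  # softmax output layer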

Sorry that I can't give you a definite solution; maybe if you give me more information about the dataset (value ranges etc.), I could generate a similar dataset and test with it.

richigneven commented 7 years ago

Thanks again. I have converted it into two variables now, with 1 becoming 1 and 1, and 0 becoming 0 and 0. That doesn't make a difference, does it? I'm back to the 100% classification error now. Here is the structure of the dataset:

'data.frame': 64789 obs. of 37 variables:
 $ ID                      : int 1 2 3 4 5 6 7 8 9 10 ...
 $ title                   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ newsletter              : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 2 ...
 $ delivery                : Factor w/ 2 levels "option 0 ","option 1": 1 2 1 1 2 1 2 2 1 1 ...
 $ coupon                  : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 2 1 ...
 $ advertising_code        : Factor w/ 2 levels "no","yes": 1 1 1 1 2 2 1 1 1 1 ...
 $ goods_value             : int 2 2 1 3 4 3 4 4 4 4 ...
 $ giftwrapping            : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ referrer                : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 2 2 1 1 ...
 $ cost_shipping           : Factor w/ 2 levels "no","yes": 1 1 2 1 1 2 1 1 1 1 ...
 $ weight                  : num 0.737 0.368 0.047 0 0.843 ...
 $ return_customer         : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ weight_classes          : num 3 2 1 1 3 4 2 1 4 4 ...
 $ already_existing_account: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
 $ success_rate            : num 1 0 1 1 1 1 1 1 1 1 ...
 $ lor_classes             : Factor w/ 5 levels "Instant purchase",..: 1 1 1 1 1 1 1 1 2 1 ...
 $ waiting_time_class      : Factor w/ 6 levels "canceled by customer",..: 3 6 4 2 2 3 3 2 3 3 ...
 $ book                    : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 2 2 1 ...
 $ paperback               : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 1 1 1 1 ...
 $ schoolbook              : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
 $ ebook                   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ audiobook               : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ audiobook_download      : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
 $ film                    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ musical                 : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ hardware                : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ imported                : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 1 1 1 2 ...
 $ other                   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ used                    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ woe.postcode_invoice    : num 0.0721 -0.2512 0.0943 -0.0814 -0.1449 ...
 $ woe.form_of_address     : num 0.0496 0.0853 0.0496 -0.4913 0.0496 ...
 $ woe.payment             : num 0.0708 -0.2935 0.093 0.093 -0.2935 ...
 $ woe.size_of_order       : num -0.0852 0.5821 0.1384 -0.0852 -0.0852 ...
 $ woe.month_of_order      : num 0.1951 0.1951 0.0266 0.0631 0.0266 ...
 $ woe.model               : num 0.00296 0.10121 0.10121 0.10121 0.10121 ...
 $ woe.email_domain        : num 0.006664 0.026065 0.000671 -0.108222 -0.064984 ...
 $ woe.weekday             : num -0.00303 0.00069 0.00069 -0.00303 0.03798 ...

* For the darch training I'm using a smaller dataset, which also includes the variable $return_customer2, which is identical to $return_customer.

I have normalized the variables, and for simplicity (and because I don't think the woe variables would make sense for prediction), I'm using only newsletter, success_rate, goods_value and already_existing_account. These variables are also known to be important from other predictions.

Once again, my last output:

data$return_customer
[1] no no no no no no no no no no no no no no no no no no no no no no no no no no
.....
Levels: no yes

View(data)
srt(data)
Error: could not find function "srt"
str(data)
'data.frame': 64789 obs. of 37 variables: (same structure as above)

data_darch <- data_normal[c(7001:14000,55000:56575), ]
data_normal[,c(2:11, 13:28)] <- lapply(data_normal[,c(2:11, 13:28)], scale) # normalizing without ID and return_customer
data_darch <- data_normal[c(7001:14000,55000:56575), ]
as.factor(data_normal$return_customer)
....
Levels: 1 2

variable change

data$return_customer1 <-

data_darch$return_customer <- ifelse(data_darch$return_customer == 1, 0, ifelse(data_darch$return_customer == 2, 1, data_darch$return_customer))

subset <- data_darch$return_customer
data_darch$return_customer2 <- subset

data partitioning

actual data split

1) Split the data set up into a) data where return_customer is known and b) data where return_customer is not known

2) Split the known data up into a train and a test set

3) use the unknown data as a validation set

data_darch_known <- data_darch[!is.na(data_darch$return_customer),]
idx.train2 <- createDataPartition(y = data_darch_known$return_customer, p = 0.8, list = FALSE) # draw a random, stratified sample including p percent of the data
train_darch <- data_darch_known[idx.train2, ]  # training set
test_darch <- data_darch_known[-idx.train2, ]  # test set (drop all observations with train indices)
validation_darch <- data_darch[is.na(data_darch$return_customer),]

library(darch)

model training

darchmodel <- darch(cbind(return_customer, return_customer2) ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    bootstrap = T,
                    darch.numEpochs = 5,
                    darch.batchSize = 10,
                    normalizeWeights = T,
                    darch.errorFunction = rmseError,
                    darch.stopValidClassErr = 0.25,
                    darch.returnBestModel.validationErrorFactor = 1,
                    preProc.params = T,
                    darch.unitFunction = c("sigmoidUnit", "softmaxUnit"))

INFO [2017-01-28 14:06:12] The current log level is: INFO INFO [2017-01-28 14:06:12] Start initial caret pre-processing. INFO [2017-01-28 14:06:12] Converting non-numeric columns in data (if any)... INFO [2017-01-28 14:06:12] Converting non-numeric columns in targets (if any)... INFO [2017-01-28 14:06:12] The current log level is: INFO INFO [2017-01-28 14:06:12] Using CPU matrix multiplication. WARN [2017-01-28 14:06:12] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons. INFO [2017-01-28 14:06:12] Bootstrapping is started with 5600 samples, bootstrapping results in 3507 training (3507 unique) and 2093 validation samples for this run. INFO [2017-01-28 14:06:12] Creating and configuring new DArch instance INFO [2017-01-28 14:06:12] Constructing a network with 3 layers (4, 10, 2 neurons). INFO [2017-01-28 14:06:12] Generating RBMs. INFO [2017-01-28 14:06:12] Constructing new RBM instance with 4 visible and 10 hidden units. INFO [2017-01-28 14:06:12] Constructing new RBM instance with 10 visible and 2 hidden units. INFO [2017-01-28 14:06:12] DArch instance ready for training, here is a summary of its configuration: INFO [2017-01-28 14:06:12] Global parameters: INFO [2017-01-28 14:06:12] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 2 neurons INFO [2017-01-28 14:06:12] The weights for the layers were generated with "generateWeightsGlorotUniform" INFO [2017-01-28 14:06:12] Additionally, the following parameters were used for weight generation: INFO [2017-01-28 14:06:12] [weights] Parameter weights.max is 0.1 INFO [2017-01-28 14:06:12] [weights] Parameter weights.min is -0.1 INFO [2017-01-28 14:06:12] [weights] Parameter weights.mean is 0 INFO [2017-01-28 14:06:12] [weights] Parameter weights.sd is 0.01 INFO [2017-01-28 14:06:12] Weight normalization is enabled using a maxnorm bound of 15 INFO [2017-01-28 14:06:12] Bootstrapping is enabled with the following parameters: INFO [2017-01-28 14:06:12] [bootstrap] Parameter bootstrap.unique is TRUE INFO [2017-01-28 14:06:12] [bootstrap] Parameter bootstrap.num is 0 INFO [2017-01-28 14:06:12] Train data are shuffled before each epoch INFO [2017-01-28 14:06:12] Autosaving is disabled INFO [2017-01-28 14:06:12] Using CPU for matrix multiplication INFO [2017-01-28 14:06:12] Pre-processing parameters: INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.factorToNumeric is FALSE INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.factorToNumeric.targets is FALSE INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.fullRank is TRUE INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.fullRank.targets is FALSE INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.orderedToFactor.targets is TRUE INFO [2017-01-28 14:06:12] [preProc] Parameter preProc.targets is FALSE INFO [2017-01-28 14:06:12] Caret pre-processing is disabled INFO [2017-01-28 14:06:12] Pre-training parameters: INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.allData is FALSE INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.batchSize is 1 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.consecutive is TRUE INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.errorFunction is "mseError" INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.finalMomentum is 0.9 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.initialMomentum is 0.5 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.lastLayer is 0 INFO [2017-01-28 14:06:12] [preTrain] Parameter
rbm.learnRate is 1 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.learnRateScale is 1 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.momentumRampLength is 1 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.numCD is 1 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.numEpochs is 0 INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm" INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.updateFunction is "rbmUpdate" INFO [2017-01-28 14:06:12] [preTrain] Parameter rbm.weightDecay is 2e-04 INFO [2017-01-28 14:06:12] The selected RBMs have been trained for 0 epochs INFO [2017-01-28 14:06:12] Fine-tuning parameters: INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.batchSize is 10 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dither is FALSE INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout is 0 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout.dropConnect is FALSE INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout.momentMatching is 0 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.elu.alpha is 1 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.errorFunction is "rmseError" INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.finalMomentum is 0.9 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.fineTuneFunction is "backpropagation" INFO [2017-01-28 14:06:12] [backprop] Using backpropagation for fine-tuning INFO [2017-01-28 14:06:12] [backprop] Parameter bp.learnRate is c(1, 1) INFO [2017-01-28 14:06:12] [backprop] Parameter bp.learnRateScale is 1 INFO [2017-01-28 14:06:12] [backprop] See ?backpropagation for documentation INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.initialMomentum is 0.5 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.isClass is TRUE INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.maxout.poolSize is 2 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit" INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.momentumRampLength is 1 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.nesterovMomentum is TRUE INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.numEpochs is 5 INFO [2017-01-28 14:06:12] [fineTune] Parameter darch.returnBestModel is TRUE INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 1 INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopClassErr is -Inf INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopErr is -Inf INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopValidClassErr is 0.25 INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.stopValidErr is -Inf INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE) INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.unitFunction is c("sigmoidUnit", "softmaxUnit") INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.weightDecay is 0 INFO [2017-01-28 14:06:13] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate" INFO [2017-01-28 14:06:13] The network has been fine-tuned for 0 epochs INFO [2017-01-28 14:06:13] Training set consists of 3507 samples. 
INFO [2017-01-28 14:06:13] Validation set consists of 2093 samples INFO [2017-01-28 14:06:13] Start deep architecture fine-tuning for 5 epochs INFO [2017-01-28 14:06:13] Number of Batches: 351 (batch size 10) INFO [2017-01-28 14:06:13] Epoch: 1 of 5 INFO [2017-01-28 14:06:13] Classification error on Train set: 100% (3507/3507) INFO [2017-01-28 14:06:13] Train set RMSE: 1.000 INFO [2017-01-28 14:06:13] Classification error on Validation set: 100% (2093/2093) INFO [2017-01-28 14:06:13] Validation set RMSE: 1.000 INFO [2017-01-28 14:06:13] Finished epoch 1 of 5 after 0.236 secs (14860 patterns/sec) INFO [2017-01-28 14:06:13] Epoch: 2 of 5 INFO [2017-01-28 14:06:13] Classification error on Train set: 100% (3507/3507) INFO [2017-01-28 14:06:13] Train set RMSE: 1.000 INFO [2017-01-28 14:06:13] Classification error on Validation set: 100% (2093/2093) INFO [2017-01-28 14:06:13] Validation set RMSE: 1.000 INFO [2017-01-28 14:06:13] Finished epoch 2 of 5 after 0.24 secs (14810 patterns/sec) INFO [2017-01-28 14:06:13] Epoch: 3 of 5 INFO [2017-01-28 14:06:13] Classification error on Train set: 100% (3507/3507) INFO [2017-01-28 14:06:13] Train set RMSE: 1.000 INFO [2017-01-28 14:06:13] Classification error on Validation set: 100% (2093/2093) INFO [2017-01-28 14:06:13] Validation set RMSE: 1.000 INFO [2017-01-28 14:06:13] Finished epoch 3 of 5 after 0.253 secs (14784 patterns/sec) INFO [2017-01-28 14:06:13] Epoch: 4 of 5 INFO [2017-01-28 14:06:14] Classification error on Train set: 100% (3507/3507) INFO [2017-01-28 14:06:14] Train set RMSE: 1.000 INFO [2017-01-28 14:06:14] Classification error on Validation set: 100% (2093/2093) INFO [2017-01-28 14:06:14] Validation set RMSE: 1.000 INFO [2017-01-28 14:06:14] Finished epoch 4 of 5 after 0.237 secs (15840 patterns/sec) INFO [2017-01-28 14:06:14] Epoch: 5 of 5 INFO [2017-01-28 14:06:14] Classification error on Train set: 100% (3507/3507) INFO [2017-01-28 14:06:14] Train set RMSE: 1.000 INFO [2017-01-28 14:06:14] Classification error on Validation set: 100% (2093/2093) INFO [2017-01-28 14:06:14] Validation set RMSE: 1.000 INFO [2017-01-28 14:06:14] Finished epoch 5 of 5 after 0.239 secs (14674 patterns/sec) INFO [2017-01-28 14:06:14] Classification error on Train set (best model): 100% (3507/3507) INFO [2017-01-28 14:06:14] Train set (best model) RMSE: 1.000 INFO [2017-01-28 14:06:14] Classification error on Validation set (best model): 100% (2093/2093) INFO [2017-01-28 14:06:14] Validation set (best model) RMSE: 1.000 INFO [2017-01-28 14:06:14] Best model was found after epoch 5 INFO [2017-01-28 14:06:14] Final 1.000 validation RMSE: 1.000 INFO [2017-01-28 14:06:14] Final 1.000 validation classification error: 100.00% INFO [2017-01-28 14:06:14] Fine-tuning finished after 1.251 secs

model prediction

yhat_darch <- predict(darchmodel, newdata = test_darch, type = "bin") #type = "prob"[,2] also possible!?

evaluation

numIncorrect <- sum(yhat_darch != test_darch[,return_customer])
Error in `[.data.frame`(test_darch, , return_customer) : object 'return_customer' not found
cat(paste0("Incorrect classifications on all examples: ", numIncorrect, " (",
           round(numIncorrect/nrow(test_darch)*100, 2), "%)\n"))
Incorrect classifications on all examples: 1400 (100%)
saviola777 commented 7 years ago

Alright, I don't know why darch doesn't generate a network with two output neurons for your dataset; try using the parameter preProc.fullRank.targets = T. You don't need to convert the output variable yourself.

richigneven commented 7 years ago

Ah okay, that's easier, thanks. I kept playing with the variables, but I'm still at a 100% classification error (anti-overfitting?). How is this even possible? If you have any idea, I'm happy for any suggestions as to which parameters might be to blame. Here is my last output:

darchmodel <- darch(return_customer~ newsletter + success_rate + goods_value +already_existing_account , data = train_darch,

Pre-processing:

INFO [2017-01-28 18:36:50] Converting non-numeric columns in targets (if any)... INFO [2017-01-28 18:36:50] Converting 1 columns (y) to numeric INFO [2017-01-28 18:36:50] The current log level is: INFO INFO [2017-01-28 18:36:50] Using CPU matrix multiplication. WARN [2017-01-28 18:36:50] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons. INFO [2017-01-28 18:36:50] Creating and configuring new DArch instance INFO [2017-01-28 18:36:50] Constructing a network with 3 layers (4, 10, 1 neurons). INFO [2017-01-28 18:36:50] Generating RBMs. INFO [2017-01-28 18:36:50] Constructing new RBM instance with 4 visible and 10 hidden units. INFO [2017-01-28 18:36:50] Constructing new RBM instance with 10 visible and 1 hidden units. INFO [2017-01-28 18:36:50] DArch instance ready for training, here is a summary of its configuration: INFO [2017-01-28 18:36:50] Global parameters: INFO [2017-01-28 18:36:50] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 1 neurons INFO [2017-01-28 18:36:50] The weights for the layers were generated with "generateWeightsGlorotUniform" INFO [2017-01-28 18:36:50] Additionally, the following parameters were used for weight generation: INFO [2017-01-28 18:36:50] [weights] Parameter weights.max is 0.1 INFO [2017-01-28 18:36:50] [weights] Parameter weights.min is -0.1 INFO [2017-01-28 18:36:50] [weights] Parameter weights.mean is 0 INFO [2017-01-28 18:36:50] [weights] Parameter weights.sd is 0.01 INFO [2017-01-28 18:36:51] Weight normalization is disabled INFO [2017-01-28 18:36:51] Bootstrapping is disabled INFO [2017-01-28 18:36:51] Train data are shuffled before each epoch INFO [2017-01-28 18:36:51] Autosaving is disabled INFO [2017-01-28 18:36:51] Using CPU for matrix multiplication INFO [2017-01-28 18:36:51] Pre-processing parameters: INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.factorToNumeric is TRUE INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.factorToNumeric.targets is TRUE INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.fullRank is TRUE INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.fullRank.targets is TRUE INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.orderedToFactor.targets is TRUE INFO [2017-01-28 18:36:51] [preProc] Parameter preProc.targets is TRUE INFO [2017-01-28 18:36:51] Caret pre-processing is disabled INFO [2017-01-28 18:36:51] Pre-training parameters: INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.allData is FALSE INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.batchSize is 50 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.consecutive is TRUE INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.errorFunction is "rmseError" INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.finalMomentum is 0.9 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.initialMomentum is 0.5 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.lastLayer is 0 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.learnRate is 1 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.learnRateScale is 1 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.momentumRampLength is 1 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.numCD is 1 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.numEpochs is 4 INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm" INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.updateFunction is "rbmUpdate" INFO [2017-01-28 18:36:51] [preTrain] Parameter rbm.weightDecay is 2e-04 INFO [2017-01-28 
18:36:51] The selected RBMs have been trained for 0 epochs INFO [2017-01-28 18:36:51] Fine-tuning parameters: INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.batchSize is 1 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dither is FALSE INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout is 0 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout.dropConnect is FALSE INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout.momentMatching is 0 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.elu.alpha is 1 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.errorFunction is "rmseError" INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.finalMomentum is 0.9 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.fineTuneFunction is "rpropagation" INFO [2017-01-28 18:36:51] [RPROP] Using rpropagation for fine-tuning INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.method is "iRprop+" INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.decFact is 0.5 INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.incFact is 1.2 INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.initDelta is 0.0125 INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.minDelta is 1e-06 INFO [2017-01-28 18:36:51] [RPROP] Parameter rprop.maxDelta is 50 INFO [2017-01-28 18:36:51] [RPROP] See ?rpropagation for documentation INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.initialMomentum is 0.9 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.isClass is TRUE INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.maxout.poolSize is 2 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit" INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.momentumRampLength is 1 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.nesterovMomentum is TRUE INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.numEpochs is 5 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.returnBestModel is TRUE INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopClassErr is -Inf INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopErr is -Inf INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopValidClassErr is -Inf INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.stopValidErr is -Inf INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE) INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.unitFunction is "sigmoidUnit" INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.weightDecay is 0 INFO [2017-01-28 18:36:51] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate" INFO [2017-01-28 18:36:51] The network has been fine-tuned for 0 epochs INFO [2017-01-28 18:36:51] Starting pre-training for 4 epochs INFO [2017-01-28 18:36:51] Training set consists of 5601 samples INFO [2017-01-28 18:36:51] The first 2 RBMs are going to be trained INFO [2017-01-28 18:36:51] Starting the training of the rbm with 4 visible and 10 hidden units. 
INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 1 RMSE error: 0.0834969430272899 INFO [2017-01-28 18:36:51] Finished epoch 1 after 0.1306019 secs INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 2 RMSE error: 0.0919481965031578 INFO [2017-01-28 18:36:51] Finished epoch 2 after 0.1141999 secs INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 3 RMSE error: 0.0741237990805017 INFO [2017-01-28 18:36:51] Finished epoch 3 after 0.1388011 secs INFO [2017-01-28 18:36:51] [RBM 4x10] Epoch 4 RMSE error: 0.0830296626661737 INFO [2017-01-28 18:36:51] Finished epoch 4 after 0.1130061 secs INFO [2017-01-28 18:36:51] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:36:51] Train set RMSE: 1.344 INFO [2017-01-28 18:36:51] Starting the training of the rbm with 10 visible and 1 hidden units. INFO [2017-01-28 18:36:51] [RBM 10x1] Epoch 1 RMSE error: 0.00551611469749816 INFO [2017-01-28 18:36:51] Finished epoch 1 after 0.105006 secs INFO [2017-01-28 18:36:52] [RBM 10x1] Epoch 2 RMSE error: 0.000434584963198201 INFO [2017-01-28 18:36:52] Finished epoch 2 after 0.1258001 secs INFO [2017-01-28 18:36:52] [RBM 10x1] Epoch 3 RMSE error: 0.000347141463655062 INFO [2017-01-28 18:36:52] Finished epoch 3 after 0.1101999 secs INFO [2017-01-28 18:36:52] [RBM 10x1] Epoch 4 RMSE error: 0.000291437799738362 INFO [2017-01-28 18:36:52] Finished epoch 4 after 0.1102011 secs INFO [2017-01-28 18:36:52] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:36:52] Train set RMSE: 1.414 INFO [2017-01-28 18:36:52] Pre-training finished after 1.05222 secs INFO [2017-01-28 18:36:52] Training set consists of 5601 samples. INFO [2017-01-28 18:36:52] Start deep architecture fine-tuning for 5 epochs INFO [2017-01-28 18:36:52] Number of Batches: 5601 (batch size 1) INFO [2017-01-28 18:36:52] Epoch: 1 of 5 INFO [2017-01-28 18:36:55] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:36:55] Train set RMSE: 1.000 INFO [2017-01-28 18:36:55] Finished epoch 1 of 5 after 3.06 secs (1830 patterns/sec) INFO [2017-01-28 18:36:55] Epoch: 2 of 5 INFO [2017-01-28 18:36:58] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:36:58] Train set RMSE: 1.000 INFO [2017-01-28 18:36:58] Finished epoch 2 of 5 after 3.08 secs (1817 patterns/sec) INFO [2017-01-28 18:36:58] Epoch: 3 of 5 INFO [2017-01-28 18:37:01] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:37:01] Train set RMSE: 1.000 INFO [2017-01-28 18:37:01] Finished epoch 3 of 5 after 3.13 secs (1800 patterns/sec) INFO [2017-01-28 18:37:01] Epoch: 4 of 5 INFO [2017-01-28 18:37:05] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:37:05] Train set RMSE: 1.000 INFO [2017-01-28 18:37:05] Finished epoch 4 of 5 after 3.11 secs (1802 patterns/sec) INFO [2017-01-28 18:37:05] Epoch: 5 of 5 INFO [2017-01-28 18:37:08] Classification error on Train set: 100% (5601/5601) INFO [2017-01-28 18:37:08] Train set RMSE: 1.000 INFO [2017-01-28 18:37:08] Finished epoch 5 of 5 after 3.08 secs (1821 patterns/sec) INFO [2017-01-28 18:37:08] Classification error on Train set (best model): 100% (5601/5601) INFO [2017-01-28 18:37:08] Train set (best model) RMSE: 1.000 INFO [2017-01-28 18:37:08] Best model was found after epoch 5 INFO [2017-01-28 18:37:08] Fine-tuning finished after 15.8 secs Warning messages: 1: In [<-(*tmp*, i, value = <S4 object of class "RBM">) : implicit list embedding of S4 objects is deprecated 2: In [<-(*tmp*, i, value = <S4 object of class "RBM">) : implicit list embedding of S4 objects is 
deprecated

model prediction

yhat_darch <- predict(darchmodel, newdata = test_darch, type = "class") #type = "prob"[,2] also possible!?

evaluation

numIncorrect <- sum(yhat_darch != test_darch[,17])
cat(paste0("Incorrect classifications on all examples: ", numIncorrect, " (",
           round(numIncorrect/nrow(test_darch)*100, 2), "%)\n"))
Incorrect classifications on all examples: 1399 (100%)

cheers

saviola777 commented 7 years ago

There seems to be a problem with your dataset (it may be a darch bug as well, I'm not sure): it does not recognize the factors (normally it should convert factors to several columns with only 0 and 1 values) and converts them to numeric instead, which doesn't give good results. I'm not sure how to fix this; I'm not an expert on R data structures. A 100% incorrect classification result makes no sense for binary classification, as that's the same as having everything correct, just with the labels flipped, so there's a problem with the output of the network. Which dataset did you use in the very beginning, where it went down to 18% error? Because there it does not show the conversion to numeric.

I will look into the darch code next week and try to reproduce the problem. In the meantime, here are some suggestions for your parameters:

saviola777 commented 7 years ago

Ah, I missed

                preProc.factorToNumeric = T, 
                preProc.factorToNumeric.targets = T,

Please set those to F, and set preProc.fullRank.targets = T.
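A minimal sketch of the corrected call with these pre-processing parameters (the other arguments are assumed from the earlier calls in this thread):

darchmodel <- darch(return_customer ~ newsletter + success_rate + goods_value + already_existing_account,
                    data = train_darch,
                    preProc.factorToNumeric = F,          # keep factor predictors as factors
                    preProc.factorToNumeric.targets = F,  # do not convert the target to numeric
                    preProc.fullRank.targets = T)         # 1-of-n coding for the factor target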

richigneven commented 7 years ago

Hey, thanks for the help, and sorry for my late response. I'm back to the 18% prediction error now, but I can't really figure out why. However, I know 18% is the ratio of returning customers, so I'm pretty sure the model predicts everyone as the same class. If you have another idea, feel free to let me know; if not, never mind, and thanks a lot for trying! :+1:

Here's my output:

darchmodel <- darch(return_customer~ newsletter + success_rate + goods_value +already_existing_account , data = train_darch, #subject to change!

Pre-processing:

INFO [2017-02-01 11:06:28] Converting non-numeric columns in targets (if any)... INFO [2017-02-01 11:06:28] Dependent factor "return_customer" converted to 2 new variables (1-of-n coding) INFO [2017-02-01 11:06:28] The current log level is: INFO INFO [2017-02-01 11:06:28] Using CPU matrix multiplication. WARN [2017-02-01 11:06:28] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons. INFO [2017-02-01 11:06:28] Creating and configuring new DArch instance INFO [2017-02-01 11:06:28] Constructing a network with 3 layers (4, 10, 1 neurons). INFO [2017-02-01 11:06:28] Generating RBMs. INFO [2017-02-01 11:06:28] Constructing new RBM instance with 4 visible and 10 hidden units. INFO [2017-02-01 11:06:28] Constructing new RBM instance with 10 visible and 1 hidden units. INFO [2017-02-01 11:06:28] DArch instance ready for training, here is a summary of its configuration: INFO [2017-02-01 11:06:28] Global parameters: INFO [2017-02-01 11:06:28] Layers parameter was 10, resulted in network with 3 layers and 4, 10, 1 neurons INFO [2017-02-01 11:06:28] The weights for the layers were generated with "generateWeightsGlorotUniform" INFO [2017-02-01 11:06:28] Additionally, the following parameters were used for weight generation: INFO [2017-02-01 11:06:28] [weights] Parameter weights.max is 0.1 INFO [2017-02-01 11:06:28] [weights] Parameter weights.min is -0.1 INFO [2017-02-01 11:06:28] [weights] Parameter weights.mean is 0 INFO [2017-02-01 11:06:28] [weights] Parameter weights.sd is 0.01 INFO [2017-02-01 11:06:28] Weight normalization is disabled INFO [2017-02-01 11:06:28] Bootstrapping is disabled INFO [2017-02-01 11:06:28] Train data are shuffled before each epoch INFO [2017-02-01 11:06:28] Autosaving is disabled INFO [2017-02-01 11:06:28] Using CPU for matrix multiplication INFO [2017-02-01 11:06:28] Pre-processing parameters: INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.factorToNumeric is FALSE INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.factorToNumeric.targets is FALSE INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.fullRank is TRUE INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.fullRank.targets is TRUE INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.orderedToFactor.targets is TRUE INFO [2017-02-01 11:06:28] [preProc] Parameter preProc.targets is TRUE INFO [2017-02-01 11:06:28] Caret pre-processing is disabled INFO [2017-02-01 11:06:28] Pre-training parameters: INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.allData is FALSE INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.batchSize is 10 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.consecutive is TRUE INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.errorFunction is "rmseError" INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.finalMomentum is 0.9 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.initialMomentum is 0.5 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.lastLayer is 0 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.learnRate is 1 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.learnRateScale is 1 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.momentumRampLength is 1 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.numCD is 1 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.numEpochs is 4 INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.unitFunction is "sigmoidUnitRbm" INFO [2017-02-01 11:06:28] [preTrain] Parameter rbm.updateFunction is "rbmUpdate" INFO [2017-02-01 11:06:28] [preTrain] Parameter 
rbm.weightDecay is 2e-04 INFO [2017-02-01 11:06:28] The selected RBMs have been trained for 0 epochs INFO [2017-02-01 11:06:28] Fine-tuning parameters: INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.batchSize is 1 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dither is FALSE INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout is 0 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout.dropConnect is FALSE INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout.momentMatching is 0 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.dropout.oneMaskPerEpoch is FALSE INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.elu.alpha is 1 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.errorFunction is "rmseError" INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.finalMomentum is 0.9 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.fineTuneFunction is "rpropagation" INFO [2017-02-01 11:06:28] [RPROP] Using rpropagation for fine-tuning INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.method is "iRprop+" INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.decFact is 0.5 INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.incFact is 1.2 INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.initDelta is 0.0125 INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.minDelta is 1e-06 INFO [2017-02-01 11:06:28] [RPROP] Parameter rprop.maxDelta is 50 INFO [2017-02-01 11:06:28] [RPROP] See ?rpropagation for documentation INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.initialMomentum is 0.9 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.isClass is TRUE INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.maxout.poolSize is 2 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.maxout.unitFunction is "linearUnit" INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.momentumRampLength is 1 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.nesterovMomentum is TRUE INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.numEpochs is 5 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.returnBestModel is TRUE INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.returnBestModel.validationErrorFactor is 0.632120558828558 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopClassErr is -Inf INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopErr is -Inf INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopValidClassErr is -Inf INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.stopValidErr is -Inf INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.trainLayers is c(TRUE, TRUE) INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.unitFunction is "sigmoidUnit" INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.weightDecay is 0 INFO [2017-02-01 11:06:28] [fineTune] Parameter darch.weightUpdateFunction is "weightDecayWeightUpdate" INFO [2017-02-01 11:06:28] The network has been fine-tuned for 0 epochs INFO [2017-02-01 11:06:28] Starting pre-training for 4 epochs INFO [2017-02-01 11:06:28] Training set consists of 5601 samples INFO [2017-02-01 11:06:28] The first 2 RBMs are going to be trained INFO [2017-02-01 11:06:28] Starting the training of the rbm with 4 visible and 10 hidden units. 
INFO [2017-02-01 11:06:28] [RBM 4x10] Epoch 1 RMSE error: 0.323003204574307 INFO [2017-02-01 11:06:28] Finished epoch 1 after 0.4614081 secs INFO [2017-02-01 11:06:29] [RBM 4x10] Epoch 2 RMSE error: 0.319917878891257 INFO [2017-02-01 11:06:29] Finished epoch 2 after 0.4004111 secs INFO [2017-02-01 11:06:29] [RBM 4x10] Epoch 3 RMSE error: 0.317499747165885 INFO [2017-02-01 11:06:29] Finished epoch 3 after 0.5437009 secs INFO [2017-02-01 11:06:30] [RBM 4x10] Epoch 4 RMSE error: 0.318977340216636 INFO [2017-02-01 11:06:30] Finished epoch 4 after 0.4203022 secs INFO [2017-02-01 11:06:30] Classification error on Train set: 81.54% (4567/5601) INFO [2017-02-01 11:06:30] Train set RMSE: 0.713 INFO [2017-02-01 11:06:30] Starting the training of the rbm with 10 visible and 1 hidden units. INFO [2017-02-01 11:06:30] [RBM 10x1] Epoch 1 RMSE error: 0.00658042195367504 INFO [2017-02-01 11:06:30] Finished epoch 1 after 0.405803 secs INFO [2017-02-01 11:06:31] [RBM 10x1] Epoch 2 RMSE error: 0.000693930106950838 INFO [2017-02-01 11:06:31] Finished epoch 2 after 0.4087031 secs INFO [2017-02-01 11:06:31] [RBM 10x1] Epoch 3 RMSE error: 0.000539636404332261 INFO [2017-02-01 11:06:31] Finished epoch 3 after 0.392302 secs INFO [2017-02-01 11:06:31] [RBM 10x1] Epoch 4 RMSE error: 0.000471148988971575 INFO [2017-02-01 11:06:31] Finished epoch 4 after 0.394202 secs INFO [2017-02-01 11:06:31] Classification error on Train set: 81.54% (4567/5601) INFO [2017-02-01 11:06:31] Train set RMSE: 0.903 INFO [2017-02-01 11:06:31] Pre-training finished after 3.505233 secs INFO [2017-02-01 11:06:31] Training set consists of 5601 samples. INFO [2017-02-01 11:06:31] Start deep architecture fine-tuning for 5 epochs INFO [2017-02-01 11:06:32] Number of Batches: 5601 (batch size 1) INFO [2017-02-01 11:06:32] Epoch: 1 of 5 INFO [2017-02-01 11:06:35] Classification error on Train set: 18.46% (1034/5601) INFO [2017-02-01 11:06:35] Train set RMSE: 0.430 INFO [2017-02-01 11:06:35] Finished epoch 1 of 5 after 2.98 secs (1879 patterns/sec) INFO [2017-02-01 11:06:35] Epoch: 2 of 5 INFO [2017-02-01 11:06:38] Classification error on Train set: 18.46% (1034/5601) INFO [2017-02-01 11:06:38] Train set RMSE: 0.430 INFO [2017-02-01 11:06:38] Finished epoch 2 of 5 after 2.97 secs (1884 patterns/sec) INFO [2017-02-01 11:06:38] Epoch: 3 of 5 INFO [2017-02-01 11:06:41] Classification error on Train set: 18.46% (1034/5601) INFO [2017-02-01 11:06:41] Train set RMSE: 0.430 INFO [2017-02-01 11:06:41] Finished epoch 3 of 5 after 3.16 secs (1770 patterns/sec) INFO [2017-02-01 11:06:41] Epoch: 4 of 5 INFO [2017-02-01 11:06:44] Classification error on Train set: 18.46% (1034/5601) INFO [2017-02-01 11:06:44] Train set RMSE: 0.430 INFO [2017-02-01 11:06:44] Finished epoch 4 of 5 after 3.3 secs (1697 patterns/sec) INFO [2017-02-01 11:06:44] Epoch: 5 of 5 INFO [2017-02-01 11:06:48] Classification error on Train set: 18.46% (1034/5601) INFO [2017-02-01 11:06:48] Train set RMSE: 0.430 INFO [2017-02-01 11:06:48] Finished epoch 5 of 5 after 3.77 secs (1489 patterns/sec) INFO [2017-02-01 11:06:48] Classification error on Train set (best model): 18.46% (1034/5601) INFO [2017-02-01 11:06:48] Train set (best model) RMSE: 0.430 INFO [2017-02-01 11:06:48] Best model was found after epoch 5 INFO [2017-02-01 11:06:48] Fine-tuning finished after 16.35 secs Warning messages: 1: In [<-(*tmp*, i, value = <S4 object of class "RBM">) : implicit list embedding of S4 objects is deprecated 2: In [<-(*tmp*, i, value = <S4 object of class "RBM">) : implicit list embedding of S4 objects 
is deprecated

saviola777 commented 7 years ago

I understand this less and less: it still creates a network with 4 input neurons whereas it should be 6 (since two input variables were converted to two new variables each), and the same for the output, which should have 2 neurons. Maybe it's a problem with the caret pre-processing, which automatically merges binary variables into one column... Hm, alright, let's try some more:

darchmodel <- darch(return_customer~ newsletter + success_rate + goods_value +already_existing_account , data = train_darch,
  preProc.params = list(method = c("center", "scale")),
  darch.numEpochs = 5,
  layers = c(0, 50, 10, 0),
  bp.learnRate = 0.1,
  darch.batchSize = 10,
  retainData = T
  )

Now, after training, please run

predict(darchmodel, outputlayer = 1) # returns the values after darch pre-processing
predict(darchmodel) # returns the raw network output

For the first call, please just paste a couple of the rows here, or check if anything is strange about the data (not normalized etc.). For the second call, I presume that all output values are the same (either 0 or 1); please check which one it is and maybe paste a couple of the rows here.

richigneven commented 7 years ago

The output is 1 in all the returning customer cases, so approx. 18%.

Output of the prediction (last lines):

[498,] 0.8158758 0.1851527
[499,] 0.8168210 0.1820696
[500,] 0.8178002 0.1835608

My guess would be that the prediction is not weighted enough and only produces slightly different values, so it just sticks to the most likely option, which is non-returning. The data set is imbalanced 82:18%. Does this matter?

saviola777 commented 7 years ago

I guess the imbalance is the problem here. It just ignores the 18% completely. The simplest thing would be replicating samples until you have about 50/50, or just selecting all returning customer cases and about as many of the non-returning customer cases, using that for the training set, and using all data for validation.
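A minimal sketch of the second option (down-sampling the majority class), assuming the data_darch_known data frame and the "no"/"yes" factor levels shown earlier in this thread:

# Keep all returning customers and draw an equally sized sample of
# non-returning ones, then shuffle the combined training set.
returning <- data_darch_known[data_darch_known$return_customer == "yes", ]
nonreturning <- data_darch_known[data_darch_known$return_customer == "no", ]
set.seed(42)  # illustrative seed for a reproducible sample
nonreturning_sample <- nonreturning[sample(nrow(nonreturning), nrow(returning)), ]
train_balanced <- rbind(returning, nonreturning_sample)
train_balanced <- train_balanced[sample(nrow(train_balanced)), ]  # shuffle the rows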

richigneven commented 7 years ago

I have roughly a 0.45:0.55 set now, but it's still not predicting. The training set error was 33% and afterwards 66%. Output:

INFO [2017-02-03 00:31:58] Result of preProcess for targets: Created from 4802 samples and 1 variables

Pre-processing:

INFO [2017-02-03 00:31:58] Converting non-numeric columns in targets (if any)... INFO [2017-02-03 00:31:58] Dependent factor "return_customer" converted to 2 new variables (1-of-n coding) INFO [2017-02-03 00:31:58] The current log level is: INFO INFO [2017-02-03 00:31:58] Using CPU matrix multiplication. WARN [2017-02-03 00:31:58] No vector given for "layers" parameter, constructing shallow network with one hidden layer of 10 neurons. ....INFO [2017-02-03 00:31:58] Creating and configuring new DArch instance INFO [2017-02-03 00:31:58] Constructing a network with 3 layers (13, 10, 1 neurons). INFO [2017-02-03 00:31:58] Generating RBMs. ..... INFO [2017-02-03 00:31:58] The network has been fine-tuned for 0 epochs INFO [2017-02-03 00:31:58] Starting pre-training for 6 epochs INFO [2017-02-03 00:31:58] Training set consists of 4802 samples INFO [2017-02-03 00:31:58] The first 2 RBMs are going to be trained INFO [2017-02-03 00:31:58] Starting the training of the rbm with 13 visible and 10 hidden units. INFO [2017-02-03 00:31:59] [RBM 13x10] Epoch 1 RMSE error: 0.50781337462991 INFO [2017-02-03 00:31:59] Finished epoch 1 after 0.4488058 secs INFO [2017-02-03 00:31:59] [RBM 13x10] Epoch 2 RMSE error: 0.504939270606509 INFO [2017-02-03 00:31:59] Finished epoch 2 after 0.3986089 secs ...... INFO [2017-02-03 00:32:18] Train set RMSE: 0.816 INFO [2017-02-03 00:32:18] Finished epoch 5 of 5 after 3.02 secs (1601 patterns/sec) INFO [2017-02-03 00:32:18] Classification error on Train set (best model): 66.66% (3201/4802) INFO [2017-02-03 00:32:18] Train set (best model) RMSE: 0.816 INFO [2017-02-03 00:32:18] Best model was found after epoch 5 INFO [2017-02-03 00:32:18] Fine-tuning finished after 14.86 secs Warning messages: 1: In [<-(*tmp*, i, value = <S4 object of class "RBM">) : implicit list embedding of S4 objects is deprecated 2: In [<-(*tmp*, i, value = <S4 object of class "RBM">) : implicit list embedding of S4 objects is deprecated


saviola777 commented 7 years ago

Please try it with the parameters from my comment, and try to increase the number of fine-tuning epochs – expecting convergence after 5 epochs is not realistic.

richigneven commented 7 years ago

Thanks a lot, it's running now! :) It doesn't get much better than a classification error of around 30%, even with 300 epochs. Do you reckon improvements are possible? Also, I would like to evaluate it not only as right or wrong: there are several possible classification errors (from a matrix with true positives, true negatives, false positives and false negatives), and they result in different error values. Is this possible to implement?

Cheers, and thanks so much for your help already :)


saviola777 commented 7 years ago

As for the classification error, it depends on the training/validation error. As long as these are roughly equal, you should be able to improve the overall result by increasing the network size / tuning parameters. At some point you'll run into over-fitting, where your training error will improve but your validation error will get worse.

As for the TP/TN/FP/FN matrix, I'm not sure how to best integrate it directly into the training (caret offers this matrix in its output, but it's not used for training). The easiest approach would probably be a conversion of the data set; maybe there are standards for this. Do you know of any implementations where I could look at an approach for solving this?
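A hedged sketch of computing that matrix after prediction with caret's confusionMatrix(), assuming the yhat_darch and test_darch objects from this thread; class-specific costs could then be applied to the resulting counts:

library(caret)
cm <- confusionMatrix(factor(yhat_darch, levels = levels(test_darch$return_customer)),
                      test_darch$return_customer)
cm$table    # counts of true/false positives and negatives
cm$byClass  # sensitivity, specificity, etc.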

saviola777 commented 7 years ago

Closing due to inactivity, feel free to re-open / continue the discussion if needed.