Hey @brightprogrammer, the issue is solved in #104; I'll open another PR adding BatchNorm to the examples. You should see about 80% or more accuracy.
Thanks for replying so soon! Don't mind me if I ask this dumb question (as I am a newbie in this field!): trainY has dimensions 1xN (N being the number of training samples) and the output layer has dimensions 10xB (B being the batch size). I find this odd! Can you please provide some resource where I can understand this? The error I mentioned was a matrix multiplication error due to incompatible dimensions!
trainY has dimensions 1xN (N being the number of training samples) and the output layer has dimensions 10xB.
With respect to classification problems: for the relation between N and B, in each iteration we sample B columns from trainX and trainY, so the number of predictions during that iteration is B. To reduce 10 to 1, you can think of each of the 10 values as the probability of the input belonging to that class, so we take the output to be the index with the highest probability. During each iteration the model creates a 10xB output, which is then converted to 1xB. Let me know if this makes sense.
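If it helps, here is a minimal sketch of that 10xB -> 1xB conversion; it is essentially the getLabels helper from the full example further down this thread (the function name here is just illustrative):

#include <armadillo>

// Convert a 10xB matrix of class scores into a 1xB row of predicted labels
// by taking, for each column, the index of its largest value. The +1 matches
// the 1..10 label convention used in the example.
arma::Row<size_t> scoresToLabels(const arma::mat& predOut)
{
  arma::Row<size_t> predLabels(predOut.n_cols);
  for (arma::uword i = 0; i < predOut.n_cols; ++i)
    predLabels(i) = predOut.col(i).index_max() + 1;
  return predLabels;
}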
So was the error due to classes starting from 1?
Oh no, the loss was really high because the output from each layer wasn't normalized. To do that, we use the BatchNorm layer.
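Roughly, it goes right after each convolution and is sized to that convolution's number of output maps; a minimal sketch mirroring the first convolution block of the example below (1e-8 is the epsilon used for numerical stability):

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/ffn.hpp>

using namespace mlpack::ann;

int main()
{
  FFN<NegativeLogLikelihood<>, RandomInitialization> model;
  model.Add<Convolution<>>(1, 6, 5, 5, 1, 1, 0, 0, 28, 28); // 28x28x1 -> 24x24x6
  model.Add<BatchNorm<>>(6, 1e-8); // normalize the 6 output maps
  model.Add<LeakyReLU<>>();
}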
Oo... I get it now!
Please don't close this thread until I check whether the example is working or not! One more thing: when I use mlpack::ann::LeakyReLU it gives an error due to arma::as_scalar(), but when I change it to mlpack::ann::ReLULayer it works fine!
This is the result I got after 10 iterations:
Accuracy: train = 89.2%, valid = 89.0482%
Opened #106 to resolve this issue.
Now I get an error while running the code!
brightprogrammer@titan:~/Desktop/Projects/MNIST_CNN$ g++ main.cpp -lmlpack -lboost_serialization -larmadillo -fopenmp -o main && ./main ~/Datasets/MNIST/train.csv
Reading data ...
Start training ...
error: subtraction: incompatible matrix dimensions: 3456x1 and 6x1
terminate called after throwing an instance of 'std::logic_error'
what(): subtraction: incompatible matrix dimensions: 3456x1 and 6x1
Aborted (core dumped)
Here is the code:
/**
 * An example of using Convolutional Neural Network (CNN) for
 * solving Digit Recognizer problem from Kaggle website.
 *
 * The full description of a problem as well as datasets for training
 * and testing are available here https://www.kaggle.com/c/digit-recognizer
 *
 * mlpack is free software; you may redistribute it and/or modify it under the
 * terms of the 3-clause BSD license. You should have received a copy of the
 * 3-clause BSD license along with mlpack. If not, see
 * http://www.opensource.org/licenses/BSD-3-Clause for more information.
 *
 * @author Daivik Nema
 */
#include <mlpack/core.hpp>
#include <mlpack/core/data/split_data.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
//#include <mlpack/methods/ann/loss_functions/cross_entropy_error.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <ensmallen.hpp>
using namespace mlpack;
using namespace mlpack::ann;
using namespace arma;
using namespace std;
using namespace ens;
arma::Row<size_t> getLabels(arma::mat predOut)
{
  arma::Row<size_t> predLabels(predOut.n_cols);
  for (arma::uword i = 0; i < predOut.n_cols; ++i)
  {
    predLabels(i) = predOut.col(i).index_max() + 1;
  }
  return predLabels;
}
int main(int argc, char** argv)
{
  // Dataset is randomly split into validation
  // and training parts with the following ratio.
  constexpr double RATIO = 0.1;
  // Allow an infinite number of iterations until we are stopped by
  // EarlyStopAtMinLoss.
  constexpr int MAX_ITERATIONS = 0;
  // Step size of the optimizer.
  constexpr double STEP_SIZE = 1.2e-3;
  // Number of data points in each iteration of SGD.
  constexpr int BATCH_SIZE = 50;

  cout << "Reading data ..." << endl;

  // Labeled dataset that contains data for training is loaded from a CSV file.
  // Rows represent features, columns represent data points.
  mat tempDataset;
  // The original file can be downloaded from
  // https://www.kaggle.com/c/digit-recognizer/data
  data::Load(argv[1], tempDataset, true);
  // The original Kaggle dataset CSV file has headings for each column,
  // so it's necessary to get rid of the first row. In Armadillo representation,
  // this corresponds to the first column of our data matrix.
  mat dataset =
      tempDataset.submat(0, 1, tempDataset.n_rows - 1, tempDataset.n_cols - 1);

  // Split the dataset into training and validation sets.
  mat train, valid;
  data::Split(dataset, train, valid, RATIO);

  // The train and valid datasets contain both the features and the class
  // labels. Split these into separate matrices.
  const mat trainX =
      train.submat(1, 0, train.n_rows - 1, train.n_cols - 1) / 255.f;
  const mat validX =
      valid.submat(1, 0, valid.n_rows - 1, valid.n_cols - 1) / 255.f;

  // According to the NegativeLogLikelihood output layer of the NN, labels
  // should specify the class of a data point and be in the interval from 1 to
  // the number of classes (in this case from 1 to 10).
  // Create labels for the training and validation datasets.
  const mat trainY = train.row(0) + 1;
  const mat validY = valid.row(0) + 1;
  // Specify the NN model. NegativeLogLikelihood is the output layer that
  // is used for classification problems. RandomInitialization means that
  // initial weights are generated randomly in the interval from -1 to 1.
  FFN<NegativeLogLikelihood<>, RandomInitialization> model;

  // Specify the model architecture.
  // In this example, the CNN architecture is chosen similar to LeNet-5.
  // The architecture follows a Conv-ReLU-Pool-Conv-ReLU-Pool-Dense schema. We
  // have used leaky ReLU activation instead of vanilla ReLU. Standard
  // max-pooling has been used for pooling. The first convolution uses 6 filters
  // of size 5x5 (and a stride of 1). The second convolution uses 16 filters of
  // size 5x5 (stride = 1). The final dense layer is connected to a softmax to
  // ensure that we get a valid probability distribution over the output
  // classes.
  // Layers schema.
  // 28x28x1 --- conv (6 filters of size 5x5. stride = 1) ---> 24x24x6
  // 24x24x6 --------------- Leaky ReLU ---------------------> 24x24x6
  // 24x24x6 --- max pooling (over 2x2 fields. stride = 2) --> 12x12x6
  // 12x12x6 --- conv (16 filters of size 5x5. stride = 1) --> 8x8x16
  // 8x8x16  --------------- Leaky ReLU ---------------------> 8x8x16
  // 8x8x16  --- max pooling (over 2x2 fields. stride = 2) --> 4x4x16
  // 4x4x16  ------------------- Dense ----------------------> 10

  // Add the first convolution layer.
  model.Add<Convolution<>>(1,  // Number of input activation maps.
                           6,  // Number of output activation maps.
                           5,  // Filter width.
                           5,  // Filter height.
                           1,  // Stride along width.
                           1,  // Stride along height.
                           0,  // Padding width.
                           0,  // Padding height.
                           28, // Input width.
                           28  // Input height.
  );
  model.Add<BatchNorm<>>(6, 1e-8);
  // Add the first ReLU.
  model.Add<LeakyReLU<>>();
  // Add the first pooling layer. Pools over 2x2 fields in the input.
  model.Add<MaxPooling<>>(2, // Width of field.
                          2, // Height of field.
                          2, // Stride along width.
                          2, // Stride along height.
                          true);
  // Add the second convolution layer.
  model.Add<Convolution<>>(6,  // Number of input activation maps.
                           16, // Number of output activation maps.
                           5,  // Filter width.
                           5,  // Filter height.
                           1,  // Stride along width.
                           1,  // Stride along height.
                           0,  // Padding width.
                           0,  // Padding height.
                           12, // Input width.
                           12  // Input height.
  );
  model.Add<BatchNorm<>>(16, 1e-8);
  // Add the second ReLU.
  model.Add<LeakyReLU<>>();
  // Add the second pooling layer.
  model.Add<MaxPooling<>>(2, 2, 2, 2, true);
  // Add the final dense layer.
  model.Add<Linear<>>(16 * 4 * 4, 10);
  model.Add<LogSoftMax<>>();
  cout << "Start training ..." << endl;

  // Set parameters for the Adam optimizer.
  ens::Adam optimizer(
      STEP_SIZE,      // Step size of the optimizer.
      BATCH_SIZE,     // Batch size. Number of data points used in each iteration.
      0.9,            // Exponential decay rate for the first moment estimates.
      0.999,          // Exponential decay rate for the weighted infinity norm estimates.
      1e-8,           // Value used to initialise the mean squared gradient parameter.
      MAX_ITERATIONS, // Max number of iterations.
      1e-8,           // Tolerance.
      true);

  // Train the CNN model. If this is the first iteration, weights are
  // randomly initialized between -1 and 1. Otherwise, the values of weights
  // from the previous iteration are used.
  model.Train(trainX,
              trainY,
              optimizer,
              ens::PrintLoss(),
              ens::ProgressBar(),
              // Stop the training using early stopping at minimum loss.
              ens::EarlyStopAtMinLoss());

  // Matrix to store the predictions on the train and validation datasets.
  mat predOut;
  // Get predictions on the training data points.
  model.Predict(trainX, predOut);
  // Calculate accuracy on the training data points.
  arma::Row<size_t> predLabels = getLabels(predOut);
  double trainAccuracy =
      arma::accu(predLabels == trainY) / (double) trainY.n_elem * 100;
  // Get predictions on the validation data points.
  model.Predict(validX, predOut);
  // Calculate accuracy on the validation data points.
  predLabels = getLabels(predOut);
  double validAccuracy =
      arma::accu(predLabels == validY) / (double) validY.n_elem * 100;

  std::cout << "Accuracy: train = " << trainAccuracy << "%,"
            << "\t valid = " << validAccuracy << "%" << std::endl;

  mlpack::data::Save("model.bin", "model", model, false);

  std::cout << "Predicting ..." << std::endl;
}
I also removed the lambda as it gave an error during execution! Can you please execute the above code on your PC?
Did you build mlpack from source? The lambda callback and BatchNorm (with support for mini-batch normalization) were only recently added to mlpack.
No, I installed it from the apt package on Ubuntu... about 2 weeks ago.
Hmm, The issue should be solved if you build from source.
Version is 3.2.2-3
Could you try this?
Yes I guess I have to...
OK, so I ran the model with a random architecture and for the first time the accuracy was more than 11%. It was around 60%, which means the net is learning something! I think that using a proper architecture can increase the accuracy further! Thanks for the help @kartikdutt18.
LeakyReLU also works fine... Maybe it was due to the values not being normalized (I think), or maybe something else (please point it out and close this thread).
Most probably, yes, or there was a shape error somewhere that caused that message. Glad to see that the issue is resolved.
https://github.com/mlpack/examples/blob/master/mnist_cnn/mnist_cnn.cpp
The above code gives an error while executing it! I figured out that the problem was due to mlpack::ann::NegativeLogLikelihood<>, so I replaced it with mlpack::ann::MeanSquaredError. The code runs fine now but the model isn't learning anything! I tried changing the model, the learning rate, and normalizing the data. Nothing works... Sometimes the error reduces to a very small number like 0.00415, but the accuracy (both train and valid) is always low (around 9-11%). I have been trying for about a week to get the model working properly! Please help!
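For what it's worth, the setup the rest of this thread converges on keeps NegativeLogLikelihood and shifts the labels into the range 1..10, rather than switching to MeanSquaredError. A minimal sketch of that output-layer setup, using hypothetical random toy data just so it compiles on its own:

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <ensmallen.hpp>

using namespace mlpack::ann;

int main()
{
  // Toy data: 784-dimensional inputs and digit labels 0-9.
  // NegativeLogLikelihood expects labels in 1..numberOfClasses, hence the +1.
  arma::mat trainX(784, 100, arma::fill::randu);
  arma::mat trainY = arma::conv_to<arma::mat>::from(
      arma::randi<arma::Row<int>>(100, arma::distr_param(0, 9))) + 1;

  // Classification setup: the network ends in LogSoftMax and the FFN uses
  // NegativeLogLikelihood (not MeanSquaredError) as its output layer.
  FFN<NegativeLogLikelihood<>, RandomInitialization> model;
  model.Add<Linear<>>(784, 10);
  model.Add<LogSoftMax<>>();

  ens::Adam optimizer; // default Adam settings
  model.Train(trainX, trainY, optimizer);
}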