kartikdutt18 / mlpack-PyTorch-Weight-Translator

Simple repository to transfer weights from PyTorch to mlpack.

Complete Converter. #1

Closed kartikdutt18 closed 4 years ago

kartikdutt18 commented 4 years ago

The following is the to-do list for converting weights from PyTorch to mlpack.

I have completed the Python portion of this. I'll start working on the C++ portion. Kindly let me know what you think. An example of a generated XML file is given below:

<model>
    <layer>
        <name>Convolution2D</name>
        <input-channels>3</input-channels>
        <output-channels>32</output-channels>
        <has_weights>1</has_weights>
        <weight_offset>0</weight_offset>
        <weight_csv>./weights/darknet19/conv_weight_1.csv</weight_csv>
        <has_bias>0</has_bias>
        <bias_offset>32</bias_offset>
        <bias_csv>None</bias_csv>
        <has_running_mean>0</has_running_mean>
        <running_mean_csv>None</running_mean_csv>
        <has_running_var>0</has_running_var>
        <running_var_csv>None</running_var_csv>
    </layer>
    <layer>
        <name>BatchNorm2D</name>
        <input-channels>32</input-channels>
        <output-channels>32</output-channels>
        <has_weights>1</has_weights>
        <weight_offset>0</weight_offset>
        <weight_csv>./weights/darknet19/batchnorm_weight_2.csv</weight_csv>
        <has_bias>1</has_bias>
        <bias_offset>0</bias_offset>
        <bias_csv>./weights/darknet19/batchnorm_bias_2.csv</bias_csv>
        <has_running_mean>1</has_running_mean>
        <running_mean_csv>./weights/darknet19/batchnorm_running_mean_2.csv</running_mean_csv>
        <has_running_var>1</has_running_var>
        <running_var_csv>./weights/darknet19/batchnorm_running_var_2.csv</running_var_csv>
    </layer>
.....
</model>
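A rough, untested sketch of how I plan to read this file on the C++ side with boost::property_tree (the file name darknet19.xml here is just for illustration):

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

// Walk the <model> tree and print each layer's weight CSV path when
// has_weights is set.
int main()
{
  boost::property_tree::ptree xmlFile;
  boost::property_tree::read_xml("darknet19.xml", xmlFile);
  BOOST_FOREACH (boost::property_tree::ptree::value_type const& layer,
                 xmlFile.get_child("model"))
  {
    std::cout << layer.second.get_child("name").data();
    if (layer.second.get_child("has_weights").data() != "0")
      std::cout << " -> " << layer.second.get_child("weight_csv").data();
    std::cout << std::endl;
  }
  return 0;
}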

CC: @KimSangYeon-DGU, @saksham189

kartikdutt18 commented 4 years ago

Hey @KimSangYeon-DGU, @saksham189, I have added the loading portion for models as well, i.e., we can now load weights into the DarkNet model from PyTorch. The only thing left to do is to load the running mean and variance for the BatchNorm layer in the mlpack version.

KimSangYeon-DGU commented 4 years ago

@kartikdutt18 Awesome, thanks for the update!

kartikdutt18 commented 4 years ago

Hey @KimSangYeon-DGU, @saksham189, @zoq, I have hard-coded the running mean and variance for now. I have also added a bash file for running all the required code. The labels predicted by the model don't match, though: on a subset of 50 images, most of the predicted labels were 820. I also tried cross-entropy as the loss function. When training the model on Imagenette, the loss is on the order of 1e-4. I'll test it on a unit input first (see the sketch below) and see if the output matches; that should give more insight.
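Roughly what I have in mind for that check (untested sketch; it assumes the DarkNet model class and constructor used in the training script later in this thread, and mlpack's FFN::Predict()):

#include <iostream>
#include <mlpack/core.hpp>
#include <models/models.hpp>

using namespace mlpack::ann;

int main()
{
  // Same construction as in the full script further down this thread; the
  // translated weights would be loaded first with the converter's loader.
  DarkNet<CrossEntropyError<>> darknet(3, 224, 224, 1000);

  // A single 224x224x3 "unit" image, flattened into one column of ones.
  arma::mat input(3 * 224 * 224, 1, arma::fill::ones);
  arma::mat output;
  darknet.GetModel().Predict(input, output);

  std::cout << "label: " << output.index_max()
            << ", score: " << output.max() << std::endl;
  return 0;
}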

kartikdutt18 commented 4 years ago

Hey @KimSangYeon-DGU, @saksham189, @zoq, I tested the PyTorch model on the same input, and it also gave nearly the same result, so the converter works (IMO). The output from the converted mlpack model:

819 ------> .00659043819 
819 ------> 0.00794971819

Output from PyTorch

tensor(819)  ----> tensor(0.0064)
tensor(819)  ----> tensor(0.0076)

I think some preprocessing of Imagenette is required before testing on it.

KimSangYeon-DGU commented 4 years ago

@kartikdutt18 Thanks for the report. Do you think that's caused by the difference in input range between [0, 1] and [0, 255]? For mlpack, the input range should be [0, 1].
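For example, something like this (just a sketch; it assumes the features are stored column-wise in an Armadillo matrix, as with the DataLoader used below):

#include <mlpack/core.hpp>

// Minimal sketch: rescale image features from [0, 255] to [0, 1] in place.
void RescaleFeatures(arma::mat& features)
{
  features /= 255.0;
}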

kartikdutt18 commented 4 years ago

I already tried with and without normalization, with CrossEntropy instead of NLL, with a min-max scaler, and with size 448 instead of 224, and got the same result. I'm looking into what preprocessing is required, though I'm not sure any is needed. I'm also thinking of testing both the PyTorch and mlpack models on a very small subset of ImageNet (10 images from, say, 500 classes) and computing the accuracy.

KimSangYeon-DGU commented 4 years ago

@kartikdutt18 While finding the root of the issue, let's manually train the mlpack model using Imagenette. If you need me to run it on my machine, feel free to let me know.

kartikdutt18 commented 4 years ago

Hey @KimSangYeon-DGU, I used 10,000 random tensors to check whether the PyTorch weights are correct, and most of them gave outputs of 819 and 561. I'm looking into whether the given weights actually produce good results. Since the weights transferred to mlpack also give the same result, the converter works, but I'm not sure the weights are correct.

> train the mlpack model using Imagenette.

Sure, I'll start training it.

kartikdutt18 commented 4 years ago

Hey @KimSangYeon-DGU, I finally got the PyTorch model working. There were two things that were incorrect:

  1. The model needed to be set to eval mode before prediction.
  2. Labels in most ImageNet problems are sorted, i.e., index 0 corresponds to the lexicographically smallest class, whereas the Darknet framework takes its indices from this list.

After the above two points were fixed, the Darknet model gave the right predictions.

When it comes to mlpack, whenever I try to use darknet.GetModel().Evaluate(input, predictions); I get the following error:

error: element-wise multiplication: incompatible matrix dimensions: 0x0 and 1000x1

but if I generate the output using train and predict, it works and produces the same output as the PyTorch model in train mode.

This is the script (similar to the one in the repo).

/**
 * @file imagenette_trainer.hpp
 * @author Kartik Dutt
 *
 * Contains implementation of an object classification suite. It can be used
 * to select an object classification model, its parameters, dataset, and
 * other training parameters.
 *
 * NOTE: This code needs to be adapted as this implementation doesn't support
 *       Command Line Arguments.
 *
 * mlpack is free software; you may redistribute it and/or modify it under the
 * terms of the 3-clause BSD license.  You should have received a copy of the
 * 3-clause BSD license along with mlpack.  If not, see
 * http://www.opensource.org/licenses/BSD-3-Clause for more information.
 */

#include <mlpack/core.hpp>
#include <dataloader/dataloader.hpp>
#include <models/models.hpp>
#include <utils/utils.hpp>
#include <ensmallen_utils/print_metric.hpp>
#include <ensmallen_utils/periodic_save.hpp>
#include <ensmallen.hpp>
#include <mlpack/methods/ann/layer/layer_types.hpp>
#include <mlpack/methods/ann/layer_names.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>
#include <mlpack/methods/ann/loss_functions/cross_entropy_error.hpp>

using namespace mlpack;
using namespace mlpack::ann;
using namespace arma;
using namespace std;
using namespace ens;

// CSV paths of the BatchNorm running means and variances, queued up by
// LoadWeights() in layer order and consumed later by LoadBNMats().
std::queue<std::string> batchNormRunningMean;
std::queue<std::string> batchNormRunningVar;

class Accuracy
{
 public:
  template<typename InputType, typename OutputType>
  static double Evaluate(InputType& input, OutputType& output)
  {
    arma::Row<size_t> predLabels(input.n_cols);
    for (arma::uword i = 0; i < input.n_cols; ++i)
    {
      predLabels(i) = input.col(i).index_max() + 1;
    }
    return arma::accu(predLabels == output) / (double)output.n_elem * 100;
  }
};

template <
    typename OutputLayer = mlpack::ann::NegativeLogLikelihood<>,
    typename InitializationRule = mlpack::ann::RandomInitialization>
void LoadWeights(mlpack::ann::FFN<OutputLayer, InitializationRule> &model,
                 std::string modelConfigPath)
{
  std::cout << "Loading Weights\n";
  size_t currentOffset = 0;
  boost::property_tree::ptree xmlFile;
  boost::property_tree::read_xml(modelConfigPath, xmlFile);
  boost::property_tree::ptree modelConfig = xmlFile.get_child("model");
  BOOST_FOREACH (boost::property_tree::ptree::value_type const &layer, modelConfig)
  {
    std::string progressBar(81, '-');
    size_t filled = std::ceil(currentOffset * 80.0 / model.Parameters().n_elem);
    progressBar[0] = '[';
    std::fill(progressBar.begin() + 1, progressBar.begin() + filled + 1, '=');
    std::cout << progressBar << "] " << filled * 100.0 / 80.0 << "%\r";
    std::cout.flush();

    // Load Weights.
    if (layer.second.get_child("has_weights").data() != "0")
    {
      arma::mat weights;
      mlpack::data::Load("./../../../" + layer.second.get_child("weight_csv").data(), weights);
      model.Parameters()(arma::span(currentOffset, currentOffset + weights.n_elem - 1),
                         arma::span()) = weights.t();
      currentOffset += weights.n_elem;
    }
    else
    {
      currentOffset += std::stoi(layer.second.get_child("weight_offset").data());
    }

    // Load Biases.
    if (layer.second.get_child("has_bias").data() != "0")
    {
      arma::mat bias;
      mlpack::data::Load("./../../../" + layer.second.get_child("bias_csv").data(), bias);
      model.Parameters()(arma::span(currentOffset, currentOffset + bias.n_elem - 1),
                         arma::span()) = bias.t();
      currentOffset += bias.n_elem;
    }
    else
    {
      currentOffset += std::stoi(layer.second.get_child("bias_offset").data());
    }

    if (layer.second.get_child("has_running_mean").data() != "0")
    {
      batchNormRunningMean.push("./../../../" + layer.second.get_child("running_mean_csv").data());
    }

    if (layer.second.get_child("has_running_var").data() != "0")
    {
      batchNormRunningVar.push("./../../../" + layer.second.get_child("running_var_csv").data());
    }
  }
  std::cout << std::endl;
  std::cout << "Loaded Weights\n";
}

void LoadBNMats(arma::mat& runningMean, arma::mat& runningVar)
{
  runningMean.clear();
  if (!batchNormRunningMean.empty())
  {
    mlpack::data::Load(batchNormRunningMean.front(), runningMean);
    batchNormRunningMean.pop();
  }
  else
    std::cout << "This should never happen!\n";

  runningVar.clear();
  if (!batchNormRunningVar.empty())
  {
    mlpack::data::Load(batchNormRunningVar.front(), runningVar);
    batchNormRunningVar.pop();
  }
  else
    std::cout << "This should never happen!\n";
}

template <
    typename OutputLayer = mlpack::ann::NegativeLogLikelihood<>,
    typename InitializationRule = mlpack::ann::RandomInitialization
>void HardCodedRunningMeanAndVariance(
    mlpack::ann::FFN<OutputLayer, InitializationRule> &model)
{
  arma::mat runningMean, runningVar;
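  // Positions of the Sequential blocks in the DarkNet19 model whose second
  // sub-layer (Model()[1]) is the BatchNorm layer; the running statistics
  // queued by LoadWeights() are consumed in this order.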
  vector<size_t> indices = {1, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23};
  for (size_t idx : indices)
  {
    LoadBNMats(runningMean, runningVar);
    std::cout << "Loading RunningMean and Variance for " << idx << std::endl;
    boost::get<BatchNorm<>*>(boost::get<Sequential<>*>(model.Model()[idx])->Model()[1])->TrainingMean() = runningMean.t();
    boost::get<BatchNorm<>*>(boost::get<Sequential<>*>(model.Model()[idx])->Model()[1])->TrainingVariance() = runningVar.t();
  }
}

int main()
{
  DarkNet<mlpack::ann::CrossEntropyError<>> darknet(3, 224, 224, 1000);

  LoadWeights<mlpack::ann::CrossEntropyError<>>(darknet.GetModel(), "./../../../cfg/darknet19.xml");
  HardCodedRunningMeanAndVariance<mlpack::ann::CrossEntropyError<>>(darknet.GetModel());

  DataLoader<> dataloader;
  dataloader.LoadImageDatasetFromDirectory("./../../../../imagenette-test/", 320, 320, 3,
      true, 0.3, false, {"resize : 224"});
  constexpr double RATIO = 0.4;
  constexpr size_t EPOCHS = 4;
  constexpr double STEP_SIZE = 0.001;
  constexpr int BATCH_SIZE = 8;

  // dataloader.TrainFeatures() = dataloader.TrainFeatures() / 255.0;
  std::cout << "Data scaled!\n";

  for (size_t i = 0; i < dataloader.TrainFeatures().n_cols; i++)
  {
    std::cout << i << " ---> ";
    arma::mat input, predictions;
    input = dataloader.TrainFeatures().col(i);
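    // NOTE: This Evaluate() call is the one that currently fails with
    // "element-wise multiplication: incompatible matrix dimensions: 0x0 and
    // 1000x1"; generating the output via train and predict works instead.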
    darknet.GetModel().Evaluate(input, predictions);
    std::cout << predictions.index_max() << " ::: " <<
        predictions(predictions.index_max())<< std::endl;
  }

  mlpack::data::Save("darknet19.bin", "darknet",
      darknet.GetModel(), false);
  return 0;
}

kartikdutt18 commented 4 years ago

The PyTorch model gave an accuracy of 78% on an 80-image test set from Imagenette. In mlpack, if I set deterministic = true for BatchNorm, it at least predicts varying labels; if I don't, it mostly predicts 819. I am not sure why. Shouldn't the deterministic visitor set the deterministic parameter to true for all layers when predicting?
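For reference, this is roughly how I set it, as an extra helper alongside HardCodedRunningMeanAndVariance() in the script above (untested sketch; it assumes BatchNorm<> exposes a Deterministic() accessor):

template<typename OutputLayer, typename InitializationRule>
void SetBatchNormDeterministic(
    mlpack::ann::FFN<OutputLayer, InitializationRule>& model)
{
  // Same Sequential-block indices as in HardCodedRunningMeanAndVariance().
  std::vector<size_t> indices = {1, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 16,
      17, 19, 20, 21, 22, 23};
  for (size_t idx : indices)
  {
    // Assumes BatchNorm<> provides a Deterministic() accessor to toggle
    // inference mode (use of running statistics) for this layer.
    boost::get<mlpack::ann::BatchNorm<>*>(
        boost::get<mlpack::ann::Sequential<>*>(
            model.Model()[idx])->Model()[1])->Deterministic() = true;
  }
}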

kartikdutt18 commented 4 years ago

Since this has already been tested, I'm closing this issue. Thanks for all the help.