luxonis / depthai-core

DepthAI C++ Library

[POC] Neural Network Models #1138

Open lnotspotl opened 2 days ago

lnotspotl commented 2 days ago

Introduction

I am opening this proof-of-concept PR to facilitate discussion around the current state of depthai and, more specifically, around how we handle the many tasks related to neural network models.

I have had this idea in mind for quite a while now but only got to actually writing up a proof-of-concept last week.

Start with the what

There are a couple of nodes in depthai-core that directly interact with neural network models, e.g. NeuralNetwork, DetectionNetwork, SpatialDetectionNetwork and a few others.

Don't get me wrong, I think the way they work is great: there are easy ways to load models into memory, set the configuration parameters, or do both at the same time thanks to the NNArchive class, which packages both of these in a single zip file.

However, despite all the different functionalities and ways of working with models, I propose we introduce a novel, unified approach to working with neural networks.

I have had the opportunity to work with all these nodes and to implement some of their methods myself. While implementing some of their functionalities, I came to the conclusion that each of the aforementioned nodes is doing more work than it really should.

Take setModelPath as an example. I do not think a neural network node should be responsible for loading models into memory, let alone interacting with the filesystem.

In my view, there should be just one method, setModel, with a single argument: some kind of model wrapper around both the underlying neural network model and its pertinent settings config.

Loading models into memory should be the job of a different module, one that abstracts away anything model-type specific (e.g. Blob vs. Dlc) and one that ensures proper filesystem interaction and model-specific config initialization.

What’s more, there have been ideas about doing config optimization in the future (e.g. determining the ideal number of SHAVEs for an OpenVINO model), and I would really love to see another standalone module responsible for just that, rather than yet another copy-pasted method on each of the neural network nodes, each doing pretty much the same thing.

Finally, when it comes to sending the model to a device: for a Blob model we send the blob’s bytes directly, while for a Dlc model we send the bytes as well as the model’s path, store the bytes at that path on the camera, and finally load that stored file back into memory (correct me if I am wrong here). Why not unify the way we load models onto a device and create a module responsible for serialization as well as deserialization of each model type?

Although this goes against the common programmers’ precept “If it ain’t broke, don’t fix it”, I think we should improve on this. What follows is an attempt, an idea, a mind dump of how I think we could better structure one of the many parts of depthai-core and how we work with neural network models.

Finish with the how

One way we could abstract the type of a network away is through the introduction of a variant, a C++17 feature, encapsulating all the different model types we would like to support. Let me give an example of how such a header could look. Notice the std::variant at the end.

#include <algorithm>
#include <memory>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// dai::Platform, dai::OpenVINO::Blob and dai::NNArchiveVersionedConfig
// come from the existing depthai-core headers.

namespace depthai {
namespace models {

enum class ModelType {
  UNKNOWN = 0,
  BLOB,
  SUPERBLOB,
  DLC,
  NNARCHIVE
};

struct ModelSettings {
  // Validate settings
  virtual bool isValid() const {
    return true;
  }

  // Check if model is supported on a specific platform (cannot use Dlc on RVC2 for instance)
  bool isSupported(dai::Platform platform) const {
    return std::find(supportedPlatforms.begin(), supportedPlatforms.end(), platform) != supportedPlatforms.end();
  }

  virtual ~ModelSettings() = default;

  std::string modelName = "";
  std::vector<dai::Platform> supportedPlatforms = {};

  // Archive config for NNArchive models
  std::optional<dai::NNArchiveVersionedConfig> nnArchiveConfig;
};

struct BlobSettings : ModelSettings {
  ...
};

struct SuperBlobSettings : BlobSettings {
  int numShaves = 6;
  ...
};

struct DlcSettings : ModelSettings {
  ...
};

class Model {
  public:
    virtual ModelType type() const = 0;
    virtual ~Model() = default;
};

class BlobModel : public Model {
  public:
    BlobModel(std::shared_ptr<dai::OpenVINO::Blob> model, std::shared_ptr<BlobSettings> settings) : modelPtr_(model), settingsPtr_(settings) {}

    ModelType type() const override {
      return ModelType::BLOB;
    }

  private:
    std::shared_ptr<dai::OpenVINO::Blob> modelPtr_;
    std::shared_ptr<BlobSettings> settingsPtr_;
};

class SuperblobModel : public Model {
  ...
};

class DlcModel : public Model {
  ...
};

using ModelVariant = std::variant<BlobModel, SuperblobModel, DlcModel>;

} // namespace models
} // namespace depthai

The code snippet above is what I think could be part of the Models.hpp header file.
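To get a feel for how these types compose, here is a hypothetical usage sketch; the Blob path constructor and the settings values are assumptions for illustration, not part of the proposal.

// Hypothetical usage sketch -- wraps an existing OpenVINO blob in the proposed
// types. Assumes dai::OpenVINO::Blob can be constructed from a file path.
auto blob = std::make_shared<dai::OpenVINO::Blob>("path/to/model.blob");

auto settings = std::make_shared<depthai::models::BlobSettings>();
settings->modelName = "my-model";                      // hypothetical values
settings->supportedPlatforms = {dai::Platform::RVC2};

depthai::models::ModelVariant mv = depthai::models::BlobModel(blob, settings);

With those types in place, what follows is the ModelLoader.hpp header.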

#include <cstdint>
#include <filesystem>
#include <vector>

#include "Models.hpp" // the header sketched above

namespace depthai {
namespace models {

class ModelLoader {
    ...
};

ModelVariant load(std::filesystem::path path);
ModelVariant load(std::vector<uint8_t> bytes, ModelType type);

} // namespace models
} // namespace depthai
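The loader internals are left open in this sketch. As one minimal possibility (my assumption, nothing settled), load(path) could map the file extension to a ModelType and delegate to the byte-based overload; the extension-to-type mapping below is hypothetical.

#include <fstream>
#include <iterator>

namespace depthai {
namespace models {

// Hypothetical sketch of the path-based loader.
ModelVariant load(std::filesystem::path path) {
    // Map the extension to a model type (mapping is an assumption)
    ModelType type = ModelType::UNKNOWN;
    const std::string ext = path.extension().string();
    if(ext == ".blob") type = ModelType::BLOB;
    else if(ext == ".superblob") type = ModelType::SUPERBLOB;
    else if(ext == ".dlc") type = ModelType::DLC;

    // Read the raw bytes and let the type-aware overload construct the model
    std::ifstream file(path, std::ios::binary);
    std::vector<uint8_t> bytes(std::istreambuf_iterator<char>(file), {});
    return load(std::move(bytes), type);
}

} // namespace models
} // namespace depthai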

With that, provided that all the neural network nodes handle each of the model variants explicitly, one can load and set a model very easily.

namespace dai {

class NeuralNetwork : public DeviceNodeCRTP<DeviceNode, NeuralNetwork, NeuralNetworkProperties> {
  ...
  void setModel(const depthai::models::ModelVariant &mv) {
    std::visit([this](auto &&model) { this->setModel(model); }, mv);
  }
  void setModel(const depthai::models::BlobModel &model);
  void setModel(const depthai::models::SuperblobModel &model);
  void setModel(const depthai::models::DlcModel &model);
  ...
};

} // namespace dai
...
// set model
nn.setModel(depthai::models::load("my/path/model.blabla"));
...

A model zoo can be built on top of the model loader, again returning a ModelVariant, with everything declared in ModelZoo.hpp.

...
// set model
nn.setModel(depthai::models::zoo::load("yolo", "RVC2"));
...
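Based on that usage, ModelZoo.hpp might declare something like the following; the exact signature is my guess, not a settled API.

namespace depthai {
namespace models {
namespace zoo {

// Fetches a named model for a given platform (e.g. from a local cache or a
// remote zoo) and returns it as a ModelVariant. Hypothetical signature.
ModelVariant load(const std::string &modelName, const std::string &platform);

} // namespace zoo
} // namespace models
} // namespace depthai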

There are more modules, such as the ModelSerializer with two methods, serialize and deserialize. There is also the ModelOptimizer with an optimize method for tuning the parameters most critical for performance.
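For concreteness, their declarations could look roughly like this; every name below is a guess at the shape of the interface rather than a settled API.

namespace depthai {
namespace models {

// Hypothetical ModelSerializer: one place that knows how each model type
// travels to and from the device.
std::vector<uint8_t> serialize(const ModelVariant &model);
ModelVariant deserialize(const std::vector<uint8_t> &bytes, ModelType type);

// Hypothetical ModelOptimizer: e.g. pick the optimal number of SHAVEs for a
// SuperBlob, instead of duplicating that logic in every neural network node.
ModelVariant optimize(const ModelVariant &model, dai::Platform platform);

} // namespace models
} // namespace depthai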

I think the brief explanation above together with the code snippets should convey what I am trying to propose here.

To make this not just an idea proposal but something I or anyone else can play with, I’ve made some changes to our codebase itself.

[image: screenshot of the proof-of-concept changes]

By no means do I mean to say that my proposed changes are better than what we have right now. It is simply my idea of how I would structure the codebase with the benefit of hindsight and my personal experience working with depthai.

If anything, I hope this short write-up will make us contemplate what the user and developer experience has been so far and how we could improve upon it.

Any feedback, comments or other suggestions would be much appreciated.