luxonis / depthai

DepthAI Python API utilities, examples, and tutorials.
https://docs.luxonis.com
MIT License

DepthAI Pipeline Builder Gen2 #136

Closed Luxonis-Brandon closed 3 years ago

Luxonis-Brandon commented 4 years ago

Start with the why:

Several of the real-world applications that are desired of the DepthAI platform are actually series or parallel (or both) combinations of neural networks with regions of interest (ROI) passed from one network to one or more subsequent networks.

The Myriad X hardware is capable of multi-stage neural inference in parallel with computer vision functions, disparity depth, video encoding, etc., but no system exists that makes this functionality easy to use for real-world problems. If a user can modularly piece these together (i.e. in a pipeline builder), this gives super-interesting capabilities, an example of which is below for sports filming:

So this is just an example of how the pipeline builder can be used to string together really interesting functionalities. The core value of the builder is that it would allow many hardware/firmware capabilities to be strung together in series/parallel combinations to solve real-world problems easily:

In many of these pipeline flows of multiple nodes, there is a need for custom rules and logic between nodes (e.g. filtering out which ROI 'make the cut' for the next stage). And in many cases, the pipeline is not doable without these rules, as the rules are often a key implementation of a-priori knowledge by the designer, without which the solution is not tractable.

As such, support for custom code/functions/etc. to enable rules is a critical feature, and it is equally necessary whether DepthAI is used with or without a host.
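To make the idea of a rule between nodes concrete, here is a minimal sketch in plain Python. It does not use the DepthAI API: the `Roi` class, the `select_rois` function, and all thresholds are hypothetical names and values chosen only for illustration. It shows a rule that decides which ROIs from a first-stage detector "make the cut" for a second stage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Roi:
    """Hypothetical detection record with a normalized (0..1) bounding box."""
    label: str
    confidence: float
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        # Normalized box area; degenerate boxes count as zero.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def select_rois(rois: List[Roi], wanted_label: str = "person",
                min_confidence: float = 0.5, min_area: float = 0.01,
                max_rois: int = 2) -> List[Roi]:
    """Keep only ROIs worth sending to the next network (assumed rule set)."""
    keep = [r for r in rois
            if r.label == wanted_label
            and r.confidence >= min_confidence
            and r.area >= min_area]
    # The most confident detections go first; the next stage has a fixed budget.
    keep.sort(key=lambda r: r.confidence, reverse=True)
    return keep[:max_rois]


detections = [
    Roi("person", 0.91, 0.10, 0.10, 0.40, 0.80),  # kept
    Roi("person", 0.35, 0.50, 0.10, 0.70, 0.60),  # dropped: low confidence
    Roi("ball", 0.88, 0.45, 0.70, 0.50, 0.75),    # dropped: wrong label
    Roi("person", 0.77, 0.60, 0.20, 0.62, 0.25),  # dropped: box too small
]
selected = select_rois(detections)
```

The designer's a-priori knowledge lives in the thresholds and the ordering; the same function could in principle run on the host or on the device.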

DepthAI used with host

When using DepthAI and megaAI with a host, having the capability to implement these rules/functions/etc. on the host is very convenient, as the engineer can then leverage the full convenience of the host for running rules, functions, and even CV capabilities.

To facilitate this most flexibly, the pipeline builder should be architected so that every node (including the camera node(s)) can optionally send its output to the host and optionally receive its input from the host.
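A rough way to picture this is the toy model below. It is a sketch only and not the DepthAI API: the `Node` class, its `to_host`/`from_host` flags, and `run_pipeline` are hypothetical names. Each node can optionally copy its output to a host-visible queue, and can optionally take its input from the host instead of from the upstream node.

```python
from collections import deque
from typing import Any, Callable, List


class Node:
    """Toy pipeline node with optional host taps (illustrative only)."""

    def __init__(self, name: str, fn: Callable[[Any], Any],
                 to_host: bool = False, from_host: bool = False):
        self.name = name
        self.fn = fn
        self.to_host = to_host
        self.from_host = from_host
        self.host_out = deque()  # results mirrored to the host
        self.host_in = deque()   # data injected by the host

    def process(self, upstream: Any) -> Any:
        # Host-injected data, when present, overrides the upstream input.
        if self.from_host and self.host_in:
            data = self.host_in.popleft()
        else:
            data = upstream
        result = self.fn(data)
        if self.to_host:
            self.host_out.append(result)
        return result


def run_pipeline(nodes: List[Node], frame: Any) -> Any:
    """Run the nodes in series, feeding each output to the next node."""
    data = frame
    for node in nodes:
        data = node.process(data)
    return data


camera = Node("camera", lambda f: f, to_host=True)
detector = Node("detector", lambda f: {"frame": f, "rois": [(0.1, 0.1, 0.4, 0.8)]},
                to_host=True, from_host=True)

live = run_pipeline([camera, detector], "frame-0")

# The host replays a recorded frame into the detector for debugging.
detector.host_in.append("recorded-frame")
replayed = run_pipeline([camera, detector], "frame-1")
```

With both taps available on every node, the host can observe any intermediate result and substitute known inputs at any stage.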

Importantly, such a capability for each node to send/receive information from the host also enables easier development work-flows:

UPDATE 20 Nov. 2020: The first example of this host-integrated use-case is here: https://github.com/luxonis/depthai-experiments/blob/master/gaze-estimation

DepthAI used without host (i.e. embedded use-case)

When there is no host present - for example when DepthAI is running completely standalone and directly actuating IO or communicating over SPI/UART/I2C - it is still equally necessary to allow such rules/custom code/etc.

To support this, the capability for the user to run arbitrary code on DepthAI (as nodes) is critical.

It is worth noting that when using DepthAI without a host in deployment, one could still use the with-host mode described above for debugging, while still running the full embedded flow.

The how:

To support such arbitrary pipeline builds in both with-host and without-host use-cases, we architect the pipeline builder so that every node can send data to and receive data from the host, and so that CPython code can be run directly as nodes.

Integrating this, we have settled on the following approach, which breaks into 3 modalities of nodes that are used in the pipeline builder to solve embedded CV/AI problems and to leverage the results to interact with the physical world.

Node modalities:

  1. Fast, easy, limited flexibility: The list of accelerated blocks above, like neural inference, 3D object localization, etc. These come pre-packaged and are trivial to make use of. But they often need application-specific logic between them, hence modality 2. And if your CV algorithm isn't on that list (or maybe you've invented your own proprietary one and need it to run performantly on DepthAI), see modality 3.

  2. Slow, easy, quite flexible: CPython bindings for scripts running directly on DepthAI as a node (issue https://github.com/luxonis/depthai/issues/207).
    This allows you to have custom rules on metadata from neural inference results, write custom protocols that run on-chip as part of the pipeline, communicate with sensors/actuators or other systems over SPI, UART, I2C, etc. based on pipeline results, etc. For example you can make rules that make sense of neural-inference metadata, which then control performant crop/resize/reformat to connect layers of accelerated CV functions.

  3. Fast, hard, quite flexible: OpenCL (here), G-API (more details soon) and ML frameworks for vectorized math are used to compile custom computer vision functions to run performantly on the SHAVES in DepthAI. So you can take your computer vision function, write it in OpenCL, G-API, or say in PyTorch, and drop it as a node in the pipeline builder. This supports custom algorithms, including proprietary algorithms, being hardware accelerated in the pipeline as a node. And the pipeline builder leverages the hardware-accelerated crop/rescale/reformat to match inputs and outputs. This could even be used for non-CV functions, for example to run custom arbitrary mathematical functions on audio data brought in via CPython over I2C. For an EXCELLENT example of how to run custom CV code on depthai using PyTorch, see this guide by Rahul Ravikumar.
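As an illustration of modality 2, the sketch below shows the kind of rule a script node might apply: it turns a normalized bounding box from neural-inference metadata into a padded pixel crop for the next accelerated stage. It is plain Python, not the DepthAI scripting API; `crop_for_next_stage` and the `pad` parameter are hypothetical.

```python
from typing import Tuple


def crop_for_next_stage(bbox: Tuple[float, float, float, float],
                        frame_w: int, frame_h: int,
                        pad: float = 0.1) -> Tuple[int, int, int, int]:
    """Convert a normalized bbox to a padded pixel crop (assumed rule).

    bbox is (xmin, ymin, xmax, ymax) in 0..1; pad is a fraction of the
    box size added on every side, clamped to the frame borders.
    """
    xmin, ymin, xmax, ymax = bbox
    w = xmax - xmin
    h = ymax - ymin
    # Grow the box so the next network sees some context around the object.
    xmin = max(0.0, xmin - pad * w)
    ymin = max(0.0, ymin - pad * h)
    xmax = min(1.0, xmax + pad * w)
    ymax = min(1.0, ymax + pad * h)
    # round() rather than int() avoids off-by-one results from float error.
    return (round(xmin * frame_w), round(ymin * frame_h),
            round(xmax * frame_w), round(ymax * frame_h))


centered = crop_for_next_stage((0.25, 0.25, 0.75, 0.75), 1000, 500)
at_edge = crop_for_next_stage((0.0, 0.0, 0.5, 1.0), 1000, 500)
```

The cheap decision logic stays in the script, while the crop/resize itself would be handed to the hardware-accelerated block.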

The what:

If the pipeline builder supports the following, it seems it would be sufficiently flexible. So the goal is to implement a pipeline builder which can be used to implement the flows below.

UPDATE 26 December 2021: The docs for Gen2 are materializing here: https://docs.luxonis.com/projects/api/en/gen2_develop/

Example Neural Pipelines To support:

Of the examples on the OpenVINO repository, the following seems like it should not be implemented, as it’s the only one that does series, parallel, and output of parallel back to a single model. So it seems much more specialized.

This will then cover the following items which were previously independently on the DepthAI roadmap:

To keep in mind, but maybe not support initially:

MXGray commented 4 years ago

@Luxonis-Brandon,

A pipeline builder can make things quicker and more straightforward to piece up! :) Some things I'm about to try:

Default mobilenet SSD (Coco) with depth:

- If person
  - Run face recognition and face reidentification
  - If stranger
    - Run age / gender estimator
    - Run facial expression estimator
    - Run action classifier
    - Output: 09:00. Person. Male. 20 to 25 years old. Looking happy. Standing. 2 meters away.
  - If not stranger
    - Run facial expression estimator
    - Run action classifier
    - Output: 09:00. Marx. Looking happy. Standing. 2 meters away.
- If not person
  - Run OCR detection
  - If text detected
    - Run text recognition
    - Output: Dead center. Monitor. 1 meter away. Text reads, " Warning: Aliens Spotted Near You ".
  - If no text
    - Pass
    - Output: Dead center. Monitor. 1 meter away.
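A results-based flow like this one could be sketched in plain Python roughly as follows. This is only an illustration of the branching logic: `describe` and the stub models in `models` are hypothetical stand-ins, not DepthAI calls, and the output strings are simplified compared with the examples above.

```python
from typing import Any, Callable, Dict

Detection = Dict[str, Any]


def describe(detection: Detection,
             models: Dict[str, Callable[[Detection], Any]]) -> str:
    """Choose which later-stage models to run based on earlier results."""
    parts = [detection["label"]]
    if detection["label"] == "person":
        name = models["face_reid"](detection)  # None means stranger
        if name is None:
            parts.append(models["age_gender"](detection))
        else:
            parts[0] = name
        parts.append(models["expression"](detection))
        parts.append(models["action"](detection))
    else:
        text = models["ocr"](detection)  # None means no text found
        if text:
            parts.append('Text reads, "' + text + '"')
    parts.append(str(detection["distance_m"]) + " m away")
    return ". ".join(parts) + "."


# Stub models returning fixed answers so that the control flow can be run.
models = {
    "face_reid": lambda d: d.get("known_name"),
    "age_gender": lambda d: "Male, 20 to 25 years old",
    "expression": lambda d: "Looking happy",
    "action": lambda d: "Standing",
    "ocr": lambda d: d.get("text"),
}

stranger = describe({"label": "person", "distance_m": 2}, models)
known = describe({"label": "person", "known_name": "Marx", "distance_m": 2}, models)
monitor = describe({"label": "monitor", "distance_m": 1}, models)
```

Each branch only runs the models it needs, which is what makes such pipelines dynamic rather than a fixed series of stages.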

:D

Luxonis-Brandon commented 4 years ago

Great feedback @MXGray ! Discussing internally now how difficult such results-based dynamic pipelines would be to implement. I definitely see how useful this would be... now to investigate the relative difficulty/feasibility.

Luxonis-Brandon commented 4 years ago

The initial Gaze estimation example is implemented here: https://github.com/luxonis/depthai-experiments/pull/8 Gaze Example Demo

Luxonis-Brandon commented 3 years ago

This is now implemented and mainlined. Most things that were possible in Gen1 API are now possible in Gen2. See below for resources: