intel / inference-engine-node

Bringing hardware-accelerated deep learning inference to Node.js and Electron.js apps.
Apache License 2.0

Multiple Async Models for MYRIAD #62

Closed harborretail closed 3 years ago

harborretail commented 3 years ago

I'm having an issue where instantiating a new network on the NCS2 breaks any existing networks I've already instantiated.

I have a pipeline that passes images through several models, and I want to run all of the inference jobs on the NCS2, but I can only seem to get this library to spin up one.

If I run all of the models on the CPU I don't have the same issue -- perhaps an issue in this library?

artyomtugaryov commented 3 years ago

Hello @harborretail, I will take a look at this problem. Can you share a code sample?

harborretail commented 3 years ago

I'm pulling excerpts from a larger application, but essentially the behavior is: if device_name == "CPU", everything runs without issue; if I try to use "MYRIAD" for more than one of the engines, it throws this error:

node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

[Error: Can not init Myriad device: NC_ERROR]

From main.js:

var device_name = "MYRIAD";

(async () => {
    await faceEngine(device_name);
    await ageGenEngine(device_name);    // omitted -- just like faceEngine but runs a different model and has its own outputs
    await emoEngine(device_name);       // omitted -- just like faceEngine but runs a different model and has its own outputs
    await landmarksEngine(device_name); // omitted -- just like faceEngine but runs a different model and has its own outputs
    await headposeEngine(device_name);  // omitted -- just like faceEngine but runs a different model and has its own outputs
    initialized = true;
    console.log("ENGINES INITIALIZED");
})();

Model initialization from 'faceEngine', which is an imported module:

var core_face, model_face, bin_path_face, net_face,
    inputs_info_face, outputs_info_face, input_info_face, output_info_face,
    exec_net_face, input_dims_face, input_info_face_name, output_info_face_name;

async function faceEngine(device_name) {
    core_face = new Core();
    model_face = '/opt/intel/openvino_2020.3.194/deployment_tools/open_model_zoo/tools/downloader/intel/face-detection-retail-0004/FP32/face-detection-retail-0004.xml';
    bin_path_face = binPathFromXML(model_face);
    net_face = await core_face.readNetwork(model_face, bin_path_face);
    inputs_info_face = net_face.getInputsInfo();
    outputs_info_face = net_face.getOutputsInfo();
    input_info_face = inputs_info_face[0];
    output_info_face = outputs_info_face[0];
    input_info_face.setLayout('nhwc');
    input_info_face.setPrecision('u8');
    exec_net_face = await core_face.loadNetwork(net_face, device_name);
    input_dims_face = input_info_face.getDims();
    input_info_face_name = input_info_face.name();
    output_info_face_name = output_info_face.name();
}
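
For reference, each of the other engines (ageGenEngine, emoEngine, landmarksEngine, headposeEngine) builds its own Core and loads its own network the same way. I haven't tried sharing a single Core instance across all of them; that would look roughly like this (untested sketch):

// Untested sketch: one Core instance shared by every engine, each network
// loaded onto the device through that single Core.
const core = new Core();

async function loadEngine(xml_path, device_name) {
  const net = await core.readNetwork(xml_path, binPathFromXML(xml_path));
  return core.loadNetwork(net, device_name);
}

// e.g. exec_net_face = await loadEngine(model_face, device_name);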

The module exports an async function named "faceD", which is invoked from main.js with image data for processing:

async function faceD(img) {

    var results = [];

    var resultsObj = {}

    var resultsArr = [];

    var dims;

    const image_path = img;

    const input_h_face = input_dims_face[2];
    const input_w_face = input_dims_face[3];

    // MAKE A COPY OF THE FACE IMAGE TO SCALE
    var agImage = await jimp.read(image_path);
    var image = await jimp.read(image_path);
    image.contain(1000,1000);
//    image.write('./outputs/input.jpg');

    if (agImage.bitmap.height !== input_h_face &&
        agImage.bitmap.width !== input_w_face) {
        agImage.background(0xFFFFFFFF);
        agImage.contain(agImage.bitmap.width, agImage.bitmap.width);
        agImage.resize(input_w_face, input_h_face, jimp.RESIZE_BILINEAR);
    }

  var normImg = await jimp.read(image);

  const infer_req_face = exec_net_face.createInferRequest();
  const input_blob_face = infer_req_face.getBlob(input_info_face.name());
  const input_data_face = new Uint8Array(input_blob_face.wmap());

  agImage.scan(0, 0, agImage.bitmap.width, agImage.bitmap.height, function (x, y, hdx) {
    let h = Math.floor(hdx / 4) * 3;
    input_data_face[h + 2] = agImage.bitmap.data[hdx + 0];  // R
    input_data_face[h + 1] = agImage.bitmap.data[hdx + 1];  // G
    input_data_face[h + 0] = agImage.bitmap.data[hdx + 2];  // B
  });

//  input_blob_face.unmap();

  infer_req_face.infer();

  const output_blob_face = infer_req_face.getBlob(output_info_face.name());
  const output_data_face = new Float32Array(output_blob_face.rmap());

  for (let i = 0, len = output_data_face.length; i < len; i += 7) {
    if(output_data_face[i+2] > 0.8 && ((output_data_face[i+5] * image.bitmap.width) - (output_data_face[i+3] * image.bitmap.width)) >= 30) {
      results.push(output_data_face.slice(i, i + 7));
    }
  }

  var counter = 0;

  if(results.length > 0) {
    results.forEach(item => {

      dims = {
         x: parseInt(item[3] * image.bitmap.width) - 10,
         y: parseInt(item[4] * image.bitmap.height) + 10,
         w: (parseInt(item[5] * image.bitmap.width)+10) - parseInt((item[3] * image.bitmap.width)-10),
         h: (parseInt(item[6] * image.bitmap.height)+10) - parseInt((item[4] * image.bitmap.height)-10)
      };

    resultsObj = {
      confidence: parseInt(item[2]* 100),
      dims: dims,
      img: null
    };

     let holdImg = new jimp({ data:image.bitmap.data, width: image.bitmap.width, height: image.bitmap.height});

      holdImg.crop((dims.x-((dims.h-dims.w)/2)), (dims.y), (dims.h), (dims.h))
      .contain(250,250)
//      .resize(250,250)
      .background(0xFFFFFFFF)
//      .greyscale()
//      .write('./outputs/detector-' + counter + '.jpg');
      counter++;

    resultsObj.img = holdImg;
    resultsArr.push(resultsObj);
    resultsObj = {};
    });
    counter = 0;
  }

    var combined = { results: resultsArr, img: normImg };
//    console.log(resultsArr.length);
    return combined;
}
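
Invocation from main.js is basically this (the frame path here is just an example):

const { results, img } = await faceD('/tmp/frame.jpg');
console.log('faces found:', results.length);
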
harborretail commented 3 years ago

Oddly enough, enabling "MYRIAD" on any of the instances of the IE Core also skyrockets my CPU utilization, which is the opposite of what we're after by adding an ASIC, yes? Whereas when I run everything on the CPU, only the "face-detector" model is constantly running, since if there are no faces we don't send any data downstream through the rest of the inference layers.

artyomtugaryov commented 3 years ago

@harborretail I'm trying to reproduce your problem and am implementing a new face detection sample. You can take a look at the #64 pull request. It is still in progress and I have only added the detection part so far; I will try to add landmarks and other features.

demid5111 commented 3 years ago

@harborretail thanks for asking.

Use case: Could you tell us more about the use case and the models that you are going to integrate into your application? Are these models run in parallel, in a pipeline, or in some other manner? Are you exploring the OpenVINO API or do you want to build a production solution?

Hardware: Are you planning to scale your business with more NCS2 sticks? Are you pinned to the MYRIAD hardware, or do you just want to maximize performance and could also utilize the CPU or even an Intel GPU? Have you optimized your networks to get maximum inference performance?

harborretail commented 3 years ago

We are currently using a variety of models to achieve this.

We do this by passing frames through the following pipeline (I suppose we could parallelize a few of these operations if the library supports that); a rough sketch of how the stages chain follows the list:

  1. face-detection-retail-0004 (get faces)
  2. landmarks-regression-retail-0009 (get landmarks on faces)
  3. head-pose-estimation-adas-0001 (get yaw-pitch-roll of faces)
  4. face-reidentification-retail-0095 (get face reidentification signature -- compare to temp cache of "audience members")
  5. age-gender-recognition-retail-0013 (infer age/gender of faces)
  6. emotions-recognition-retail-0003 (infer emotion of faces)
  7. Handle results, cache of faces, and emit results over websocket
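
At runtime the stages chain roughly like this per frame (the downstream function names are simplified placeholders that mirror faceD):

const { results } = await faceD(framePath);       // step 1: detect faces
for (const face of results) {
  const landmarks = await landmarksD(face.img);   // step 2 (placeholder name)
  const pose      = await headposeD(face.img);    // step 3 (placeholder name)
  const reid      = await reidD(face.img);        // step 4 (placeholder name)
  const ageGender = await ageGenD(face.img);      // step 5 (placeholder name)
  const emotion   = await emoD(face.img);         // step 6 (placeholder name)
  // step 7: compare reid against the audience cache and emit results over the websocket
}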

This workload may require more than one NCS2 to be efficient (I think), which is a future ask. For now I want to see what performance is like stacking these all on one stick if we can, and if we can't, then test splitting the work between CPU and GPU. We run on budget x86 hardware with lightweight Intel HD integrated graphics, so I don't think we can necessarily use the GPU.

Our production environment is intended to be homogeneous from a 'business logic' perspective; we do all of our development in Node.js as a primary language, hence the interest in this library.

I'm assuming these prebuilt models are already optimized?

harborretail commented 3 years ago

It appears from your commits that you're basically taking the work I did and re-engineering it to provide as a sample application?!

That's awesome!! That'll reduce my technical debt if you're adopting what I've done for community benefit.

Thanks and you guys rock!

demid5111 commented 3 years ago

@harborretail that is an interesting case. However, MYRIAD is a low-power device, so it might not be performant enough to process 6 stacked models. I think that in your case CPU and/or GPU utilization is required, or you need more NCS2 sticks, ideally one or two models per stick. Regarding our plans for the sample that you referred to, there are no exact plans for it; it is just an experimental PR.

What are your target CPU and GPU? Could you please give more details/specifications? Are you working with one/numerous cameras? What is your target throughput/latency?

I should also give the disclaimer that the current repository and its NodeJS API are not part of the official OpenVINO API, unlike the officially supported C++ and Python APIs. Official support makes life easier for further productization: support, regular releases. This NodeJS API, by contrast, is designed for experiments and evaluation of OpenVINO via NodeJS technologies. Is the Python API not an option for you? Perhaps you could think of a spawned process or a micro-service that runs inference with the Python API inside?
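
For illustration, a rough sketch of that spawned-process idea; the Python worker script name is just a placeholder and Node.js remains the orchestration layer:

// Hypothetical sketch: a Python worker runs OpenVINO inference and exchanges
// newline-delimited JSON with the Node.js process over stdio.
const { spawn } = require('child_process');

const worker = spawn('python3', ['inference_worker.py', '--device', 'MYRIAD']);

let buffered = '';
worker.stdout.on('data', (chunk) => {
  buffered += chunk.toString();
  let newline;
  while ((newline = buffered.indexOf('\n')) >= 0) {
    const detections = JSON.parse(buffered.slice(0, newline));
    buffered = buffered.slice(newline + 1);
    // forward detections into the rest of the Node.js pipeline / socket.io here
  }
});

function requestInference(imagePath) {
  worker.stdin.write(JSON.stringify({ image: imagePath }) + '\n');
}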

harborretail commented 3 years ago

Thank you for your feedback on the performance. If that's the case, perhaps only face detection and re-identification get stacked on the MYRIAD.
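
Something along these lines for the loadNetwork calls, where the reid/age-gender/emotion core and net variables are placeholders mirroring the faceEngine setup (untested):

// Rough split: detection and re-identification on the stick, the rest on the CPU.
exec_net_face   = await core_face.loadNetwork(net_face, 'MYRIAD');
exec_net_reid   = await core_reid.loadNetwork(net_reid, 'MYRIAD');     // placeholder vars
exec_net_agegen = await core_agegen.loadNetwork(net_agegen, 'CPU');    // placeholder vars
exec_net_emo    = await core_emo.loadNetwork(net_emo, 'CPU');          // placeholder vars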

We build interactive retail displays/fixtures for our clients. Typically we're using a single USB-attached camera for inference (so one video pipeline as an input). Our fixtures typically have sophisticated touchscreen user interfaces, and potentially multiple tactile buttons or peripheral touchscreens with additional content. Because of the nature of some of the experiences, the video stream is required in the UI first. We use Electron to render our UI (a Chromium browser), so we let Chromium own the camera feed and then send the stream over socket.io to the container running the inference engine. This way we have a realtime feed in the UI, and we can reduce the framerate being sent into the pipeline.

The main reason we seek a Node.js implementation of this is that our software stack is comprised of 30+ containerized microservices, all bound by a robust socket.io server, all living on a single IoT edge host. These devices typically sit offline, though some clients opt for 4G connectivity for realtime data feedback.

Any custom codebases/applications/services we have are written in Node.js. We have a few OOTB services like InfluxDB, Grafana, etc., but our approach is Node.js-first.

We could opt to implement this in Python, but long-term maintenance of that code will not work for us; our development staffing direction doesn't include a Python specialist.

It would be great if Intel considered officially supporting this development for use with OpenVINO; there are lots of possibilities when bundled with Electron.

As for hardware, we fit our hardware resource requirements to the project. Oftentimes we use a Giada VM-23, other times a Giada D68 or an Intel NUC.

My target throughput for face detection / re-identification would be 15 fps (+/- 5 fps), and for gaze, emotion, age, and gender (total throughput) I'd hope for 5 fps.
