openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug] Exception when switching between Myriad and CPU execution: Failed to allocate graph: NC_ERROR #11262

Closed SiegfriedIppischSecuriton closed 2 years ago

SiegfriedIppischSecuriton commented 2 years ago
System information (version)
Detailed description

We discovered an exception from OpenVINO in some particular cases. Initially this happened while testing the production software on our production-like test hardware. We then tried to reproduce the situation in our development environment, and then in a small code example.

I would like to copy the information provided by our development team:

OpenVINO throws an exception which we do not understand. Unfortunately, we have not found a good way to reproduce it, but it happens sometimes in the production environment. We would like to fix this, so we are reaching out for more information and possible solution approaches.

We were able to reproduce the exception under some constraints in a debug program. We are not 100% sure that it has the same cause as in the production environment, but it is the same exception.

The following screenshot shows the exception. The colored text has to come from OpenVINO or other libraries. The white text is from the debug program. I have also attached the cpp file.

[screenshot]

Steps to reproduce
#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <thread>

#pragma warning(push, 0)
#include <ie_core.hpp>
#pragma warning(pop)

int main()
{
    const std::string modelPath = ".\\OpenVINO-models\\FP16\\person-detection-retail-0013.xml";
    const std::string myriadConfig = "MULTI:MYRIAD";
    const std::string cpuConfig = "MULTI:CPU";

    std::shared_ptr<InferenceEngine::Core> core = std::make_shared<InferenceEngine::Core>();

    while (true)
    {
        try
        {
            std::cout << "A" << std::endl;

            InferenceEngine::CNNNetwork network = core->ReadNetwork(modelPath);
            InferenceEngine::ExecutableNetwork executableNetwork = core->LoadNetwork(network, myriadConfig);
            std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        }
        catch (const std::exception& ex)
        {
            std::cout << ex.what() << std::endl;
        }

        std::this_thread::sleep_for(std::chrono::milliseconds(50));

        try
        {
            std::cout << "B" << std::endl;
            InferenceEngine::CNNNetwork network = core->ReadNetwork(modelPath);
            InferenceEngine::ExecutableNetwork executableNetwork = core->LoadNetwork(network, cpuConfig);
            std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        }
        catch (const std::exception& ex)
        {
            std::cout << ex.what() << std::endl;
        }

        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }

}

It looks like the exception (and also the additional command line output) only happens with the debug program if these constraints are fulfilled:

  1. The Visual Studio project settings have "Common Language Runtime Support (/clr)" enabled.
  2. The debug application is started from Visual Studio.
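
A minimal sketch of the reproduction loop above as a bounded harness, so that failures can be counted instead of being read off an endless console log. The names `StressResult` and `runAlternating` are hypothetical and not part of OpenVINO; the two callables stand in for "load the network on MYRIAD" and "load the network on CPU".

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Outcome of one stress run: number of load attempts, number of failed
// attempts, and the what() text of every exception that was caught.
struct StressResult {
    std::size_t attempts = 0;
    std::size_t failures = 0;
    std::vector<std::string> messages;
};

// Calls loadA and loadB alternately for a fixed number of rounds.
// Exceptions are recorded rather than printed, and the loop always ends.
// Both callables are placeholders supplied by the caller (assumption).
StressResult runAlternating(const std::function<void()>& loadA,
                            const std::function<void()>& loadB,
                            std::size_t rounds) {
    StressResult result;
    const std::function<void()>* actions[] = { &loadA, &loadB };
    for (std::size_t round = 0; round < rounds; ++round) {
        for (const auto* action : actions) {
            ++result.attempts;
            try {
                (*action)();
            } catch (const std::exception& ex) {
                ++result.failures;
                result.messages.push_back(ex.what());
            }
        }
    }
    return result;
}
```

In the program above, `loadA` would wrap the `ReadNetwork`/`LoadNetwork` pair with `myriadConfig` and `loadB` the pair with `cpuConfig`; comparing `failures` across runs with and without /clr would then give a number rather than an impression.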

Some more details:

Have you ever observed such exceptions? If yes, can you tell us more about them? What are possible reasons for this exception? Are there typical situations in which it can appear? Unfortunately, our Google research did not lead to a meaningful result, so we need your help on this topic.

Issue submission checklist

Iffa-Intel commented 2 years ago

Hi @SiegfriedIppischSecuriton ,

We recommend that you use the official OpenVINO Hello Classification C++ Sample as a base for your own project.

This is the easiest and fastest way to ensure that all DLLs/libraries are properly configured without having to do that manually. Besides, the sample application has been designed to be compatible with all of the devices OpenVINO supports.

[images: OV, OV_MYRIAD]

Did you use a powered USB hub for the two NCS sticks? There's a possibility that the hardware didn't receive enough voltage/current.

SiegfriedIppischSecuriton commented 2 years ago

Hi @Iffa-Meah,

Thank you for looking into our problem.

The code uses ReadNetwork() and LoadNetwork(), as shown in the hello_classification sample. I just wanted to provide a minimal example that reproduces the exception, so everything related to input/output preparation and running the inference was removed. However, it seems to be necessary that both CPU and MYRIAD are involved to trigger this exception reliably. (As I mentioned, there are other ways to reproduce the exception, but this example is the most reliable one I have found.)

I will check the DLLs and libraries again. But the code compiles, and we have also managed to run inference on CPU and MYRIAD. It's just not reliable with MYRIAD because of this exception.

How can I check whether the MYRIAD stick gets enough voltage?

Iffa-Intel commented 2 years ago

To check voltage/current, you can use a USB multimeter or download third-party software. You may refer here.

Both CPU and MYRIAD have their own plugin, which must be loaded during inference. It's best to use the official sample app/code/demo (as I mentioned before), because they have been designed with the proper sequence for OpenVINO inference on compatible devices. Even sleep calls placed between code sections need to be designed properly, since they may disturb certain operations.

Custom code might seem to work just fine, but there's a possibility that something required for reliable inference is missing, especially when the OpenVINO Inference Engine is involved. You need to ensure that all required OpenVINO libraries are included in the code and used accordingly. The sample might seem simple, but there are a lot more processes going on in the background if you take a closer look at the OpenVINO libraries.

SiegfriedIppischSecuriton commented 2 years ago

Hi @Iffa-Meah,

The sleep calls represent some of our other code. But these timings (with or without sleep) are somehow important for reproducing the exception.

I was not able to find an example that uses more than one network per process. Is this something which is not supported? Anyway, I will try to reproduce this exception within the hello_classification sample.

Looks like the myriad sticks have 500 mA. Hope this is enough.

Screenshot 2022-04-05 144153 Screenshot 2022-04-05 144206

Iffa-Intel commented 2 years ago

@SiegfriedIppischSecuriton,

OpenVINO does support multiple network execution. This example uses more than one network in a single inference: Pedestrian Tracker C++ Demo

Iffa-Intel commented 2 years ago

Closing issue, feel free to re-open or start a new issue if additional assistance is needed.

SiegfriedIppischSecuriton commented 2 years ago

Hi @Iffa-Meah,

Sorry for my late reply. I had a look at the Pedestrian Tracker C++ Demo. It does not look very different from what I did, at least for the simple case I needed (we did not use custom_cpu_library, custom_layers, or perf_counter), so there is nothing special in preparing the InferenceEngine::Core object. Anyway, I have used this demo as a new starting point and tried to reproduce the exception. It is still there.

I made these changes to the Project:

The code to reproduce the exception looks like this:

int main(int argc, char** argv) {
    try {
        [...]

        std::vector<std::string> devices{ detector_mode, reid_mode };
        InferenceEngine::Core ie =
            LoadInferenceEngine(
                devices, custom_cpu_library, path_to_custom_layers,
                should_use_perf_counter);

        while (true)
        {
            {
                std::cout << "ReadNetwork: " << detector_mode << std::endl;
                DetectorConfig detector_confid(det_model);
                ObjectDetector pedestrian_detector(detector_confid, ie, detector_mode);
            }

            {
                std::cout << "ReadNetwork: " << reid_mode << std::endl;
                DetectorConfig detector_confid(det_model);
                ObjectDetector pedestrian_detector(detector_confid, ie, reid_mode);
            }
        }

    }
    catch (const std::exception& error) {
        std::cerr << "[ ERROR ] " << error.what() << std::endl;
        return 1;
    }
    catch (...) {
        std::cerr << "[ ERROR ] Unknown/internal exception happened." << std::endl;
        return 1;
    }
}

The code now uses ObjectDetector from the Pedestrian Tracker C++ Demo. It just gets constructed and destroyed. I have also removed the sleep statements. The console output looks like this:

[screenshot]

The program was started in Visual Studio (Debug x64) with the following command line arguments: -i "xxx" -m_det "C:\Users\SI\Downloads\openvino\person-detection-retail-0013.xml" -m_reid "xxx" -d_det MULTI:MYRIAD -d_reid MULTI:CPU

Iffa-Intel commented 2 years ago

@SiegfriedIppischSecuriton did you try to run the Pedestrian Tracker C++ Demo with your NCS2? If yes and it works, then it means this issue is not originating from your MYRIAD device and MYRIAD plugin.

The most likely cause is your custom code. You'll need to check the part of the code where you call the MYRIAD plugin, since this is where the error occurs according to your screenshot.

Another thing: don't forget to run setupvars before running your custom code.

brmarkus commented 2 years ago

Do you have multiple MyriadX/NCS2 devices connected (since you are using the MULTI device plugin)?

SiegfriedIppischSecuriton commented 2 years ago

@Iffa-Meah:

> @SiegfriedIppischSecuriton did you try to run the Pedestrian Tracker C++ Demo with your NCS2? If yes and it works, then it means this issue is not originating from your MYRIAD device and MYRIAD plugin.

Yes, the Pedestrian Tracker C++ Demo seemed to work. At least there is an image popup which shows a box around persons. My command line call was this: -i "<some image>.png" -m_det "C:\Users\SI\Downloads\openvino\person-detection-retail-0013.xml" -m_reid "C:\Users\SI\Downloads\openvino\person-reidentification-retail-0288.xml" -d_det MULTI:MYRIAD -d_reid MULTI:CPU

The unmodified program also shows errors. Here is the output: [screenshot]

> The biggest possibility is, the issue caused by your custom code. You'll need to check the function part where you called the MYRIAD plugin since this is the error according to your screenshot.

I have tried to eliminate all of my own code. LoadInferenceEngine() and the constructor of ObjectDetector are still untouched from the Pedestrian Tracker C++ Demo. Since my use case is a bit different (replacing one ObjectDetector initialized with CPU by another ObjectDetector initialized with MYRIAD, and the other way round), the last bits are necessary.

> Another thing is, don't forget to initiate the setupvars before running your custom code.

I have run the script "setupvars.bat" before "build_demos_msvc.bat". But I have not run it before executing the Pedestrian Tracker C++ Demo, with or without my modifications. Since the original demo worked, I thought that this was not necessary.

@brmarkus:

> You have multiple MyriadX/NCS2 devices connected (because using the MULTI-device plugins)?

Yes. Everything else is the same setup as in my initial question: two MYRIAD devices.

Iffa-Intel commented 2 years ago

@SiegfriedIppischSecuriton, you actually need to run setupvars again each time you open a new cmd/terminal session. We'll look further into this and get back to you with some other alternatives.

jgespino commented 2 years ago

@SiegfriedIppischSecuriton Is the NCS 2 connected to a USB 3.0 port? Is there any chance that you have a powered USB hub to rule out a power issue? The duplicate ID error can be reproduced when inference is running and the system goes to sleep, which on Windows usually cuts power to the USB ports. This leads me to believe the NCS 2 is not receiving enough power.

SiegfriedIppischSecuriton commented 2 years ago

Hi @jgespino,

Both NCS 2 sticks are connected to USB 3.0. We have no powered USB hub. If you think this test is necessary, I will get one.

Anyway, I had another idea to rule out the power issue. Instead of two NCS sticks, I have used the Mustang-V100-MX4-R10. In this run, no NCS is plugged in, only the Mustang card. Everything else is the same setup.

The output for this run: [screenshot]

jgespino commented 2 years ago

@SiegfriedIppischSecuriton I am not able to reproduce the issue. Could you try the OpenVINO 2022.1 release and test with the HDDL plugin and the Mustang-V100-MX4-R10 card?

Please install OpenVINO 2022.1 using the offline/online installer. https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html

SiegfriedIppischSecuriton commented 2 years ago

Hi @jgespino

The provided link does not offer an online/offline installer. There are only pip, GitHub and Gitee options for the 2022.1 version.

brmarkus commented 2 years ago

Start with this URL: https://software.intel.com/en-us/openvino-toolkit/choose-download

Via Runtime & 2022.1 && C++ && "Offline" you will get to this URL: https://registrationcenter-download.intel.com/akdlm/irc_nas/18617/l_openvino_toolkit_p_2022.1.0.643_offline.sh

SiegfriedIppischSecuriton commented 2 years ago

Hi,

Ok... I have tried to reproduce this issue with the new 2022.1 version. This did not work. If this is a multi-threading problem, changing some parameters can always make a difference, but that does not ensure the problem is fixed. I was also not able to recreate the same setup, for several reasons:

Have you reproduced all these conditions? As far as I know, all of them are necessary:

Maybe something else is different on your side?

PS: The 2022.1 documentation does not seem to be up to date. It looks like the Pedestrian Tracker C++ Demo needs an additional required parameter: -at ssd

jgespino commented 2 years ago

@SiegfriedIppischSecuriton You are correct, OpenVINO 2022.1 Demos have been modified to use the new OpenVINO API introduced in 2022.1 release. In this case, I reverted to testing again in 2021.4.2 to reproduce the issue before moving to 2022.1.

As I was not able to reproduce it with the pedestrian tracker demo, I went back to the code in your original post. I am running Visual Studio 2019 with CLR enabled and running the project as Debug. In your environment, how long does the program run before the exception is shown?

image

SiegfriedIppischSecuriton commented 2 years ago

Hi @jgespino,

Thank you for testing it. It is very surprising to me that you cannot reproduce it. It worked on our side on different machines with at least two different 2021.4.x OpenVINO versions. The exception happens after the second "A". Do you have any suggestions on how we can continue from here?

Hm... I can still reproduce it. I have changed Visual Studio to stop on any exception. There are lots of InferenceEngine::NotImplemented exceptions inside OpenVINO, but it looks like they are caught at some point inside OpenVINO. I have ignored this exception type and got a break for an InferenceEngine::GeneralError. The call stack is in myriadPlugin.dll.

[screenshot]

Would it be helpful if I provide the dump taken at the exception throw?
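
When several exception types are in play, it can help if the catch block logs the dynamic type next to the message, since the reproduction program only prints `ex.what()`. A minimal sketch, using only the standard library; `describeException` is a hypothetical helper, and the text returned by `typeid(...).name()` is implementation-defined (readable on MSVC, mangled on GCC/Clang).

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Returns "<dynamic type name>: <what()>" for a caught exception.
// std::exception is polymorphic, so typeid(ex) yields the type of the
// object that was actually thrown, not std::exception itself.
std::string describeException(const std::exception& ex) {
    return std::string(typeid(ex).name()) + ": " + ex.what();
}
```

Replacing `std::cout << ex.what()` with `std::cout << describeException(ex)` in the catch blocks would show whether the failing load surfaces as the general error type or as something else.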

jgespino commented 2 years ago

@SiegfriedIppischSecuriton Please try to specify the MYRIAD device as MYRIAD instead of MULTI:MYRIAD. This shouldn't be an issue, but I want to confirm, since we normally pass two or more devices with the MULTI plugin.

> Would it be helpful if i provide the dump on the exception throw?

Yes, please share the dump. Is this exception seen with the original code and model from your initial post?

SiegfriedIppischSecuriton commented 2 years ago

Hi @jgespino

> Please try to specify the myriad device as MYRIAD instead of MULTI:MYRIAD. This shouldn't be an issue but I want to confirm since we normally pass two or more devices with MULTI plugin.

I have used MULTI because there are two MYRIAD sticks connected. I have just unplugged one MYRIAD stick and changed to "MULTI". The program crashes.

I have tested it with "MYRIAD.1.2-ma2480" as the device config instead (according to this documentation: https://docs.openvino.ai/2021.4/openvino_docs_IE_DG_supported_plugins_MULTI.html).

=> It is the same exception.

    std::vector<std::string> myriadDevices = core->GetMetric("MYRIAD", METRIC_KEY(AVAILABLE_DEVICES));
    std::string myriadConfig = "MYRIAD." + myriadDevices[0];
    std::cout << "myriadConfig: " << myriadConfig << std::endl;
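
A minimal sketch of how the device strings used in this thread are put together, assuming the IDs come from the `AVAILABLE_DEVICES` metric as in the snippet above. The helper names `myriadDeviceName` and `multiMyriadConfig` are hypothetical; only the string formats ("MYRIAD.<id>" and "MULTI:<dev1>,<dev2>") follow the linked MULTI documentation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds the name of one specific stick, e.g. "MYRIAD.1.2-ma2480".
std::string myriadDeviceName(const std::string& id) {
    return "MYRIAD." + id;
}

// Builds a MULTI device string that lists every given stick explicitly,
// e.g. "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480".
// Returns an empty string when no IDs are given, so the caller can
// decide what to do when no stick is plugged in.
std::string multiMyriadConfig(const std::vector<std::string>& ids) {
    if (ids.empty()) {
        return std::string();
    }
    std::string config = "MULTI:";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i != 0) {
            config += ",";
        }
        config += myriadDeviceName(ids[i]);
    }
    return config;
}
```

Listing the sticks explicitly makes it visible in the log which devices a given run targeted, which is useful when one stick is unplugged between runs.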

> Yes, please share the dump. Is this exception seen with the original code and model from your initial post?

It is the same code and model as in the initial post. But this time, I have used a self-compiled OpenVINO version based on tag 2021.4 (revision 5cee8bbf29797f4544b343e803de957e9f041f92). Otherwise, the stack trace and code would not be this detailed.

I will collect all the necessary files and send you a download link by email.

Best regards, Siegfried

jgespino commented 2 years ago

@SiegfriedIppischSecuriton Thanks for sending the necessary files. I was able to set up the Visual Studio project and I am able to reproduce the issue. Let me look into it and provide an update next week.

jgespino commented 2 years ago

@SiegfriedIppischSecuriton I've been trying to set up OpenVINO 2022.1 from the master branch to see if the issue is still present. However, when building as Debug and configuring Microsoft Visual Studio, I run into the following errors. The same issue is seen when setting up as Debug with OpenVINO 2021.4.2.

You mentioned that you built OpenVINO from source to enable debugging and tracing. Could you share the cmake command you used? image

Steps taken:

  1. Build OpenVINO from master branch:

    cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE=DEBUG ..
    cmake --build . --config Debug --verbose -j8

  2. Install OpenVINO: cmake --install <BUILDDIR> --prefix <INSTALLDIR>

  3. Source the environment and start Visual Studio:

    "C:\Program Files (x86)\Intel\openvino_2022_src_d\setupvars.bat"
    "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\devenv.exe"

  4. Configure Visual Studio as follows: image image image image image

Please let me know if I'm missing anything.

SiegfriedIppischSecuriton commented 2 years ago

Hi @jgespino,

The error says it cannot find std::mutex. This is a problem I also encountered some time ago. Maybe it is the same one.

https://stackoverflow.com/questions/49791269/mutex-is-not-supported-when-compiling-with-clr-or-clrpure-cpprestsdk-aka-casa

https://stackoverflow.com/questions/15821942/how-to-implement-a-unmanaged-thread-safe-collection-when-i-get-this-error-mute/15822678#15822678

My assumption is that the OpenVINO version from master now uses mutex in a header. This means that the header cannot be included in /clr projects.

A workaround might be to put the code into another C++ project that is not compiled with /clr, then reference this project from the /clr project and call the method from there. I have not tried this yet with the provided code example, but this setup would then be more like our production code.
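
A minimal sketch of that workaround as a pimpl-style wrapper, assuming the native project exposes a header that pulls in neither `<mutex>` nor any inference header. `NativeLoader` and its `load` method are hypothetical; the body of `load` is a placeholder for the real ReadNetwork/LoadNetwork calls, and the two "files" are shown in one listing only to keep the example self-contained.

```cpp
// ---- native_loader.h (hypothetical): the only part a /clr source includes.
// It needs <memory> and <string> but not <mutex>.
#include <memory>
#include <string>

class NativeLoader {
public:
    NativeLoader();
    ~NativeLoader();
    NativeLoader(const NativeLoader&) = delete;
    NativeLoader& operator=(const NativeLoader&) = delete;

    // Placeholder for "load the network on this device".
    // Returns how many loads have been performed so far.
    int load(const std::string& device);

private:
    struct Impl;                  // defined only in the native source file
    std::unique_ptr<Impl> impl_;
};

// ---- native_loader.cpp (hypothetical): compiled WITHOUT /clr, so it may
// use <mutex> and any header that depends on it.
#include <mutex>

struct NativeLoader::Impl {
    std::mutex guard;
    int loads = 0;
    std::string lastDevice;
};

NativeLoader::NativeLoader() : impl_(new Impl()) {}
NativeLoader::~NativeLoader() = default;   // Impl is complete here

int NativeLoader::load(const std::string& device) {
    std::lock_guard<std::mutex> lock(impl_->guard);
    impl_->lastDevice = device;   // the real code would load the network here
    return ++impl_->loads;
}
```

Because `Impl` is only forward-declared in the header, the managed side never sees `std::mutex`; whether this also avoids the runtime exception discussed in this issue is not something the sketch can show.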

Hope this helps.

Best regards, Siegfried

jgespino commented 2 years ago

@SiegfriedIppischSecuriton Apologies for the delay, the development team has addressed an XLink issue that is related to the duplicate id error you are seeing. I am waiting for confirmation on which release the fix was implemented. I will let you know what I find out.

Regards, Jesus

Ref. 90005

SiegfriedIppischSecuriton commented 2 years ago

Hi.

We upgraded to version 2022.2. It looks like the "Failed to allocate graph: NC_ERROR" bug is fixed there. Thank you.

While debugging with "Application Verifier", another error appeared. I am not sure if this error is related, but it looks similar: "Failed to allocate graph: MYRIAD device is not opened.". The debugger also stopped with a "Critical section not initialized." error. The address also points into the XLink code.

I have created another issue to keep things separated: https://github.com/openvinotoolkit/openvino/issues/13619

jgespino commented 2 years ago

Thanks for the update! Glad the issue has been resolved in the 2022.2 release. I will proceed to close this discussion.