openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0
7.38k stars 2.31k forks source link

"Failed to read output from FIFO: NC_ERROR" and "Device didn't appear after reboot" on Myriad X VPU (not NCS) #18321

Closed berrylizzard closed 1 year ago

berrylizzard commented 1 year ago
System information (version)
Detailed description

I am trying to run livestream (from an IP camera) object detection using a public model on a CPU and compare performance (in FPS) to the same run on Myriad X VPU but have encountered what seems like a bug. The CPU part of the script concludes successfully based on timer set in code or while running continuously until interrupted, but with the MYRIAD target the same code fails (see error messages below) after running at what seems like random time intervals. The length of time it is running successfully before failing seems to be longest right after reboot. With time, the script also slows down considerably (from inferencing 13 times per second right after reboot to only 2-3 times per second in the span of the longest (80 seconds) run after which it failed). All executions of this code with MYRIAD target fail in the same way in less than 80 seconds. I have included a print("in progress, time = ", time.time()) to monitor how long it takes to fail. The MYRIAD is recognized by both lsusb and in runtime.Core()'s available_devices, see below:

$ python list_devices.py
CPU:    Intel Atom(R) x6425RE Processor @ 1.90GHz
GNA: GNA_SW
MYRIAD: Intel Movidius Myriad X VPU
$ lsusb
...
Bus 003 Device 024: ID 03e7:2485 Intel Movidius MyriadX
...
$ /home/user0/build-opencv/setup_vars.sh python3 app_20230630.py
Setting vars for OpenCV 4.7.0-dev
Append PYTHONPATH: /home/user0/build-opencv/python_loader
[INFO] ======== On CPU:
[INFO] accessing video stream...
[INFO] stream url: rtsp://XXX:XXX@XXX.XXX.XXX.XXX/XXX/XXX?camera=1
[ WARN:0@1.244] global cap_gstreamer.cpp:1697 open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0@1.244] global cap_gstreamer.cpp:1728 open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
in progress, time =  1688134899.5178323
in progress, time =  1688134899.6193032
in progress, time =  1688134899.7206116
...
in progress, time =  1688134928.4518013
in progress, time =  1688134928.553301
in progress, time =  1688134928.6574
[INFO] elapsed time: 30.06
[INFO] approx. FPS: 9.48
[INFO] ======== On VPU:
[INFO] accessing video stream...
[INFO] stream url: rtsp://XXX:XXX@XXX.XXX.XXX.XXX/XXX/XXX?camera=1
[ WARN:0@33.843] global cap_gstreamer.cpp:1697 open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0@33.843] global cap_gstreamer.cpp:1728 open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
in progress, time =  1688134932.304877
in progress, time =  1688134932.6009543
in progress, time =  1688134932.909639
...
in progress, time =  1688134942.4884083
in progress, time =  1688134942.7812586
in progress, time =  1688134943.26852
E: [global] [    945820] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    945820] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    945820] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    945820] [python3] ncFifoReadElem:3290   Packet reading is failed.
Failed to read output from FIFO: NC_ERROR
E: [global] [    945930] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    945930] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [global] [    945930] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    945930] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [global] [    945931] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    945931] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    945931] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    945931] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [    945931] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    945931] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [global] [    945931] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    945931] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    945931] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    945931] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [    945931] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [    945931] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [global] [    945931] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    945931] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [global] [    945931] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    945931] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    945931] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    945931] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [    945934] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [    945934] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [global] [    945934] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_RESET_REQ

E: [xLink] [    945934] [Scheduler00Thr] sendEvents:1203        Event sending failed
E: [ncAPI] [    955934] [python3] ncDeviceClose:1851    Device didn't appear after reboot
$
Steps to reproduce

openvino_error_20230630.zip includes:

Issue submission checklist
avitial commented 1 year ago

@berrylizzard please use OpenVINO version 2022.1 or 2022.3.1 LTS. The version you are using (2022.2) does not support NCS2/HDDL devices, which highly likely may be the root cause of the issue.

Find OpenVINO archives for download 2022.1 here or the 2022.3.1 LTS here. Note Ubuntu 22.04 is not listed under the system requirements for these OpenVINO versions, not sure if you may face any issues during install/setup.

If the reported issue persists with any of these OpenVINO versions (which include NCS2/HDDL support), let us know.

berrylizzard commented 1 year ago

Thank you for reply, @avitial. From https://github.com/openvinotoolkit/openvino/discussions/16793#discussioncomment-5585501,

MYRAD and HDDL plugins are not part of 2022.3 release. Use 2022.2

That's why I switched from 2022.3 to 2022.2 (and am using it now). I was able to successfully install 2022.2 from the archive and followed the instructions here to install the included HDDL plugin.

Same question also includes feedback about support for Ubuntu 22.04 - from https://github.com/openvinotoolkit/openvino/discussions/16793#discussioncomment-5545956

While Ubuntu 22.04 is not explicitly mentioned in documentation, it's still supported. Binaries from Ubuntu 20.04 perfectly work on Ubuntu 22.04

I did have to change one file manually, but have not encountered errors that would point specifically to the Ubuntu version incompatibility.

One clarification that might be needed is whether I need the HDDL or the MYRIAD plugin for the hardware I am using? From documentation:

The HDDL and MYRIAD plugins may cause conflicts when used at the same time. To ensure proper operation in such a case, the number of booted devices needs to be limited in the hddl_autoboot.config file. Otherwise, the HDDL plugin will boot all available Intel® Movidius™ Myriad™ X devices.

If this is the cause of the "NC_ERROR", do I need the MYRIAD plugin instead to resolve the errors I am having? How do I uninstall HDDL to avoid HDDL-MYRIAD conflicts?

avitial commented 1 year ago

That's why I switched from 2022.3 to 2022.2 (and am using it now). I was able to successfully install 2022.2 from the archive and followed the instructions here to install the included HDDL plugin.

Still recommend to use 2022.1 release or 2022.3.1 LTS release for NCS2/HDDL devices, as these releases officially include support for NCS2/HDDL.

If this is the cause of the "NC_ERROR", do I need the MYRIAD plugin instead to resolve the errors I am having? How do I uninstall HDDL to avoid HDDL-MYRIAD conflicts?

You cannot use MYRIAD plugin simultaneously with HDDL plugin while HDDL plugin is running, but you should be able to use your HDDL device with both plugins (with device_name="MYRIAD" or device_name="HDDL"). Do you know how many MyriadX VPUs your HDDL card has? Prior using MYRIAD plugin ensure HDDL daemon isn't running and try executing your app again:

kill -9 $(pidof hddldaemon autoboot)
pidof hddldaemon autoboot # Make sure none of them is alive
source /opt/intel/openvino_2022/setupvars.sh
${HDDL_INSTALL_DIR}/bin/bsl_reset
berrylizzard commented 1 year ago

Switched to OpenVINO 2022.3.1 LTS and used newer Python (which it supports). Updated System information (version):

Error persists with device_name="MYRIAD":

...
in progress, time =  1688746286.8988335
E: [global] [    289796] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    289796] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    289796] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    289796] [python3] ncFifoReadElem:3290   Packet reading is failed.
Failed to read output from FIFO: NC_ERROR
E: [global] [    289884] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    289884] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    289884] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    289884] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    289884] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    289884] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    289884] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    289884] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [    289885] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    289885] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    289885] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    289885] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    289885] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    289885] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [    289885] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [    289885] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    289885] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    289885] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    289885] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    289885] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    289885] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    289885] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [    289888] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [    289888] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    289888] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_RESET_REQ

E: [xLink] [    289888] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [ncAPI] [    299888] [python3] ncDeviceClose:1851    Device didn't appear after reboot

If I change to device_name="HDDL", I get the following error:

RuntimeError: Cannot load library '/home/user0/intel/openvino_2022.3.1/runtime/lib/intel64/libopenvino_intel_hddl_plugin.so: libboost_filesystem.so.1.71.0: cannot open shared object file: No such file or directory

I do not have any libboost_filesystem.so files on my system (find ~ -name *libboost_filesystem.so.* returns nothing).

avitial commented 1 year ago

@berrylizzard not sure what the issue may be on your side as it is working for me. I had to do some modifications to your app to use .mp4 file instead of rtsp feed but it seems to be working just fine with 2022.3.1 LTS (I am using Python 3.8.10 on Ubuntu 20.04):

[INFO] ======== On CPU:
[INFO] accessing video stream...
[INFO] elapsed time: 4.60
[INFO] approx. FPS: 140.53
[INFO] ======== On VPU:
[INFO] accessing video stream...
[INFO] elapsed time: 33.08
[INFO] approx. FPS: 19.56

I have seen in the past applications do not pick up the right OpenVINO, can you check and make your environment is picking up the proper OV Runtime version? With the code snippet below you should see MYRIAD: openvino_intel_myriad_plugin, 2022.3.1-9227-cf2c7da5689-releases/2022/3 if 2022.3.1 version is being used.

# code snippet
from openvino import runtime
ie = runtime.Core()
# Show devices available for OpenVINO Runtime
for device in ie.available_devices:
    device_name = ie.get_property(device, "FULL_DEVICE_NAME")
    if "MYRIAD" in device:
        device = "MYRIAD"
    print(f"{device}: {device_name}")
    ver = ie.get_versions(device)[device]
    print(f"{device}: {ver.description}, {ver.build_number}")

Also make sure you perform the configuration steps for NCS2/Myriad plugin as stated in Configurations for Intel® Neural Compute Stick 2.

I do not have any libboost_filesystem.so files on my system (find ~ -name libboost_filesystem.so. returns nothing).

This lib is part of the HDDL plugin, there is some configuration required and may be why is not present in the system. Please refer to Configurations for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs for those steps.

berrylizzard commented 1 year ago

@avitial thank you for checking!

Running the code snippet provided, I get:

CPU:    Intel Atom(R) x6425RE Processor @ 1.90GHz
CPU: openvino_intel_cpu_plugin, 2022.3.1-9227-cf2c7da5689-releases/2022/3
GNA: GNA_SW
GNA: openvino_intel_gna_plugin, 2022.3.1-9227-cf2c7da5689-releases/2022/3_with_GNA_LIB_VER==2
MYRIAD: Intel Movidius Myriad X VPU
MYRIAD: openvino_intel_myriad_plugin, 2022.3.1-9227-cf2c7da5689-releases/2022/3

Is it normal that all three "devices" are listed twice? Could this be the cause of the error?

I have followed instructions for Configurations for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs prior to the original post, but did it again just to make sure. I have missed the other set of instructions before (Configurations for Intel® Neural Compute Stick 2), but completed them now however the errors still persists even after a reboot. One change I noticed is there are 2 new lines at the end of the error messages now (both lines reference DispatcherWaitEventComplete):

E: [global] [     26692] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [     26692] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [     26692] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [     26692] [python3] ncFifoReadElem:3290   Packet reading is failed.
Failed to read output from FIFO: NC_ERROR
E: [global] [     26798] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [     26798] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [     26798] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [     26798] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [     26798] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [     26798] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [     26798] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [     26798] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [     26798] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [     26798] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [     26798] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [     26798] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [     26798] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [     26798] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [     26798] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [     26798] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [     26798] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [     26798] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [     26798] [python3] addEvent:369 Condition failed: event->header.flags.bitField.ack != 1
E: [global] [     26798] [python3] addEventWithPerf:383  addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [     26798] [python3] XLinkReadData:156    Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [     26799] [python3] getGraphMonitorResponseValue:1901     XLink error, rc: X_LINK_ERROR
E: [global] [     26801] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [     26801] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [     26801] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_RESET_REQ

E: [xLink] [     26801] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [xLink] [     26801] [python3] DispatcherWaitEventComplete:391       Assertion Failed: curr != NULL

E: [global] [     26801] [python3] XLinkResetRemote:256 Condition failed: DispatcherWaitEventComplete(&link->deviceHandle, XLINK_NO_RW_TIMEOUT)
E: [ncAPI] [     36801] [python3] ncDeviceClose:1851    Device didn't appear after reboot

Based on the new lines in the error, maybe I need to somehow increase the wait time?

berrylizzard commented 1 year ago

Some more information. Based on this very short Intel error description and this stackoverflow question I have checked the output of the dmesg command and found that the Myriad X VPU voluntarily disconnects and re-connects for no apparent to me reason. Here is the relevant dmesg output:

...
[    4.169674] usb 3-1: new high-speed USB device number 2 using xhci_hcd
...
[    4.390516] usb 3-1: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[    4.391954] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    4.393350] usb 3-1: Product: Movidius MyriadX
[    4.394773] usb 3-1: Manufacturer: Movidius Ltd.
...
[    8.043831] myd_vsc: loading out-of-tree module taints kernel.
...
[    8.045588] myd_vsc: module verification failed: signature and/or required key missing - tainting kernel
...
[    8.054533] usbcore: registered new interface driver myriad_usb_vsc
...
[ 1687.324372] usb 3-1: USB disconnect, device number 2
[ 1687.654227] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 1687.678900] usb 4-1: New USB device found, idVendor=03e7, idProduct=f63b, bcdDevice= 1.00
[ 1687.678905] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1687.678907] usb 4-1: Product: VSC Loopback Device
[ 1687.678909] usb 4-1: Manufacturer: Intel Corporation
...
[ 1687.694262] myriad_usb_vsc 4-1:1.0: USB VSC device now attached to USBVSC-1
...
[ 1687.699119] myd_vsc_disconnect entrance
[ 1687.699473] myriad_usb_vsc 4-1:1.0: USB VSC #1 now disconnected
...
[ 1740.009994] usb 3-1: new high-speed USB device number 3 using xhci_hcd
[ 1740.227073] usb 3-1: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[ 1740.227092] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1740.227098] usb 3-1: Product: Movidius MyriadX
[ 1740.227103] usb 3-1: Manufacturer: Movidius Ltd.
...
[ 1740.454114] usb 4-1: USB disconnect, device number 2
[ 2462.991260] usbcore: deregistering interface driver myriad_usb_vsc
[ 2472.142639] usbcore: registered new interface driver myriad_usb_vsc
[ 2495.817670] usbcore: deregistering interface driver myriad_usb_vsc
[ 2500.705020] usbcore: registered new interface driver myriad_usb_vsc
...
[ 4077.819710] usb 3-1: USB disconnect, device number 3
[ 4078.158775] usb 4-1: new SuperSpeed USB device number 3 using xhci_hcd
[ 4078.183571] usb 4-1: New USB device found, idVendor=03e7, idProduct=f63b, bcdDevice= 1.00
[ 4078.183576] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 4078.183579] usb 4-1: Product: VSC Loopback Device
[ 4078.183580] usb 4-1: Manufacturer: Intel Corporation
...
[ 4078.198977] myriad_usb_vsc 4-1:1.0: USB VSC device now attached to USBVSC-1
...
[ 4078.201356] myd_vsc_disconnect entrance
[ 4078.201685] myriad_usb_vsc 4-1:1.0: USB VSC #1 now disconnected
...
[ 4124.514510] usb 3-1: new high-speed USB device number 4 using xhci_hcd
[ 4124.731646] usb 3-1: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[ 4124.731665] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 4124.731671] usb 3-1: Product: Movidius MyriadX
[ 4124.731675] usb 3-1: Manufacturer: Movidius Ltd.
...
[ 4124.958632] usb 4-1: USB disconnect, device number 3
...

Seems like this could easily be the cause of my error if it disconnects inside the while loop I have written. The setup is untouched while I run my Python code (no one is physically unplugging anything) so I am very confused why this happens...

avitial commented 1 year ago

Is it normal that all three "devices" are listed twice? Could this be the cause of the error?

@berrylizzard This is normal on the code snippet as it is printing the detected device name, description and build number in separate lines.

Some more information. Based on this very short Intel error description and this stackoverflow question I have checked the output of the dmesg command and found that the Myriad X VPU voluntarily disconnects and re-connects for no apparent to me reason. Here is the relevant dmesg output:

This behavior is also expected. The NCS2 appears with vendor ID 2485 when the device is connected and idle, and 'reconnects' with ID f63b when busy and doing inference (see below). Can you try a different USB port on your system or use a powered USB hub and see if that helps?

Bus 003 Device 008: ID 03e7:2485 Intel Movidius MyriadX  # device idle
Bus 004 Device 083: ID 03e7:f63b Intel Myriad VPU [Movidius Neural Compute Stick] # device busy
berrylizzard commented 1 year ago

@avitial The device I am using is not a Neural Compute Stick 2 and does not plug into a USB port. It is a small m.2 chip with Movidius MyriadX SOC on it that is inserted into the PCIe slot on the motherboard. I do not have an option to plug it in anywhere else. In addition to being plugged in, I need to install OpenVINO toolkit with drivers to make it discoverable in the output of the lsusb and dmesg (but lspci output does not include any "Myriad" labels explicitly for some reason).

In the dmesg output included in my previous reply I also see the ID change (in my case it is the idProduct that changes from 2485 to f63b and back). Is there any programmatic way that you would recommend to leverage this disconnect-reconnect behavior for purposes of continuous inference on Myriad?

avitial commented 1 year ago

@avitial The device I am using is not a Neural Compute Stick 2 and does not plug into a USB port. It is a small m.2 chip with Movidius MyriadX SOC on it that is inserted into the PCIe slot on the motherboard. I do not have an option to plug it in anywhere else.

@berrylizzard Ugh! You are right, my mistake for forgetting that info. Can you try executing the _benchmarkapp with your model and see if the same error appears? Let's confirm first the device is working well outside of your application. May I ask the brand/model of your HDDL card and how many VPUs it has? (for example IEI Mustang-V100-MX8 with 8 VPUs, see devices here)

# app executes for 3 minutes
$ benchmark_app -m ssd_mobilenet_v1_coco.xml -d MYRIAD -t 180

Is there any programmatic way that you would recommend to leverage this disconnect-reconnect behavior for purposes of > continuous inference on Myriad?

Not aware of one for Myriad plugin, this is how the device was designed to work. You can try using the HDDL plugin as it does boot and keep MyriadX VPU devices ready for inference when HDDL service (part of the plugin) is running. I'd recommend to first get your device to work well with Myriad plugin before trying this to avoid confusion.

berrylizzard commented 1 year ago

@avitial No worries! The hardware I have is an Advantech VEGA-320 with 1 Movidius MyriadX VPU (it is listed on the website you provided).

Unfortunately, running _benchmarkapp still results in (looks like) the same error... Based on the successful pings, it ran for about 35 seconds

$ benchmark_app -m ssd_mobilenet_v1_coco.xml -d MYRIAD -t 180
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.1-9227-cf2c7da5689-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] MYRIAD
[ INFO ] Build ................................. 2022.3.1-9227-cf2c7da5689-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(MYRIAD) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 110.35 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     image_tensor:0 , image_tensor (node: image_tensor) : u8 / [N,H,W,C] / [1,300,300,3]
[ INFO ] Model outputs:
[ INFO ]     detection_boxes:0 , detection_boxes (node: DetectionOutput) : f32 / [...] / [1,1,100,7]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     image_tensor:0 , image_tensor (node: image_tensor) : u8 / [N,H,W,C] / [1,300,300,3]
[ INFO ] Model outputs:
[ INFO ]     detection_boxes:0 , detection_boxes (node: DetectionOutput) : f32 / [...] / [1,1,100,7]
[Step 7/11] Loading the model to the device
I: [ncAPI] [    213517] [benchmark_app] ncDeviceOpen:926        ncDeviceOpen() XLinkBootRemote is running for 3.1-ma2480...

I: [ncAPI] [    213785] [benchmark_app] ncDeviceOpen:944        ncDeviceOpen() XLinkBootRemote returned success X_LINK_SUCCESS for 3.1-ma2480
I: [xLink] [    214167] [Scheduler00Thr] eventReader:658        eventReader thread started
I: [xLink] [    214167] [Scheduler00Thr] eventSchedulerRun:759  Scheduler thread started
I: [ncAPI] [    214167] [benchmark_app] ncDeviceOpen:1048       XLinkConnect done - link Id 0

I: [ncAPI] [    214167] [benchmark_app] ncDeviceOpen:1074       Booted 3.1-ma2480 (4.1-) -> VSC

I: [watchdog] [    215167] [benchmark_app] registerDevice:160   register device: 0x7ffc7f695210

I: [watchdog] [    215168] [WatchdogThread] watchdogRoutine:262 thread started

I: [ncAPI] [    215169] [benchmark_app] getDevAttributes:1271   Device attributes

I: [ncAPI] [    215169] [benchmark_app] getDevAttributes:1272   Device FW version: 444c4454.2480.0.0

I: [ncAPI] [    215169] [benchmark_app] getDevAttributes:1274   Maximum graphs: 10

I: [ncAPI] [    215169] [benchmark_app] getDevAttributes:1275   Maximum fifos: 20

I: [ncAPI] [    215169] [benchmark_app] getDevAttributes:1276   Device memory capacity: 494916416

I: [watchdog] [    215169] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 0 ms

[Info   ][VPU][MyriadPlugin] Device #0 USB ([VPU] Invalid format string : missing arguments

I: [ncAPI] [    215170] [benchmark_app] ncGraphAllocate:1974    Starting Graph allocation sequence

I: [ncAPI] [    215254] [benchmark_app] ncGraphAllocate:2065    Sent graph
I: [ncAPI] [    215314] [benchmark_app] ncGraphAllocate:2191    Graph allocation completed successfully

I: [ncAPI] [    215314] [benchmark_app] ncFifoCreate:2875       Init fifo
I: [ncAPI] [    215314] [benchmark_app] ncFifoAllocate:2976     Creating fifo
I: [ncAPI] [    215317] [benchmark_app] ncFifoCreate:2875       Init fifo
I: [ncAPI] [    215317] [benchmark_app] ncFifoAllocate:2976     Creating fifo
[ INFO ] Compile model took 2568.83 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: ssd_mobilenet_v1_coco
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   DEVICE_THERMAL: 90.60277557373047
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'image_tensor'!. This input will be filled with random values!
[ INFO ] Fill input 'image_tensor' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 180000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 44.85 ms
I: [watchdog] [    216168] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    217168] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    218167] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    219166] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    220166] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    221165] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    222165] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    223164] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    224164] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    225163] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    226163] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    227162] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    228161] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    229161] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    230160] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    231159] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    232159] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    233158] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    234158] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    235157] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    236157] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    237156] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    238156] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    239155] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    240155] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    241154] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    242153] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    243153] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    244152] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    245152] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    246151] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    247150] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    248150] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    249149] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

I: [watchdog] [    250149] [WatchdogThread] keepAlive:75        [0x5622dcbf72d0] device, ping succeed after 999 ms

E: [global] [    250926] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250926] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250926] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250926] [benchmark_app] ncFifoReadElem:3290     Packet reading is failed.
E: [global] [    250927] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250927] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250927] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250927] [benchmark_app] ncFifoReadElem:3290     Packet reading is failed.
E: [global] [    250927] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250927] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250927] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250927] [benchmark_app] ncFifoReadElem:3290     Packet reading is failed.
[ ERROR ] Failed to read output from FIFO: NC_ERROR
Traceback (most recent call last):
  File "/home/user0/py3.10_openvino_env/lib/python3.10/site-packages/openvino/tools/benchmark/main.py", line 568, in main
    fps, median_latency_ms, avg_latency_ms, min_latency_ms, max_latency_ms, total_duration_sec, iteration = benchmark.main_loop(requests, data_queue, batch_size, args.latency_percentile, pcseq)
  File "/home/user0/py3.10_openvino_env/lib/python3.10/site-packages/openvino/tools/benchmark/benchmark.py", line 169, in main_loop
    times, total_duration_sec, iteration = self.async_inference_only(requests)
  File "/home/user0/py3.10_openvino_env/lib/python3.10/site-packages/openvino/tools/benchmark/benchmark.py", line 115, in async_inference_only
    idle_id = infer_queue.get_idle_request_id()
RuntimeError: Failed to read output from FIFO: NC_ERROR
E: [global] [    250929] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250929] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250929] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250929] [benchmark_app] ncFifoReadElem:3290     Packet reading is failed.
W: [global] [    250936] [Scheduler00Thr] dispatcherLocalEventGetResponse:132   Blocked event would cause dispatching thread to wait on semaphore infinitely

E: [global] [    250936] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    250936] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    250936] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250936] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250936] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250936] [benchmark_app] getGraphMonitorResponseValue:1901       XLink error, rc: X_LINK_ERROR
W: [ncAPI] [    250936] [benchmark_app] ncFifoDestroy:3161      myriad NACK

[Warning][VPU][MyriadPlugin] ncFifoDelete result NC_ERROR
E: [global] [    250936] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    250936] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    250936] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250936] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250936] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250936] [benchmark_app] getGraphMonitorResponseValue:1901       XLink error, rc: X_LINK_ERROR
W: [ncAPI] [    250936] [benchmark_app] ncFifoDestroy:3161      myriad NACK

E: [global] [    250936] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [    250936] [Scheduler00Thr] sendEvents:1207        Event sending failed
[Warning][VPU][MyriadPlugin] ncFifoDelete result NC_ERROR
E: [global] [    250936] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_WRITE_REQ

E: [xLink] [    250937] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    250937] [benchmark_app] addEvent:369   Condition failed: event->header.flags.bitField.ack != 1
E: [global] [    250937] [benchmark_app] addEventWithPerf:383    addEvent(event, timeoutMs) method call failed with an error: 3
E: [global] [    250937] [benchmark_app] XLinkReadData:156      Condition failed: (addEventWithPerf(&event, &opTime, 0xFFFFFFFF))
E: [ncAPI] [    250937] [benchmark_app] getGraphMonitorResponseValue:1901       XLink error, rc: X_LINK_ERROR
W: [ncAPI] [    250937] [benchmark_app] ncGraphDestroy:2232     myriad NACK

[Warning][VPU][MyriadPlugin] Deallocate Graph result NC_ERROR.
I: [ncAPI] [    250941] [benchmark_app] ncDeviceClose:1708      Removing device...
I: [watchdog] [    250941] [benchmark_app] removeDevice:202     remove device: 0x7ffc7f696fb8

I: [watchdog] [    250941] [WatchdogThread] watchdogRoutine:295 no active devices to watch, stopping  Watchdog thread

I: [watchdog] [    250941] [WatchdogThread] watchdogRoutine:321 thread ended

E: [global] [    250941] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_CLOSE_STREAM_REQ

E: [xLink] [    250941] [Scheduler00Thr] sendEvents:1207        Event sending failed
E: [global] [    250941] [Scheduler00Thr] dispatcherEventSend:54        Write failed (header) (err -4) | event XLINK_RESET_REQ

E: [xLink] [    250941] [Scheduler00Thr] sendEvents:1207        Event sending failed
I: [xLink] [    250941] [Scheduler00Thr] dispatcherClean:1044   Start Clean Dispatcher...
I: [xLink] [    250941] [Scheduler00Thr] dispatcherClean:1098   Clean Dispatcher Successfully...
I: [xLink] [    250941] [Scheduler00Thr] eventSchedulerRun:783  Scheduler thread stopped
E: [ncAPI] [    260941] [benchmark_app] ncDeviceClose:1851      Device didn't appear after reboot
I: [ncAPI] [    260941] [benchmark_app] destroyDeviceHandle:1651        Destroying device handler
I: [watchdog] [    260941] [benchmark_app] ~WatchdogImpl:125    watchdog terminated
$
avitial commented 1 year ago

[ INFO ] NETWORK_NAME: ssd_mobilenet_v1_coco [ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4 [ INFO ] DEVICE_THERMAL: 90.60277557373047

@berrylizzard based on the log it can be observed a very high device thermal reading of 90.6 C (95 C is considered High, 105 C is Critical). On my side I see a thermal reading of 34.6 C when running the benchmark_app. You can try running the benchmark app for a shorter period of time (i.e. 5 seconds with -t 5) and see if it completes successfully.

Does your HDDL device have proper cooling (heatsink, fan, etc.) and is the room temperature normal? It may be that after 30 seconds of runtime the VPU reaches a critical temperature and software issues a reset, which then causes the observed error. Ensure the device is properly ventilated and dissipating heat.

avitial commented 1 year ago

Closing this, I hope previous responses were sufficient to help you proceed. Feel free to reopen and ask additional questions related to this topic.

berrylizzard commented 1 year ago

@avitial Thank you! Yes, overheating must have been the issue here since after installing the heatsink I am able to run inference continuously (for over 15 minutes now).