luxonis / depthai

DepthAI Python API utilities, examples, and tutorials.
https://docs.luxonis.com
MIT License

[BUG] DepthAI does not work with OAK-D in LXD containers #450

Open gbiggs opened 3 years ago

gbiggs commented 3 years ago

Description of the bug

Running the DepthAI demo with python3 depthai_demo.py using a USB-C OAK-D in an LXD container fails to find the device after booting. Instead it gives the following errors:

$ python3 depthai_demo.py
Using depthai module from:  /home/geoff/.local/lib/python3.8/site-packages/depthai.cpython-38-x86_64-linux-gnu.so
Depthai version installed:  2.8.0.0
Available devices:
[0] 14442C1061F95ED700 [X_LINK_UNBOOTED]
Traceback (most recent call last):
  File "depthai_demo.py", line 132, in <module>
    with dai.Device(pm.p.getOpenVINOVersion(), device_info, usb2Mode=conf.args.usb_speed == "usb2") as device:
RuntimeError: Failed to find device after booting, error message: X_LINK_DEVICE_NOT_FOUND

This is the dmesg output from plugging in the USB cable until the demonstration application terminates with an error.

[2528003.223803] usb 3-2: new high-speed USB device number 38 using xhci_hcd
[2528003.434236] usb 3-2: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[2528003.434239] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2528003.434240] usb 3-2: Product: Movidius MyriadX
[2528003.434242] usb 3-2: Manufacturer: Movidius Ltd.
[2528003.434242] usb 3-2: SerialNumber: 03e72485
[2528007.911720] usb 3-2: USB disconnect, device number 38
[2528008.424018] usb 4-2: new SuperSpeed Gen 1 USB device number 19 using xhci_hcd
[2528008.448490] usb 4-2: New USB device found, idVendor=03e7, idProduct=f63b, bcdDevice= 1.00
[2528008.448494] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2528008.448496] usb 4-2: Product: Luxonis Device
[2528008.448497] usb 4-2: Manufacturer: Intel Corporation
[2528008.448498] usb 4-2: SerialNumber: 14442C1061F95ED700
[2528016.490494] usb 3-2: new high-speed USB device number 39 using xhci_hcd
[2528016.697634] usb 3-2: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[2528016.697637] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2528016.697638] usb 3-2: Product: Movidius MyriadX
[2528016.697639] usb 3-2: Manufacturer: Movidius Ltd.
[2528016.697640] usb 3-2: SerialNumber: 03e72485
[2528016.937216] usb 4-2: USB disconnect, device number 19
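
The sequence in that log is easier to follow once it is reduced to connect and disconnect events. The following is a minimal sketch, not part of the original report, that extracts those events from dmesg text. The function name `usb_timeline` and the regular expressions are illustrative assumptions that only cover the two line shapes quoted above. On the sample it shows the running-mode device (`03e7:f63b`) appearing, followed about 8 seconds later by the boot-mode device (`03e7:2485`) coming back.

```python
import re

# Matches the two dmesg line shapes quoted above; other kernel messages are ignored.
NEW_DEVICE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] usb (?P<port>[\d.-]+): New USB device found, "
    r"idVendor=(?P<vid>[0-9a-f]{4}), idProduct=(?P<pid>[0-9a-f]{4})"
)
DISCONNECT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] usb (?P<port>[\d.-]+): "
    r"USB disconnect, device number (?P<num>\d+)"
)


def usb_timeline(dmesg_text):
    """Return (timestamp, port, description) tuples for connects and disconnects."""
    events = []
    for line in dmesg_text.splitlines():
        m = NEW_DEVICE.search(line)
        if m:
            events.append((float(m["ts"]), m["port"], f"found {m['vid']}:{m['pid']}"))
            continue
        m = DISCONNECT.search(line)
        if m:
            events.append((float(m["ts"]), m["port"], f"disconnect #{m['num']}"))
    return events


# Lines copied from the container run above.
sample = """\
[2528003.434236] usb 3-2: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[2528007.911720] usb 3-2: USB disconnect, device number 38
[2528008.448490] usb 4-2: New USB device found, idVendor=03e7, idProduct=f63b, bcdDevice= 1.00
[2528016.697634] usb 3-2: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[2528016.937216] usb 4-2: USB disconnect, device number 19
"""

for ts, port, what in usb_timeline(sample):
    print(f"{ts:.3f}  {port:<4} {what}")
```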

Running it again after this fails to even find the device in bootloader mode:

$ python3 depthai_demo.py
Using depthai module from:  /home/geoff/.local/lib/python3.8/site-packages/depthai.cpython-38-x86_64-linux-gnu.so
Depthai version installed:  2.8.0.0
Available devices:
[0] <error> [X_LINK_UNBOOTED]
Traceback (most recent call last):
  File "depthai_demo.py", line 132, in <module>
    with dai.Device(pm.p.getOpenVINOVersion(), device_info, usb2Mode=conf.args.usb_speed == "usb2") as device:
RuntimeError: Failed to find device (ma2480), error message: X_LINK_DEVICE_NOT_FOUND

There is no dmesg output while running the application this second time.

Reconnecting the USB cable and running the demonstration application again gives the first result above, where the device is at first found, then lost.

Launching the same application in the host OS works fine. The camera image and disparity map are displayed correctly until Ctrl-C is pressed.

$ python depthai_demo.py
Using depthai module from:  /home/geoff/.local/lib/python3.9/site-packages/depthai.cpython-39-x86_64-linux-gnu.so
Depthai version installed:  2.8.0.0
Available devices:
[0] 14442C1061F95ED700 [X_LINK_UNBOOTED]
^CTraceback (most recent call last):
  File "/home/geoff/src/depthai/depthai_demo.py", line 297, in <module>
    key = cv2.waitKey(1)
KeyboardInterrupt

This is the dmesg output for when the application runs correctly on the host OS, from connecting the USB cable through to pressing Ctrl-C:

[2527911.859755] usb 3-2: new high-speed USB device number 35 using xhci_hcd
[2527912.066882] usb 3-2: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[2527912.066885] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2527912.066887] usb 3-2: Product: Movidius MyriadX
[2527912.066888] usb 3-2: Manufacturer: Movidius Ltd.
[2527912.066888] usb 3-2: SerialNumber: 03e72485
[2527921.597178] usb 3-2: USB disconnect, device number 35
[2527922.106617] usb 4-2: new SuperSpeed Gen 1 USB device number 17 using xhci_hcd
[2527922.131113] usb 4-2: New USB device found, idVendor=03e7, idProduct=f63b, bcdDevice= 1.00
[2527922.131117] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2527922.131120] usb 4-2: Product: Luxonis Device
[2527922.131121] usb 4-2: Manufacturer: Intel Corporation
[2527922.131123] usb 4-2: SerialNumber: 14442C1061F95ED700
[2527926.566500] usb 3-2: new high-speed USB device number 36 using xhci_hcd
[2527926.773672] usb 3-2: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[2527926.773675] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2527926.773677] usb 3-2: Product: Movidius MyriadX
[2527926.773678] usb 3-2: Manufacturer: Movidius Ltd.
[2527926.773679] usb 3-2: SerialNumber: 03e72485
[2527926.939837] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940000] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940157] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940313] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940469] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940624] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940780] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.940940] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.941101] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.941261] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.941418] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.941574] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.941731] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.941888] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.942043] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.942200] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.942357] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.942577] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.942738] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.942895] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.943051] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.943222] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.943380] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.943537] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.943696] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.943853] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.944011] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.944167] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.944324] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.944482] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.944638] usb 4-2: usbfs: usb_submit_urb returned -19
[2527926.944795] usb 4-2: usbfs: usb_submit_urb returned -19
[2527927.006460] usb 4-2: USB disconnect, device number 17

To Reproduce

Steps to reproduce the behavior:

  1. Create an LXD container with Ubuntu 20.04
    lxc launch ubuntu:20.04 my_container
  2. Enter the container
    lxc exec my_container --user 1000 --group 1000 -- bash --login
  3. In the container, set up the udev rule for DepthAI
    echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
    sudo udevadm control --reload-rules && sudo udevadm trigger
  4. Plug in the OAK-D
  5. On the host, add the USB device to the container:
    lxc config device add my_container oakd usb vendorid=03e7
  6. In the container, execute the depthai_demo.py demonstration application
  7. Observe that the application finds the OAK-D in its bootloader mode, but fails to find the device after it reboots.

Expected behavior

The demonstration application launches in an LXD container, opens the OAK-D device and displays the camera image and the disparity map.

Attach system log

Output of log_system_information.py:

{
    "architecture": "64bit ELF",
    "machine": "x86_64",
    "platform": "Linux-5.12.15-arch1-1-x86_64-with-glibc2.33",
    "processor": "",
    "python_build": "default Jun 30 2021 10:22:16",
    "python_compiler": "GCC 11.1.0",
    "python_implementation": "CPython",
    "python_version": "3.9.6",
    "release": "5.12.15-arch1-1",
    "system": "Linux",
    "version": "#1 SMP PREEMPT Wed, 07 Jul 2021 23:35:29 +0000",
    "win32_ver": "",
    "uname": "Linux atlas 5.12.15-arch1-1 #1 SMP PREEMPT Wed, 07 Jul 2021 23:35:29 +0000 x86_64",
    "packages": [
        "alabaster==0.7.12",
        "appdirs==1.4.4",
        "apsw==3.35.4.post1",
        "argcomplete==1.12.1",
        "asn1crypto==1.4.0",
        "astroid==2.4.2",
        "autopep8==1.5.5",
        "Babel==2.9.1",
        "backcall==0.2.0",
        "beautifulsoup4==4.9.3",
        "blobconverter==1.0.0",
        "boto3==1.18.18",
        "botocore==1.21.18",
        "breathe==4.30.0",
        "bs4==0.0.1",
        "btrfsutil==5.12.1",
        "CacheControl==0.12.6",
        "cchardet==2.1.7",
        "certifi==2020.12.5",
        "cffi==1.14.6",
        "chardet==3.0.4",
        "colorama==0.4.4",
        "contextlib2==0.6.0.post1",
        "cryptography==3.4.7",
        "css-parser==1.0.6",
        "cssselect==1.1.0",
        "decorator==5.0.9",
        "depthai==2.8.0.0",
        "distlib==0.3.2",
        "distro==1.5.0",
        "dnspython==1.16.0",
        "docopt==0.6.2",
        "docutils==0.17.1",
        "entrypoints==0.3",
        "exhale==0.2.3",
        "feedparser==5.2.1",
        "ffmpy3==0.2.4",
        "flake8==3.9.2",
        "greenlet==1.1.0",
        "html2text==2020.1.16",
        "html5-parser==0.4.9",
        "html5lib==1.1",
        "idna==2.10",
        "ifaddr==0.1.7",
        "imagesize==1.2.0",
        "importlib-metadata==4.6.0",
        "ipython==7.25.0",
        "ipython_genutils==0.2.0",
        "isc==2.0",
        "isort==5.7.0",
        "jedi==0.17.2",
        "jeepney==0.6.0",
        "Jinja2==3.0.1",
        "jmespath==0.10.0",
        "keyring==23.0.1",
        "keyutils==0.6",
        "lazy-object-proxy==1.4.3",
        "lensfun==0.3.95",
        "louis==3.18.0",
        "lxml==4.6.3",
        "Markdown==3.3.4",
        "MarkupSafe==2.0.1",
        "matplotlib-inline==0.1.2",
        "mccabe==0.6.1",
        "mechanize==0.4.5",
        "more-itertools==8.7.0",
        "msgpack==1.0.2",
        "mutagen==1.45.1",
        "mypy==0.800",
        "mypy-extensions==0.4.3",
        "netifaces==0.11.0",
        "netsnmp-python==1.0a1",
        "networkx==2.5.1",
        "nftables==0.1",
        "npyscreen==4.10.5",
        "numpy==1.20.3",
        "opencv-contrib-python==4.5.1.48",
        "opencv-python==4.5.1.48",
        "ordered-set==4.0.2",
        "packaging==20.9",
        "parso==0.7.1",
        "pcp==5.0",
        "pep517==0.10.0",
        "pexpect==4.8.0",
        "pickleshare==0.7.5",
        "Pillow==8.3.1",
        "pip==21.2.3",
        "pluggy==0.13.1",
        "ply==3.11",
        "progress==1.5",
        "prompt-toolkit==3.0.19",
        "psutil==5.8.0",
        "ptyprocess==0.7.0",
        "pwquality==1.4.4",
        "py7zr==0.11.3",
        "pychm==0.8.6",
        "pycodestyle==2.7.0",
        "pycparser==2.20",
        "pycryptodome==3.10.1",
        "pydocstyle==6.1.1",
        "pyflakes==2.3.1",
        "Pygments==2.9.0",
        "PyGObject==3.40.1",
        "pylama==7.7.1",
        "pylint==2.6.0",
        "pyls-isort==0.2.0",
        "pyls-mypy==0.1.8",
        "pynvim==0.4.3",
        "pyOpenSSL==20.0.1",
        "pyparsing==2.4.7",
        "PyQt5==5.15.4",
        "PyQt5-sip==12.9.0",
        "PyQtWebEngine==5.15.4",
        "pyserial==3.5",
        "python-dateutil==2.8.1",
        "python-jsonrpc-server==0.4.0",
        "python-language-server==0.36.2",
        "pytube==10.8.5",
        "pytz==2021.1",
        "pyusb==1.2.1",
        "PyYAML==5.4.1",
        "regex==2021.7.6",
        "requests==2.24.0",
        "requests-cache==0.5.2",
        "resolvelib==0.5.5",
        "retrying==1.3.3",
        "rope==0.18.0",
        "s3transfer==0.5.0",
        "scipy==1.7.0",
        "SecretStorage==3.3.1",
        "setuptools==57.1.0",
        "Shapely==1.7.1",
        "sip==4.19.25",
        "six==1.16.0",
        "slip==0.6.5",
        "slip.dbus==0.6.5",
        "snowballstemmer==2.1.0",
        "soupsieve==2.2.1",
        "Sphinx==4.1.0",
        "sphinx-multiversion==0.2.4",
        "sphinx-rtd-theme==0.5.2",
        "sphinx-tabs==3.0.0",
        "sphinxcontrib-applehelp==1.0.2",
        "sphinxcontrib-devhelp==1.0.2",
        "sphinxcontrib-htmlhelp==2.0.0",
        "sphinxcontrib-jsmath==1.0.1",
        "sphinxcontrib-qthelp==1.0.3",
        "sphinxcontrib-serializinghtml==1.1.5",
        "team==1.0",
        "texttable==1.6.3",
        "toml==0.10.2",
        "traitlets==5.0.5",
        "trimesh==3.9.20",
        "tvdb-api==3.1.0",
        "typed-ast==1.4.2",
        "typing-extensions==3.7.4.3",
        "udiskie==2.3.3",
        "ujson==4.0.2",
        "unrardll==0.1.4",
        "urllib3==1.25.11",
        "urwid==2.1.2",
        "vcstool==0.2.15",
        "wcwidth==0.2.5",
        "webencodings==0.5.1",
        "wrapt==1.12.1",
        "yapf==0.30.0",
        "youtube-dl==2021.6.6",
        "zeroconf==0.29.0",
        "zipp==3.5.0"
    ],
    "usb": [
        {
            "port": 0,
            "vendor_id": "0x1d6b",
            "product_id": "0x0003",
            "speed": "SuperPlus"
        },
        {
            "port": 2,
            "vendor_id": "0x03e7",
            "product_id": "0x2485",
            "speed": "High"
        },
        {
            "port": 0,
            "vendor_id": "0x1d6b",
            "product_id": "0x0002",
            "speed": "High"
        },
        {
            "port": 0,
            "vendor_id": "0x1d6b",
            "product_id": "0x0003",
            "speed": "Super"
        },
        {
            "port": 9,
            "vendor_id": "0x3297",
            "product_id": "0x1969",
            "speed": "Full"
        },
        {
            "port": 8,
            "vendor_id": "0x047d",
            "product_id": "0x2041",
            "speed": "Low"
        },
        {
            "port": 7,
            "vendor_id": "0x0764",
            "product_id": "0x0501",
            "speed": "Full"
        },
        {
            "port": 4,
            "vendor_id": "0x05e3",
            "product_id": "0x0715",
            "speed": "High"
        },
        {
            "port": 3,
            "vendor_id": "0x14cd",
            "product_id": "0x1212",
            "speed": "High"
        },
        {
            "port": 4,
            "vendor_id": "0x0bda",
            "product_id": "0x5411",
            "speed": "High"
        },
        {
            "port": 10,
            "vendor_id": "0x0451",
            "product_id": "0x8142",
            "speed": "High"
        },
        {
            "port": 0,
            "vendor_id": "0x1d6b",
            "product_id": "0x0002",
            "speed": "High"
        }
    ]
}

gbiggs commented 3 years ago

I forgot to mention that I have tried setting up something like the udev rules and script discussed in the installation instructions for Kernel VMs, but modified for LXD. It made no difference to the behaviour I'm getting.

Luxonis-Brandon commented 3 years ago

Sorry about the trouble, @gbiggs. Unfortunately I don't personally know what LXD is (but I'll Google it shortly). What I think is happening, though, is that USB2 is being routed through to the container but USB3 is not. That would explain why you can see DepthAI before boot, but not after.

The same thing will happen on VirtualBox, for example, if USB3 is not passed through. See here. In the VirtualBox case, it is necessary to actually run the pipeline to get the USB3 interface to show up (as it only shows up after DepthAI has booted over USB2) and then make sure to add it to the VirtualBox pass-through.

So I'm wondering if a similar thing is necessary in LXD. But I'm guessing into the dark a little as I don't yet know what LXD is (will Google it shortly).

Thoughts?

Thanks, Brandon

Luxonis-Brandon commented 3 years ago

OK, learning a bit more about LXD. From my reading so far, I'm guessing our Docker container reference is probably what should be used to get LXD running, as it sounds like the libusb rework that was needed to work properly in Docker is likely also needed for LXD (since they seem to be very similar).

https://hub.docker.com/r/luxonis/depthai-library

And beyond that, I'll likely need the team to comment further.

Sorry again about the trouble.

Thanks, Brandon

gbiggs commented 3 years ago

I'm trying to mimic what the Dockerfile does in my LXD container, but I'm not sure where to install libusb to or what needs to build against it.

Luxonis-Brandon commented 3 years ago

Hi @gbiggs,

Unfortunately I don't know it well enough, and I didn't do the work myself, but here is an overview of at least a bit of what was done to get Docker to work. I remember it being a confusing and fairly lengthy effort. IIRC it was actually one of our customers who figured it out and then taught us the technique.

That said, I think @themarpe may be able to provide some pointers. I also think some recent work in libusb itself, to make it accessible in a non-privileged manner (I think it requires root as of now), would help here. I think the main effort is to make libusb usable in non-rooted ways on Android, but I remember a "hallway conversation" about how this would help make Docker (and thereby LXD) easier to get working.

https://github.com/libusb/libusb/pull/874

Thoughts?

Thanks, Brandon

gbiggs commented 3 years ago

I will keep playing around with libusb as I have time, but certainly some pointers would be helpful.

themarpe commented 3 years ago

@gbiggs I'd suggest taking a look at this: https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/NCS2-in-docker-container-on-RPI-results-in-RuntimeError-Can-not/m-p/1191905/highlight/true#M19803

It boils down to 2 steps:

And a third one which stems from first point - linking with newly built libusb.

This can be achieved in two ways:

Hopefully this maps to LXD neatly and helps you resolve the issue. If you need any additional information, feel free to tag me and ask :)

oto313 commented 3 years ago

I had the same problem.

https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/NCS2-in-docker-container-on-RPI-results-in-RuntimeError-Can-not/m-p/1191905/highlight/true#M19803

This solution solves it. Thanks.

gbiggs commented 3 years ago

@oto313 Did you have the same problem in an LXD container or in a Docker container?

@themarpe I've tried compiling libusb without udev support, and replaced the system-installed library with the new version (because I couldn't find proof that DepthAI was really finding the new one in the CMake output or cache). When I run an example from depthai-core I still get the same result: the OAK-D clicks, dmesg on the host shows it restarting with the new product ID of its new mode, and ls -R /dev/bus/usb in the container shows the initial USB device disappearing and the new USB device appearing, with correct access permissions. Nevertheless the example still gives the following output, and the OAK-D restarts back into its boot mode:

$ ./rgb_video
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to find device after booting, error message: X_LINK_DEVICE_NOT_FOUND
zsh: abort (core dumped)  ./rgb_video

This is the device in /dev/:

/dev/bus/usb:
total 0
drwxr-xr-x 2 root root 40 Aug 13 07:02 001
drwxr-xr-x 2 root root 40 Aug 20 09:24 003
drwxr-xr-x 2 root root 60 Aug 20 09:24 004

/dev/bus/usb/001:
total 0

/dev/bus/usb/003:
total 0

/dev/bus/usb/004:
total 0
crw-rw-rw- 1 root root 189, 431 Aug 20 09:24 048
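
As a cross-check on that listing: Linux gives USB device nodes the character major 189 and derives the minor from the bus and device numbers as `(bus - 1) * 128 + (device - 1)`, so node `004/048` should carry minor 431, which is what `ls` reports. The sketch below is not from the thread, and the helper names are assumptions chosen for illustration. It recomputes that value and, if the node exists, reads the real major, minor, and mode bits back.

```python
import os
import stat

USB_DEVICE_MAJOR = 189  # character major used by Linux for /dev/bus/usb nodes


def expected_usb_minor(bus, device):
    """Minor number Linux assigns to /dev/bus/usb/<bus>/<device>."""
    return (bus - 1) * 128 + (device - 1)


def describe_node(path):
    """Return (is_char_device, major, minor, mode_string) for an existing node."""
    st = os.stat(path)
    return (
        stat.S_ISCHR(st.st_mode),
        os.major(st.st_rdev),
        os.minor(st.st_rdev),
        stat.filemode(st.st_mode),
    )


# Bus 004, device 048 from the listing above.
print(expected_usb_minor(4, 48))  # 431, matching "189, 431"

node = "/dev/bus/usb/004/048"  # only present while that device is attached
if os.path.exists(node):
    print(describe_node(node))
```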

And here's dmesg output showing that device 48 is correct, and the restart back into boot mode (the disconnecting device number 88 is the boot mode device just before starting the example):

[3233363.346779] usb 3-2.3: USB disconnect, device number 88
[3233363.641018] usb 4-2.3: new SuperSpeed Gen 1 USB device number 48 using xhci_hcd
[3233363.665603] usb 4-2.3: New USB device found, idVendor=03e7, idProduct=f63b, bcdDevice= 1.00
[3233363.665606] usb 4-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[3233363.665608] usb 4-2.3: Product: Luxonis Device
[3233363.665609] usb 4-2.3: Manufacturer: Intel Corporation
[3233363.665610] usb 4-2.3: SerialNumber: 14442C1061F95ED700
[3233371.460568] usb 4-2.3: USB disconnect, device number 48
[3233371.717366] usb 3-2.3: new high-speed USB device number 89 using xhci_hcd
[3233371.834706] usb 3-2.3: New USB device found, idVendor=03e7, idProduct=2485, bcdDevice= 0.01
[3233371.834710] usb 3-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[3233371.834712] usb 3-2.3: Product: Movidius MyriadX
[3233371.834713] usb 3-2.3: Manufacturer: Movidius Ltd.
[3233371.834713] usb 3-2.3: SerialNumber: 03e72485

oto313 commented 3 years ago

@gbiggs Hi, I had the problem in a Docker container (arm32v7 architecture).

Client:
 Context: omni
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.6.0)

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 17
 Server Version: 19.03.15
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3b3e9d5f62a114153829f9fbe2781d27b0a2ddac.m
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f-dirty
 init version: fec3683-dirty (expected: fec3683b971d9)
 Kernel Version: 5.5.19-a26c30ea-iota-devel
 Operating System: ------
 OSType: linux
 Architecture: armv7l
 CPUs: 2
 Total Memory: 3.708GiB
 Name: orion-305e-devel
 ID: APUF:7PEI:6SU3:ZE4D:Z3X3:36LO:34SN:WXF4:BC7Q:S22S:DORT:7YPA
 Docker Root Dir: /media/data/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

themarpe commented 3 years ago

@gbiggs Were all of the steps you carried out above run inside the LXD container?

Regarding libusb, if you linked dynamically, you can check using:

ldd depthai-core/examples/rgb_preview | grep usb
# outputs
libusb-1.0.so.0 => /lib/x86_64-linux-gnu/libusb-1.0.so.0 (0x00007f391c9b7000)

gbiggs commented 3 years ago

@themarpe Apart from dmesg, which can only be run in the host OS, all the other steps were carried out in the LXD container.

gbiggs commented 3 years ago

ldd depthai-core/examples/rgb_preview | grep usb shows that the example is linking to the correct version of libusb.
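
The same check can be scripted, which is handy when several binaries or the Python extension module need checking. This is a rough sketch rather than anything used in the thread: `resolved_libs` and `ldd` are hypothetical helper names, and the parser only understands resolved lines of the `name => path (address)` form shown in the `ldd` output quoted earlier.

```python
import re
import subprocess

# One resolved dependency line of ldd output: "name => /path (0xaddr)".
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S+)\s+\(0x[0-9a-f]+\)\s*$"
)


def resolved_libs(ldd_output, needle="usb"):
    """Map library name -> resolved path for ldd lines whose name contains needle."""
    libs = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m and needle in m["name"]:
            libs[m["name"]] = m["path"]
    return libs


def ldd(binary):
    """Run ldd on a binary or shared object and return its stdout."""
    return subprocess.run(
        ["ldd", binary], check=True, capture_output=True, text=True
    ).stdout


# Line copied from the ldd output quoted earlier in the thread.
sample = "libusb-1.0.so.0 => /lib/x86_64-linux-gnu/libusb-1.0.so.0 (0x00007f391c9b7000)\n"
print(resolved_libs(sample))
```

For the Python module, `ldd()` could be pointed at the `.so` path that the demo prints after "Using depthai module from:". An empty result does not by itself mean something is wrong, since a statically linked libusb does not appear in `ldd` output.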

themarpe commented 3 years ago

@gbiggs Not sure if you've tested this already, but can you try passing these settings? It seems similar to the Docker container use case, so I presume it could resolve your issue: https://unix.stackexchange.com/a/137936

gbiggs commented 3 years ago

Thanks for the pointer to how to set a cgroup permission. I added it to my container instance using this command (LXD is slightly different from pure LXC):

$ lxc config set oakd_test raw.lxc="lxc.cgroup.devices.allow = c 189:* rwm"

I confirmed that the configuration has been saved:

$ lxc config show oakd_test
architecture: x86_64
config:
  image.architecture: amd64
  image.description: |
    Ubuntu focal
  image.name: ubuntu-x86_64
  image.os: ubuntu
  image.release: focal
  image.serial: "20210601_0155"
  image.variant: default
  raw.lxc: lxc.cgroup.devices.allow = c 189:* rwm
  volatile.base_image: 4b5ebb97d70ed3d121ae8cd1c8137f8b68db5980e14f5015223168a90b90a5a8
  volatile.eth0.host_name: vethba8797e8
  volatile.eth0.hwaddr: 00:16:3e:08:35:1d
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":101001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":101001,"Nsid":1001,"Maprange":64535}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":101001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":101001,"Nsid":1001,"Maprange":64535}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":101001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":101001,"Nsid":1001,"Maprange":64535}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 7073e7ef-3c8c-4561-8020-8e8327257ee2
devices:
  oakd:
    type: usb
    vendorid: "03e7"
  realsense:
    type: usb
    vendorid: "8086"
ephemeral: false
profiles:
- default
- container_marker
- gpu
- home_directory
- pulseaudio
- sshagent
- x11
stateful: false
description: ""

I also restarted the instance, to be sure it took effect.

Running the rgb_video example or any other example still gives the same output as above, and the nodes in the device tree are identical as well. Possibly the lxc config command to add a USB device to an instance is adding the cgroup permissions under the hood already.
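
For readers unfamiliar with the rule syntax: `c 189:* rwm` means character devices with major 189, any minor, with read, write, and mknod access. The sketch below only models that syntax to show which device nodes the rule covers. It does not query the kernel or LXD, and the function names are assumptions made for illustration.

```python
def parse_devices_allow(rule):
    """Split a devices cgroup rule such as 'c 189:* rwm' into its parts."""
    dev_type, numbers, access = rule.split()
    major, minor = numbers.split(":")
    return dev_type, major, minor, set(access)


def rule_allows(rule, dev_type, major, minor, wanted="rw"):
    """True if rule grants every access letter in wanted to the given node."""
    r_type, r_major, r_minor, r_access = parse_devices_allow(rule)
    type_ok = r_type in ("a", dev_type)      # 'a' stands for all device types
    major_ok = r_major in ("*", str(major))  # '*' is a wildcard
    minor_ok = r_minor in ("*", str(minor))
    return type_ok and major_ok and minor_ok and set(wanted) <= r_access


RULE = "c 189:* rwm"  # value set through raw.lxc above

print(rule_allows(RULE, "c", 189, 431))  # True: node 004/048 from the listing
print(rule_allows(RULE, "c", 188, 0))    # False: a different major is not covered
```

Under this reading the rule already covers every USB device node, whatever bus or device number the OAK-D gets after it re-enumerates, which fits the observation that adding the rule changed nothing.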

themarpe commented 3 years ago

@gbiggs One extra thing worth checking out: Use environment variable LIBUSB_DEBUG=4 inside the container when running a depthai application to get more debug information of libusb. Check if the following message is printed out:

libusb: debug [linux_netlink_read_message] ignoring netlink message with non-zero sender UID 65534

If so, there is a file libusb/os/linux_netlink.c with the following contents:

cred = (struct ucred *)CMSG_DATA(cmsg);
if (cred->uid != 0) 
{
        usbi_dbg("ignoring netlink message with non-zero sender UID %u", (unsigned int)cred->uid);
        return -1;
}

Remove the if statement check, recompile libusb and depthai, and retest.

If the above message does not appear, feel free to skim through the libusb debug logs and try to spot anything out of the ordinary. You may also diff them against logs from a libusb built without udev support running on the host (otherwise the messages most likely won't be the same).
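
One way to capture that debug output and search it for the message is sketched below. It is an illustration only: the helper names and the 60-second timeout are assumptions, and the command it runs at the end is a stand-in that simply echoes the variable, so the real application command (for example `python3 depthai_demo.py`) has to be substituted.

```python
import os
import subprocess
import sys

MARKER = "ignoring netlink message with non-zero sender UID"


def run_with_libusb_debug(cmd, level=4, timeout=60):
    """Run cmd with LIBUSB_DEBUG set and return its combined stdout/stderr text."""
    env = dict(os.environ, LIBUSB_DEBUG=str(level))
    try:
        proc = subprocess.run(
            cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            text=True, timeout=timeout,
        )
        return proc.stdout
    except subprocess.TimeoutExpired as exc:
        # Keep whatever was logged before the timeout; it may arrive as bytes.
        out = exc.output or ""
        return out.decode(errors="replace") if isinstance(out, bytes) else out


def has_uid_filter_message(log_text):
    """True if any line contains the netlink sender-UID message quoted above."""
    return any(MARKER in line for line in log_text.splitlines())


# Stand-in command that only shows the variable reaching the child process.
demo_cmd = [sys.executable, "-c", "import os; print(os.environ['LIBUSB_DEBUG'])"]
log = run_with_libusb_debug(demo_cmd)
print(log.strip(), has_uid_filter_message(log))
```

stderr is merged into stdout because libusb writes its log messages to stderr by default.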

gbiggs commented 3 years ago

That line doesn't appear in the debug output. I'll read through the log and see if I can find anything useful, and also try a diff.

Here's the debug output, in case you know what to look for. The last few lines are repeated forever until the application exits.

gbiggs commented 3 years ago

I've found a significant difference between the two logs. When running on the host, with a no-udev-libusb, the following line appears about 1.8s into the log:

[ 1.835360] [0003cd7b] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 4, devaddr: 17, sys_name: 4-2, removed: no

That device number is the OAK-D after rebooting into running-mode on the USB 3 bus. Following that line are many attempts to open the device, which fail with no access until (I assume) udev runs the rule that gives it access, then the device being opened, transfers happening, etc. The above line never happens inside the container. I think this means that the container is not properly announcing the new USB device being attached.

I've uploaded the log for the host so you can see what I mean.

gbiggs commented 3 years ago

According to this article:

LXD USB devices support hotplug by default. So unplugging the device and plugging it back on the host will have it removed and re-added to the container.

gbiggs commented 3 years ago

Knowing a bit more about what to look for now, I've found that in the container, the following events are detected:

The removal of the boot-mode device when it reboots:

[ 1.445999] [0000164e] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 3, devaddr: 30, sys_name: 3-2, removed: yes

The appearance of the boot-mode device after the running-mode device gives up and reboots:

[10.156069] [0000164e] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 3, devaddr: 31, sys_name: 3-2, removed: no

The removal of the running-mode device when it gives up and reboots (note that this happens after the above line):

[10.467585] [0000164e] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 4, devaddr: 19, sys_name: 4-2, removed: yes

Notably absent from the debug output in the container is the event for the running-mode device appearing, even though the device node in /dev/bus/usb is created properly, has the correct permissions, etc., and even though it gets an event for the device's removal from the USB bus.

Running on the host, the debug output does contain an event for the running-mode device appearance.
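
The comparison above can be automated when going through long libusb logs. The following sketch is not from the thread, and its function names and the `already_present` parameter are assumptions. It parses the `netlink hotplug found device` lines and reports devices whose removal was logged without a preceding arrival. Devices that were attached before the program started never get an arrival event either, so they have to be passed in explicitly.

```python
import re

HOTPLUG = re.compile(
    r"netlink hotplug found device busnum: (?P<bus>\d+), devaddr: (?P<addr>\d+), "
    r"sys_name: (?P<sys_name>[^,\s]+), removed: (?P<removed>yes|no)"
)


def hotplug_events(log_text):
    """Yield (bus, devaddr, sys_name, removed) for each netlink hotplug line."""
    for line in log_text.splitlines():
        m = HOTPLUG.search(line)
        if m:
            yield int(m["bus"]), int(m["addr"]), m["sys_name"], m["removed"] == "yes"


def removals_without_arrival(log_text, already_present=()):
    """List (bus, devaddr) pairs that were removed but never announced as added."""
    added, orphans = set(already_present), []
    for bus, addr, _name, removed in hotplug_events(log_text):
        if not removed:
            added.add((bus, addr))
        elif (bus, addr) not in added:
            orphans.append((bus, addr))
    return orphans


# The three container log lines quoted above.
container_log = """\
[ 1.445999] [0000164e] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 3, devaddr: 30, sys_name: 3-2, removed: yes
[10.156069] [0000164e] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 3, devaddr: 31, sys_name: 3-2, removed: no
[10.467585] [0000164e] libusb: debug [linux_netlink_read_message] netlink hotplug found device busnum: 4, devaddr: 19, sys_name: 4-2, removed: yes
"""

# Bus 3, address 30 is the boot-mode device that was attached before the run.
print(removals_without_arrival(container_log, already_present={(3, 30)}))
```

On the container log this leaves only bus 4, address 19, the running-mode device whose arrival event is missing.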

themarpe commented 3 years ago

@gbiggs can you try forcing USB2 mode in the script you are using? It's odd that only that message is missing. Maybe LXD has any options for that case? Also I saw that you have options for passing USB devices with vendor ID set to 0x03e7. I presume this goes for all devices, but note that the ROM Bootloader has product ID 0x2485 and running FW 0xf63b, just in case this comes up somewhere

gbiggs commented 3 years ago

can you try forcing USB2 mode in the script you are using?

I edited rgb_video.cpp to force USB2 mode and ran it, but it gives the same behaviour, except that the running-mode device shows up on the USB2 bus now instead of the USB3 bus (confirmed in dmesg).

It's odd that only that message is missing. Maybe LXD has any options for that case?

I agree, it's very odd that it consistently misses just that message. I haven't found any further information on LXD hotplugging other than "it just works", which it mostly does. It's just that one event...

Also I saw that you have options for passing USB devices with vendor ID set to 0x03e7. I presume this goes for all devices, but note that the ROM Bootloader has product ID 0x2485 and running FW 0xf63b, just in case this comes up somewhere

Adding a device with a vendor ID only makes the product ID a wildcard match, apparently. Just to be sure, though, I added devices for both product IDs specifically. It made no difference.
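
To see which of the two product IDs is currently visible, the sysfs attributes `idVendor` and `idProduct` can be read directly. The sketch below is an illustration under stated assumptions: `find_oak_devices` is a hypothetical helper, the mode labels come only from the two IDs mentioned in this thread, and what sysfs shows inside an LXD container is not necessarily what libusb ends up detecting.

```python
from pathlib import Path

# Product IDs quoted in the thread for vendor 03e7.
MODES = {"2485": "bootloader", "f63b": "running firmware"}


def find_oak_devices(sysfs_root="/sys/bus/usb/devices"):
    """Return [(sys_name, product_id, mode)] for USB devices with vendor 03e7."""
    found = []
    for dev in sorted(Path(sysfs_root).iterdir()):
        vid_file, pid_file = dev / "idVendor", dev / "idProduct"
        if not (vid_file.is_file() and pid_file.is_file()):
            continue  # interface entries such as 3-2:1.0 have no idVendor file
        if vid_file.read_text().strip().lower() != "03e7":
            continue
        pid = pid_file.read_text().strip().lower()
        found.append((dev.name, pid, MODES.get(pid, "unknown")))
    return found


root = Path("/sys/bus/usb/devices")
if root.is_dir():
    for entry in find_oak_devices(root):
        print(entry)
```

Running it once before starting an example and again while the example is waiting for the device should show the entry change from 2485 to f63b if the re-enumerated device is visible.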

Sanjubisanal commented 2 years ago

@gbiggs Was your issue resolved with the above fix? I am facing the same issue with an Intel Movidius stick on LXD.

ParasPidurkar commented 2 years ago

@gbiggs Please give us details on how the issue was resolved.

themarpe commented 2 years ago

@Sanjubisanal @ParasPidurkar The steps to reproduce are in https://github.com/lxc/lxd/issues/9136 and it also seems that the issue was addressed there, so it might work now. Give that a try and let us know how it goes.

Sanjubisanal commented 2 years ago

@themarpe We still have the same issue. On host:

root@raspberrypi4:/home/root/face_match# ./run-facedetect
Found stale device, resetting
Device 0 Address: 1.1 - VID/PID 03e7:2150
Device 1 Address: 1 - VID/PID 03e7:f63b
device attached
1.1
Starting wait for connect with 2000ms timeout
Found Address: 1.1 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 865724 bytes...
Successfully sent 865724 bytes of data in 128.084278 ms (6.445902 MB/s)
Boot successful, device address 1.1
Device 0 Address: 1 - VID/PID 03e7:f63b
Found Address: 1 - VID/PID 03e7:f63b
done
Booted 1 -> VSC

Inside LXD:

Device 0 Address: 1.1 - VID/PID 03e7:2150
device attached
1.1
Starting wait for connect with 2000ms timeout
Found Address: 1.1 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 865724 bytes...
Successfully sent 865724 bytes of data in 132.178011 ms (6.246264 MB/s)
Boot successful, device address 1.1

Traceback (most recent call last):
  File "/usr/bin/facenet-pipe.py", line 269, in <module>
    sys.exit(main())
  File "/usr/bin/facenet-pipe.py", line 204, in main
    device.OpenDevice()
  File "/usr/bin/mvnc/mvncsapi.py", line 149, in OpenDevice
    raise Exception(Status(status))
Exception: mvncStatus.ERROR

themarpe commented 2 years ago

@Sanjubisanal Highly likely, yes. Do try that, and make sure the new library is picked up instead of the system libusb (check using ldd). If you are using Python, make sure to recompile locally so the new libusb is taken into account. See the Dockerfile for reference: https://github.com/luxonis/depthai-python/blob/develop/ci/Dockerfile