luxonis / depthai-core

DepthAI C++ Library
MIT License

[BUG] RGB&detection Queue hang after several minutes #606

Open: Obot1234 opened this issue 1 year ago

Obot1234 commented 1 year ago

Describe the bug: I'm trying to run visual odometry at 30 fps (essentially half of the pipeline taken from this example) and a neural network at 5 fps (yolov5, using the pipeline from this example) on the OAK-D at the same time. Visual odometry works when run on its own, and so does the neural network. However, when I run both, the two output queues of the detection network stop sending messages at some point, anywhere between 5 seconds and 10 minutes after startup. All five queues I'm reading from are set to non-blocking. This happens on both the OAK-D and the OAK-D-POE.
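For context on the non-blocking setting: in depthai, a host queue obtained with `getOutputQueue(name, maxSize, blocking=False)` overwrites its oldest messages when full instead of blocking, so a slow reader on the host should not, in principle, stall the device side. The sketch below models only that drop-oldest behavior with a bounded `collections.deque`. `DropOldestQueue` and `try_get` are hypothetical names for illustration, not depthai's implementation.

```python
from collections import deque
from typing import Any, Optional


class DropOldestQueue:
    """Hypothetical model of a non-blocking host queue: when full, the
    oldest message is overwritten instead of blocking the producer."""

    def __init__(self, max_size: int = 4) -> None:
        # deque(maxlen=...) silently discards from the left when full
        self._buf: deque = deque(maxlen=max_size)

    def send(self, msg: Any) -> None:
        # Never blocks: a full buffer just loses its oldest entry
        self._buf.append(msg)

    def try_get(self) -> Optional[Any]:
        # Same spirit as depthai's tryGet(): None when nothing is queued
        return self._buf.popleft() if self._buf else None


if __name__ == "__main__":
    q = DropOldestQueue(max_size=4)
    for frame_id in range(10):  # the producer outruns the consumer
        q.send(frame_id)
    print([q.try_get() for _ in range(5)])  # → [6, 7, 8, 9, None]
```

With this behavior, a queue that returns `None` indefinitely means the producer has stopped sending, not that the host reader applied back-pressure.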

Minimal Reproducible Example: You can download the MRE here. The neural network is a yolov5s model converted to a blob. I ran it with the latest version of depthai (2.17.4.0).

Expected behavior: The pipeline keeps sending messages :) Instead, at some point (between 5 seconds and 10 minutes after startup), the output of the MRE becomes:

1665391193.3609824
1665391194.361313
Message not gotten:  got_rgb:
Message not gotten:  got_detections:
1665391195.3618865
Message not gotten:  got_rgb:
Message not gotten:  got_detections:
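The MRE code itself is not part of the issue text, so the following is only a rough illustration of the kind of check that produces a log like the one above: poll several non-blocking queues and, once per reporting window, list the streams that delivered nothing. It assumes each queue exposes a `tryGet()`-style method that returns `None` when empty (depthai's `DataOutputQueue.tryGet()` behaves this way). The `StreamWatchdog` class, the stream names, and the one-second window are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional


class StreamWatchdog:
    """Hypothetical helper: tracks, per stream, whether at least one
    message arrived during the current reporting window."""

    def __init__(self, names: List[str], window_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._window_s = window_s
        self._window_start = clock()
        self._got: Dict[str, bool] = {name: False for name in names}

    def mark(self, name: str, msg: Optional[object]) -> None:
        # A tryGet()-style poll returns None when the queue is empty
        if msg is not None:
            self._got[name] = True

    def check(self) -> Optional[List[str]]:
        """Return the silent streams once the window has elapsed,
        or None while the window is still open."""
        if self._clock() - self._window_start < self._window_s:
            return None
        silent = [name for name, got in self._got.items() if not got]
        # Start a fresh window
        self._window_start = self._clock()
        self._got = {name: False for name in self._got}
        return silent


if __name__ == "__main__":
    fake_now = [0.0]  # injectable clock so the demo is deterministic
    wd = StreamWatchdog(["rgb", "detections", "odometry"], window_s=1.0,
                        clock=lambda: fake_now[0])
    wd.mark("odometry", object())  # odometry keeps producing
    wd.mark("rgb", None)           # the poll came back empty
    fake_now[0] = 1.0
    print(wd.check())  # → ['rgb', 'detections']
```

In a real host loop one would call `wd.mark(name, q.tryGet())` for each output queue and report whatever `wd.check()` returns; this appears to be roughly what the MRE's "Message not gotten" lines do, though that is an assumption since the MRE is not reproduced here.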

Pipeline Graph: Here is a visual overview of the full pipeline (essentially two pipelines running in parallel): [image]

Attach system log

{
    "architecture": "64bit WindowsPE",
    "machine": "AMD64",
    "platform": "Windows-10-10.0.19041-SP0",
    "processor": "Intel64 Family 6 Model 142 Stepping 12, GenuineIntel",
    "python_build": "tags/v3.9.4:1f2e308 Apr  6 2021 13:40:21",
    "python_compiler": "MSC v.1928 64 bit (AMD64)",
    "python_implementation": "CPython",
    "python_version": "3.9.4",
    "release": "10",
    "system": "Windows",
    "version": "10.0.19041",
    "win32_ver": "10 10.0.19041 SP0 Multiprocessor Free",
    "packages": [
        "absl-py==0.13.0",
        "addict==2.4.0",
        "aenum==2.2.6",
        "aiohttp==3.8.1",
        "aiosignal==1.2.0",
        "argcomplete==1.12.3",
        "astunparse==1.6.3",
        "async-timeout==4.0.2",
        "async-tkinter-loop==0.1.0",
        "asyncio==3.4.3",
        "attrs==21.2.0",
        "autoclass==2.2.0",
        "autoslot==2021.4.2",
        "blobconverter==1.2.6",
        "boto3==1.19.4",
        "botocore==1.22.4",
        "cachetools==4.2.2",
        "certifi==2020.12.5",
        "chardet==4.0.0",
        "charset-normalizer==2.0.9",
        "cycler==0.10.0",
        "Cython==0.29.21",
        "decopatch==1.4.8",
        "decorator==4.4.2",
        "defusedxml==0.7.1",
        "Deprecated==1.2.13",
        "depthai==2.17.4.0",
        "depthai-pipeline-graph @ git+https://github.com/geaxgx/depthai_pipeline_graph.git@b5f89bb06b5c421574d991bbdb31f1ebd42118e6",
        "distlib==0.3.4",
        "editdistance==0.6.0",
        "fast_ctc_decode==0.3.0",
        "filelock==3.4.0",
        "flatbuffers==2.0",
        "frozenlist==1.2.0",
        "future==0.18.2",
        "gast==0.5.3",
        "gitdb==4.0.7",
        "GitPython==3.1.12",
        "google-auth==1.33.0",
        "google-auth-oauthlib==0.4.4",
        "google-pasta==0.2.0",
        "gpxpy==1.5.0",
        "guppy3==3.1.0",
        "h5py==3.6.0",
        "hyperopt==0.1.2",
        "idna==2.10",
        "imageio==2.9.0",
        "Jinja2==2.11.3",
        "jmespath==0.10.0",
        "joblib==1.1.0",
        "jstyleson==0.0.2",
        "k4a==1.1.0",
        "keras==2.8.0",
        "Keras-Preprocessing==1.1.2",
        "kiwisolver==1.3.1",
        "libclang==13.0.0",
        "llvmlite==0.36.0",
        "makefun==1.11.3",
        "Markdown==3.3.4",
        "MarkupSafe==1.1.1",
        "matplotlib==3.2.2",
        "msgpack==1.0.2",
        "multidict==5.2.0",
        "netifaces==0.10.6",
        "networkx==2.5.1",
        "nibabel==3.2.1",
        "nltk==3.6.5",
        "numba==0.53.1",
        "numpy==1.19.5",
        "oauthlib==3.1.1",
        "onnx==1.12.0",
        "onnx-simplifier==0.3.6",
        "onnxoptimizer==0.2.6",
        "onnxruntime==1.9.0",
        "opencv-contrib-python==4.5.5.62",
        "openvino==2021.4.2",
        "openvino-dev==2021.4.2",
        "opt-einsum==3.3.0",
        "packaging==21.3",
        "pandas==1.1.5",
        "parasail==1.2.4",
        "pascal-voc-writer==0.1.4",
        "perfcounters==2.1.0",
        "piexif==1.1.3",
        "Pillow==8.2.0",
        "pip==21.3.1",
        "progress==1.6",
        "py-cpuinfo==8.0.0",
        "pyads==3.3.9",
        "pyasn1==0.4.8",
        "pyasn1-modules==0.2.8",
        "pydicom==2.2.2",
        "pymba==0.3.7",
        "pymongo==4.0.1",
        "pyparsing==2.4.7",
        "pyserial==3.5",
        "PySide2==5.15.2.1",
        "python-can==3.3.4",
        "python-dateutil==2.8.1",
        "python-engineio==4.3.0",
        "python-json-logger==0.1.11",
        "python-socketio==5.5.0",
        "python-tsp==0.2.1",
        "pytz==2021.1",
        "PyWavelets==1.1.1",
        "PyYAML==6.0",
        "Qt.py==1.3.7",
        "rawpy==0.17.0",
        "ray==1.8.0",
        "redis==4.0.0",
        "regex==2021.11.10",
        "requests==2.25.1",
        "requests-oauthlib==1.3.0",
        "rsa==4.7.2",
        "s3transfer==0.5.0",
        "scikit-image==0.18.1",
        "scikit-learn==1.0.1",
        "scipy==1.5.4",
        "seaborn==0.11.1",
        "sentencepiece==0.1.96",
        "setuptools==60.5.0",
        "shiboken2==5.15.2.1",
        "six==1.15.0",
        "smmap==4.0.0",
        "tabulate==0.8.9",
        "tensorboard==2.8.0",
        "tensorboard-data-server==0.6.1",
        "tensorboard-plugin-wit==1.8.0",
        "tensorflow==2.8.0",
        "tensorflow-io-gcs-filesystem==0.24.0",
        "termcolor==1.1.0",
        "texttable==1.6.4",
        "tf-estimator-nightly==2.8.0.dev2021122109",
        "thop==0.0.31.post2005241907",
        "threadpoolctl==3.0.0",
        "tifffile==2021.4.8",
        "tokenizers==0.10.3",
        "torch==1.8.1+cpu",
        "torchaudio==0.8.1",
        "torchvision==0.9.1+cpu",
        "tqdm==4.60.0",
        "tsplib95==0.7.1",
        "typing-extensions==3.7.4.3",
        "urllib3==1.26.4",
        "wheel==0.36.2",
        "windows-curses==2.2.0",
        "wrapt==1.12.1",
        "yamlloader==1.1.0",
        "yarl==1.7.2"
    ],
    "usb": [
        "NoLib"
    ],
    "uname": [
        "Windows DESKTOP-U4L9RKK 10 10.0.19041 AMD64 Intel64 Family 6 Model 142 Stepping 12, GenuineIntel"
    ]
}
UsbSpeed.SUPER

moratom commented 1 year ago

Thanks @Obot1234 for reporting the issue! I can reproduce it, and it seems that NN execution (inside the DetectionNetwork) halts after a while. I will transfer the bug report to our firmware repo and will let you know when there is any news.

In the meantime, would you mind testing whether the issue persists if you compile the network for fewer shaves (1, for example), and reporting back?

Obot1234 commented 1 year ago

Compiling for 1 shave does indeed seem to solve the problem. I ran the 1-shave build twice, for 20 minutes each time, without a hitch. In both cases I stopped the MRE not because the bug showed up, but because I needed the OAK-D for something else. If you want, I can run a longer test this coming Friday.

moratom commented 1 year ago

Thanks! Hopefully the issue doesn't occur with one shave, but if you have the time, please test it for longer, or attach the model compiled for one shave here and I can run it. That way we can check whether the bug doesn't happen at all or just happens less often.

Obot1234 commented 1 year ago

Ah of course, you can download it here.

Obot1234 commented 1 year ago

Hey, just checking in: is there any news on this issue?

moratom commented 1 year ago

Does the bug appear also with the single-shave blob?

Obot1234 commented 1 year ago

No, I have not seen the issue with the single-shave blob. I haven't done any rigorous tests, but it has run for several hours on many occasions without a single failure. However, the detection network does run slower than I'd like.
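To put a number on "slower than I'd like" when comparing blobs compiled for different shave counts, one option is to measure the detection rate on the host from message arrival times over a sliding window. The sketch below is a generic rate meter under that assumption; `RateMeter` is a hypothetical helper, not a depthai API. Call `tick()` with the current time whenever a detection message arrives.

```python
from collections import deque
from typing import Deque


class RateMeter:
    """Hypothetical sliding-window rate meter: messages per second over
    the last `window_s` seconds of arrival timestamps."""

    def __init__(self, window_s: float = 5.0) -> None:
        self._window_s = window_s
        self._stamps: Deque[float] = deque()

    def tick(self, t: float) -> None:
        # Record one arrival and drop stamps that fell out of the window
        self._stamps.append(t)
        while self._stamps and t - self._stamps[0] > self._window_s:
            self._stamps.popleft()

    def rate(self) -> float:
        # At least two stamps are needed to define an interval
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / span if span > 0 else 0.0


if __name__ == "__main__":
    meter = RateMeter(window_s=5.0)
    for i in range(26):  # 26 arrivals spaced 0.2 s apart
        meter.tick(i * 0.2)
    print(round(meter.rate(), 2))  # → 5.0
```

Counting intervals between arrivals (N stamps give N - 1 intervals) avoids overestimating the rate when the window holds only a few messages.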

Obot1234 commented 7 months ago

Hello, I just wanted to check in again on whether there has been any follow-up on this issue.

wouterio commented 3 months ago

Is there a chance you'll have a look at this in the near future?

Erol444 commented 1 month ago

Is this still present using the latest depthai version (2.26)?

Obot1234 commented 1 month ago

Yes, I can confirm it's still present. It took about half an hour for the detection stream to disappear.