nvidia-holoscan / holoscan-sdk

The AI sensor processing SDK for low latency streaming workflows
Apache License 2.0

AGX Orin - Segmentation fault on v4l2 USB input device when CTRL+C, SIGINT, and app->executor().interrupt() #25

Open ducta1092 opened 3 months ago

ducta1092 commented 3 months ago

Device: AGX Orin 64GB (iGPU)
OS: JetPack 6.0 (set up with NVIDIA SDK Manager)
Holoscan: 1.0.3, 2.0.0
Input devices: USB camera or capture device connected through USB

When I run my application using the v4l2_video_capture operator on AGX Orin and send SIGINT (i.e. CTRL+C) or call app->executor().interrupt(), a segmentation fault occurs:

[info] [context.cpp:50] VK_KHR_external_semaphore_fd
[info] [context.cpp:50] VK_KHR_push_descriptor
[info] [context.cpp:50] VK_EXT_line_rasterization
[info] [context.cpp:50] 
[info] [vulkan_app.cpp:843] Using device 0: NVIDIA Tegra Orin (nvgpu) (UUID 2d5936874a5cd186832d226d29da3)
[warning] [cuda_stream_handler.hpp:328] Parameter `cuda_stream_pool` is not set, using the default CUDA stream for CUDA operations.
[info] [holoviz.cpp:1377] Input spec:
- type: color
  name: ""
  opacity: 1.000000
  priority: 0
^C[info] [greedy_scheduler.cpp:390] Stopping scheduler.
[ubuntu:28122:0:28123] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xffffb0d542d0)
==== backtrace (tid:  28123) ====
0  /opt/nvidia/holoscan/examples/v4l2_camera/cpp/../../../lib/libucs.so.0(ucs_handle_error+0x2cc) [0xffffb1f7ba9c]
1  /opt/nvidia/holoscan/examples/v4l2_camera/cpp/../../../lib/libucs.so.0(+0x2bc4c) [0xffffb1f7bc4c]
2  /opt/nvidia/holoscan/examples/v4l2_camera/cpp/../../../lib/libucs.so.0(+0x2bffc) [0xffffb1f7bffc]
3  linux-vdso.so.1(__kernel_rt_sigreturn+0) [0xffffb75b87bc]
4  [0xffffb0d542d0]
=================================
Segmentation fault

I tried building with ucs 1.16.0, without success. This is not the expected behavior: the stop steps for the nodes of the graph are not executed, and I cannot use threading to trigger the Holoscan application. To reproduce, the simplest example, in the directory /opt/nvidia/holoscan/examples/v4l2_camera, shows the same crash on CTRL+C or SIGINT.

This bug does not happen in the dGPU Docker container; I tested with nvcr.io/nvidia/clara-holoscan/holoscan:v1.0.3-dgpu

agirault commented 3 months ago

Hi @ducta1092. Thank you. This is consistent with the known issue listed in our release notes (4210082). We don't have an ETA for this bug fix at this stage: since it only happens on exit, it is lower priority.

To circumvent the issue manually, you can force your application to use the default Ubuntu v4l2 library instead of the Tegra nvv4l2 library. Three options are:

  1. (container only) comment out the v4l2 libs in /etc/nvidia-container-runtime/host-files-for-container.d/drivers.csv before running your container
  2. (baremetal or container) preload the standard v4l2 lib: LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libv4l2.so.0.0.0 <your_holoscan_app>
  3. (baremetal or container) switch the default v4l2 library on your system
    export v4l2_lib="/usr/lib/aarch64-linux-gnu/libv4l2.so"
    sudo mv ${v4l2_lib}.0 ${v4l2_lib}.0.old
    sudo ln -s ${v4l2_lib}.0.0.0 ${v4l2_lib}.0
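
For option 3, it may help to check which file the libv4l2.so.0 name resolves to before and after the switch. A minimal sketch, assuming GNU coreutils `readlink` is available; the helper name `resolve_v4l2` is hypothetical, and the path is the one used in option 3:

```shell
#!/bin/sh
# Hypothetical helper (not part of Holoscan): print the real file behind
# a library name, following every symlink in the chain.
resolve_v4l2() {
    readlink -f "$1"
}

# Path taken from option 3; adjust it if your system differs.
v4l2_lib="/usr/lib/aarch64-linux-gnu/libv4l2.so"

# Print what the .0 name currently points to, if it exists on this system.
if [ -e "${v4l2_lib}.0" ]; then
    resolve_v4l2 "${v4l2_lib}.0"
fi

# To undo option 3 later, remove the new symlink and restore the saved entry:
#   sudo rm "${v4l2_lib}.0"
#   sudo mv "${v4l2_lib}.0.old" "${v4l2_lib}.0"
```

After the switch, the printed path should end in libv4l2.so.0.0.0. This only inspects the symlink; it does not confirm which library a running process has actually loaded.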

Let us know if this helps. Thank you

ducta1092 commented 3 months ago

Many thanks, it works. But I think this bug should be documented, because an application may need to perform post-actions on exit (statistics, saving a video file...).