luxonis / depthai-unity

DepthAI Unity Library, Unity projects and examples (OAK For Unity)
MIT License

Editor crashes when stopping a running scene on Linux #9

Closed: sliwowitz closed this issue 1 year ago.

sliwowitz commented 2 years ago

When I run any of the example scenes (the Playground main scene or any of the sub-scenes in ...), the Editor dies when I click the Play button a second time to stop the scene.

Device: OAK-D-Lite
Unity: 2021.2.7f1
OS: Ubuntu 21.10 x86_64
Kernel: 5.13.0-27-lowlatency
CPU: Intel i7-7820X @ 3.60 GHz, 8 physical cores
GPU: NVIDIA TITAN RTX, driver version 470.86
Memory: 64 GB

The setup is the same as described in issue #8; what differs is the procedure. I ran the Editor in gdb again:

```
gdb /opt/unity/2021.2.7f1/Editor/Unity
```

This time, however, gdb stopped on even more spurious signals that I did not want to trap, so I set it up as follows:

```
set mi-async 1
set pagination off
set non-stop on
handle SIGXCPU SIGPWR SIG36 SIG37 nostop noprint
run -projectPath ~/hax/depthai-unity/OAKForUnity/URP/
```
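The gdb setup above can be captured in a small helper so the session is easy to repeat. This is a hypothetical bash sketch, not part of depthai-unity: the function names `write_gdb_cmds` and `launch_unity_under_gdb` and the `UNITY_BIN` variable are assumptions, while the gdb commands and the Editor path are the ones from this report. It relies only on standard gdb options (`-x` to read a command file, `-ex` to run a command, `--args` to pass arguments to the program).

```bash
# Hypothetical helpers (not part of depthai-unity) that replay the gdb setup
# from this report. Source this file in bash, then call launch_unity_under_gdb.

# Editor binary; defaults to the path used in the report, override if needed.
UNITY_BIN="${UNITY_BIN:-/opt/unity/2021.2.7f1/Editor/Unity}"

# Write the gdb setup commands from the report into the file named by $1.
write_gdb_cmds() {
  local out="$1"
  cat > "$out" <<'EOF'
set mi-async 1
set pagination off
set non-stop on
handle SIGXCPU SIGPWR SIG36 SIG37 nostop noprint
EOF
}

# Start the Editor under gdb with those commands; all arguments go to Unity.
launch_unity_under_gdb() {
  local cmds status
  cmds="$(mktemp)"
  write_gdb_cmds "$cmds"
  # -x runs the command file first, -ex run then starts the program,
  # and --args passes everything after the binary to Unity itself.
  gdb -x "$cmds" -ex run --args "$UNITY_BIN" "$@"
  status=$?
  rm -f "$cmds"
  return "$status"
}
```

Usage: source the file and run `launch_unity_under_gdb -projectPath ~/hax/depthai-unity/OAKForUnity/URP/`. Opening the scene, pressing Play, interrupting with Ctrl+C and re-enabling SIGXCPU remain manual, interactive steps.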

Next I open the scene in the Editor, press Play, wait for the scene to load fully, switch back to gdb, press Ctrl+C and run:

```
handle SIGXCPU stop print
continue
```

Now I press Play again to stop the running scene. The Editor dies and I get a stack trace (about 30 threads are killed; only thread 1 seems to be relevant):

```
Thread 1 "Unity" received signal SIGXCPU, CPU time limit exceeded.
backward::SignalHandling::sig_handler (signo=0, info=0x3db61d19, _ctx=0xbf7efc623c114bb3) at /home/strazce/.hunter/_Base/062a19a/e4f2f5b/72dac15/Install/include/backward.hpp:4263
4263      sig_handler(int signo, siginfo_t *info, void *_ctx) {
(gdb) thread 1
[Switching to thread 1 (Thread 0x7fffed3ad340 (LWP 167877))]
#0  backward::SignalHandling::sig_handler (signo=0, info=0x3db61d19, _ctx=0xbf7efc623c114bb3) at /home/strazce/.hunter/_Base/062a19a/e4f2f5b/72dac15/Install/include/backward.hpp:4263
4263      sig_handler(int signo, siginfo_t *info, void *_ctx) {
(gdb) bt
#0  backward::SignalHandling::sig_handler(int, siginfo_t*, void*) (signo=0, info=0x3db61d19, _ctx=0xbf7efc623c114bb3) at /home/strazce/.hunter/_Base/062a19a/e4f2f5b/72dac15/Install/include/backward.hpp:4263
#1  0x00007ffff0207520 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff020780a in __GI___sigsuspend (set=0x7ffedc5e11a0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
#3  0x00007ffedc2a47ac in GC_suspend_handler_inner () at /opt/unity/2021.2.7f1/Editor/Data/MonoBleedingEdge/MonoEmbedRuntime/libmonobdwgc-2.0.so
#4  0x00007ffedc2a4665 in GC_suspend_handler () at /opt/unity/2021.2.7f1/Editor/Data/MonoBleedingEdge/MonoEmbedRuntime/libmonobdwgc-2.0.so
#5  0x00007ffff0207520 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
#6  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x7fffffffc0f0, op=393, expected=0, futex_word=0x7ffceff36940 <schedulerState+24832>) at futex-internal.c:57
#7  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x7fffffffc0f0, clockid=0, expected=0, futex_word=0x7ffceff36940 <schedulerState+24832>) at futex-internal.c:87
#8  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffceff36940 <schedulerState+24832>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7fffffffc0f0, private=<optimized out>) at futex-internal.c:139
#9  0x00007ffff02617f0 in do_futex_wait (sem=sem@entry=0x7ffceff36940 <schedulerState+24832>, abstime=abstime@entry=0x7fffffffc0f0, clockid=0) at sem_waitcommon.c:112
#10 0x00007ffff0261893 in __new_sem_wait_slow64 (sem=0x7ffceff36940 <schedulerState+24832>, abstime=0x7fffffffc0f0, clockid=0) at sem_waitcommon.c:184
#11 0x00007ffcef4be377 in XLink_sem_timedwait () at /home/strazce/hax/depthai-unity/build/depthai-core/libdepthai-core.so
#12 0x00007ffcef4bb203 in DispatcherWaitEventCompleteTimeout () at /home/strazce/hax/depthai-unity/build/depthai-core/libdepthai-core.so
#13 0x00007ffcef4b8dff in XLinkResetRemoteTimeout () at /home/strazce/hax/depthai-unity/build/depthai-core/libdepthai-core.so
#14 0x00007ffcef4a8faa in dai::XLinkConnection::close() (this=0x55556730b6e0) at /home/strazce/hax/depthai-unity/depthai-core/src/xlink/XLinkConnection.cpp:186
#15 0x00007ffcef2fbb21 in dai::DeviceBase::closeImpl() (this=0x55555a883480) at /home/strazce/hax/depthai-unity/depthai-core/src/device/DeviceBase.cpp:369
#16 0x00007ffcef2e8508 in dai::Device::closeImpl() (this=0x55555a883480) at /home/strazce/hax/depthai-unity/depthai-core/src/device/Device.cpp:82
#17 0x00007ffcef2fbacd in dai::DeviceBase::close() (this=0x55555a883480) at /home/strazce/hax/depthai-unity/depthai-core/src/device/DeviceBase.cpp:359
#18 0x00007ffdaab221c9 in DAICloseDevice(int) (deviceNum=0) at /home/strazce/hax/depthai-unity/src/device/DeviceManager.cpp:207
#19 0x0000000040762fb3 in  ()
#20 0x00007fffffffca00 in  ()
#21 0x000055555e72d010 in  ()
#22 0x00007fffffffc8b0 in  ()
#23 0x00007ffda9e2aa80 in  ()
#24 0x00007ffda9e2aa80 in  ()
#25 0x000055555caa4638 in  ()
#26 0x0000000040762f11 in  ()
#27 0x000055555b6b4f10 in  ()
#28 0x00007fffffffc6c0 in  ()
#29 0x00007fffffffc5f0 in  ()
#30 0x0000000000000000 in  ()
```
gespona commented 2 years ago

Hi! I didn't have much time to dig into the issue this last week, but I've been doing some tests. It's a bit strange, because the stack trace points to closing the device. Would you mind commenting out lines 289, 299 and 309 (`DAICloseDevice((int) device.deviceNum);`) of URP\Assets\Plugins\OAKForUnity\Scripts\Predefined\PredefineBase.cs and checking the stack trace? I need to look deeper, but it's strange that I still get the crash with the close-device calls commented out.

themarpe commented 2 years ago

@gespona Try recreating it: we have some fixes in develop that address the underlying XLink issues, so it might be worth checking against that.

gespona commented 2 years ago

Thanks @themarpe! I'll definitely give it a try and report back here. During my tests I noticed some instability after updating to the latest main, so that makes sense; I was about to check with you on the core side.

gespona commented 2 years ago

Hi @themarpe! Rereading your comment, and to clarify with context from Discord: I can reproduce the issue every time. The latest iteration of the Unity plugin focused on Windows first (it works fine there and on Mac), so I checked whether something in the recent Unity-side improvements could lead to this crash. This morning I went back to the latest version tested on Linux (release v2.10.0, based on depthai-core from 18 Sep), and everything works fine: the Unity Editor does not crash when stopping the scene. That is probably not a good enough workaround for @sliwowitz for now, as that version did not support OAK-D-Lite. I also tested against develop, and it still crashes. So, in short, the current Unity plugin works fine on Linux if you check out depthai-core at 57bb84a, meaning Unity no longer crashes when stopping the scene (tested with OAK-D). I'll ping you so we can check it together, OK?

themarpe commented 2 years ago

Sounds good - thanks!

sliwowitz commented 2 years ago

> Would you mind commenting out lines 289, 299 and 309 (`DAICloseDevice((int) device.deviceNum);`) of URP\Assets\Plugins\OAKForUnity\Scripts\Predefined\PredefineBase.cs and checking the stack trace? I need to look deeper, but it's strange that I still get the crash with the close-device calls commented out.

Yep, it still crashes even with these lines commented out. As you mention in your other comment, older depthai-core versions don't have this problem? I'll try to look into it next week, though I'll have to use the newer versions, as I only have the OAK-D-Lite.

gespona commented 2 years ago

Any chance to get the stack trace? :) I'm working through a few suggestions from @themarpe, but no luck so far. I tested with OAK-D and depthai-core 2.10.0 and everything works fine, but that version predates OAK-D Lite support, so it's not a good enough workaround for you. I'll try to speed things up on this issue.

sliwowitz commented 2 years ago

I'm not at my main machine until Tuesday. I'll get the stack trace when I'm back.

gespona commented 2 years ago

Thanks a lot, @sliwowitz!

sliwowitz commented 2 years ago

OK, here's the stack trace: oak-d-no-device-close.log. This time, though, I have no idea what's happening or even where. At the beginning of the file you can see that the main thread didn't die this time; it was still running when the Editor crashed. When I poked in, it was sleeping inside some GC cycle. Most of the other threads seem to be:

  1. Burst/JIT-related threads, which died either waiting on a pthread futex or possibly also in a GC cycle (I don't know anything about C# internals; I'm just guessing from the function names).
  2. Unity threads dying in SystemNative_Read.

The exceptions this time seem to be the last two threads: "Unity" (no. 156, starting at line 751) and "Loading.Preload" (no. 158, starting at line 771).

None of the dead threads reference depthai at all. If I interrupt all threads and look inside (oak-d-no-device-close-all-threads.log), the three threads that reference depthai (no. 150 at l. 67, no. 149 at l. 81, and no. 148 at l. 95) all seem to be peacefully waiting or polling.

gespona commented 2 years ago

@sliwowitz Thanks a lot for taking the time to send the info. Yes, this is proving to be a tricky issue.

As for the case without closing the device: yes, it's really strange. Initially I thought the crash was directly related to executing the device close, but then I was surprised to see it crash even with those lines commented out.

So @themarpe and I are investigating other potential causes (smart pointers, ASan, ...), since core v2.10 with OAK-D works fine.

Thanks for your patience on this issue.

sliwowitz commented 1 year ago

I gave this another try today and can no longer reproduce the issue with either the OAK-D Lite or my new OAK-D Pro :-). I'm now on Ubuntu 22.10 with Unity 2022.2.1f1 or 2021.3.16f1. My guess is that it was fixed by one of the recent core version bumps.

Even clicking the unimplemented features (face detection, body pose, etc.) in a running scene did not crash the Editor (it does crash the build, though). Overall it feels much more robust now, as robust as the Editor gets on Linux.

Mr-ider commented 1 year ago

How can I change the preview resolution of the face detector scene? I changed it to 600x300, but the face detection doesn't work.

gespona commented 1 year ago

Hi @sliwowitz! I did my own round of tests on Linux and can confirm that I can no longer reproduce the issue. I also agree with you that the "fix" came with the recent core version bumps ;) Closing the ticket ;)