SIGSEGV when running advanced_demo.py on the unix port

amirgon commented 3 years ago

Example for the problem: https://github.com/lvgl/lv_binding_micropython/runs/3017175341

Backtrace:
lib/lv_bindings/tests/../../../ports/unix/micropython-dev(gc_lock+0xd)[0x55b15488b90d]
lib/lv_bindings/tests/../../../ports/unix/micropython-dev(+0x34610a)[0x55b154b4810a]
/lib/x86_64-linux-gnu/libffi.so.7(+0x6e06)[0x7f858831ae06]
/lib/x86_64-linux-gnu/libffi.so.7(+0x7188)[0x7f858831b188]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f858838b210]
/lib/x86_64-linux-gnu/libc.so.6(clock_nanosleep+0xdf)[0x7f85884253bf]
/lib/x86_64-linux-gnu/libc.so.6(nanosleep+0x17)[0x7f858842b047]
/lib/x86_64-linux-gnu/libSDL2-2.0.so.0(+0xf31ab)[0x7f858862a1ab]
lib/lv_bindings/tests/../../../ports/unix/micropython-dev(+0x36fac3)[0x55b154b71ac3]
/lib/x86_64-linux-gnu/libSDL2-2.0.so.0(+0x75720)[0x7f85885ac720]
/lib/x86_64-linux-gnu/libSDL2-2.0.so.0(+0xee13d)[0x7f858862513d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f858832b609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f8588467293]

Guess for the root cause:

lv_timer.py uses timer_create to signal the event loop. However, the signal might be sent to any thread. If it is not sent to a Micropython thread, gc_lock crashes because it calls MP_STATE_THREAD which calls pthread_getspecific under the assumption that this is a Micropython thread and mp_state_thread_t is valid.
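To illustrate the suspected failure mode, here is a minimal sketch of per-thread state stored with `pthread_setspecific`. It is not MicroPython's code: `fake_thread_state_t`, `register_current_thread`, and the other helper names are hypothetical stand-ins for `mp_state_thread_t` and the lookup behind `MP_STATE_THREAD`. A thread that never registered its state (such as an SDL thread) gets `NULL` back from `pthread_getspecific`, and a signal handler running on that thread would dereference it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for mp_state_thread_t. */
typedef struct {
    int gc_lock_depth;
} fake_thread_state_t;

static pthread_key_t state_key;

/* Create the key once, before any thread registers its state. */
static int state_key_init(void)
{
    return pthread_key_create(&state_key, NULL);
}

/* Called by every thread that the interpreter knows about. */
static int register_current_thread(fake_thread_state_t *st)
{
    return pthread_setspecific(state_key, st);
}

/* Returns NULL on any thread that never called register_current_thread(). */
static fake_thread_state_t *current_state(void)
{
    return pthread_getspecific(state_key);
}

/* Body of a "foreign" thread: it only reports what it sees. */
static void *foreign_thread(void *arg)
{
    (void)arg;
    return current_state();
}

/* Returns 1 if a freshly created, unregistered thread has state,
 * 0 if it sees NULL, and -1 on a pthread error. */
static int foreign_thread_has_state(void)
{
    pthread_t t;
    void *seen = NULL;
    if (pthread_create(&t, NULL, foreign_thread, NULL) != 0) {
        return -1;
    }
    if (pthread_join(t, &seen) != 0) {
        return -1;
    }
    return seen != NULL;
}
```

A handler that does something like `current_state()->gc_lock_depth++` is safe on a registered thread and faults on an unregistered one, which would match the `gc_lock` frame at the top of the backtrace.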

Possible workaround

Use SIGEV_THREAD_ID (Linux-specific) when registering the signal in lv_timer.py to ensure the Micropython thread is called.
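A rough C sketch of that workaround, assuming Linux with glibc or musl. The helper names `install_tick_handler` and `create_thread_timer` are hypothetical, and lv_timer.py would have to make the equivalent calls through its own FFI bindings. The timer's signal is directed at the thread that creates the timer instead of at the whole process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Older glibc headers do not define this accessor; musl does. */
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

static volatile sig_atomic_t ticks = 0;

/* Async-signal-safe handler: only counts expirations. */
static void on_tick(int signo)
{
    (void)signo;
    ticks++;
}

/* Hypothetical helper: install on_tick() as the handler for signo. */
static int install_tick_handler(int signo)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: create a periodic timer whose signal is
 * delivered only to the calling thread. Returns 0 on success. */
static int create_thread_timer(int signo, long period_ms, timer_t *out)
{
    assert(out != NULL && period_ms > 0); /* a zero it_value would disarm the timer */

    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD_ID;
    sev.sigev_signo = signo;
    sev.sigev_notify_thread_id = (pid_t)syscall(SYS_gettid);
    if (timer_create(CLOCK_MONOTONIC, &sev, out) != 0) {
        return -1;
    }

    struct itimerspec its = {0};
    its.it_value.tv_sec = period_ms / 1000;
    its.it_value.tv_nsec = (period_ms % 1000) * 1000000L;
    its.it_interval = its.it_value;
    return timer_settime(*out, 0, &its, NULL);
}
```

Calling `install_tick_handler(SIGRTMIN)` and then `create_thread_timer(SIGRTMIN, 10, &t)` from the MicroPython thread should keep the handler off SDL threads. On glibc older than 2.34 the program also needs to be linked with `-lrt`.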

embeddedt commented 3 years ago

Is it possible to mask the timer signal in main, and then opt-in a specific MicroPython thread to handle the signal after it begins running (e.g. like the example on this page)? Just an idea. :man_shrugging:

amirgon commented 3 years ago

> Is it possible to mask the timer signal in main, and then opt-in a specific MicroPython thread to handle the signal after it begins running (e.g. like the example on this page)? Just an idea. 🤷‍♂️

The main thread is not waiting for the signal at any point. The signal interrupts the thread asynchronously, and the handler can either do the work directly for simple cases or schedule some function, very much like a real interrupt handler.

In Micropython, threading is optional and experimental so I'm not sure about creating a dedicated Micropython thread for handling the event loop.

Even if we did that and created a dedicated event loop Micropython thread, how do we mask all threads but this specific thread from receiving the signal? We can affect future threads created by the process but not past threads, and we can't guarantee that no thread was created before the event loop was initialized.

What do you think?

embeddedt commented 3 years ago

> how do we mask all threads but this specific thread from receiving the signal? We can affect future threads created by the process but not past threads, and we can't guarantee that no thread was created before the event loop was initialized.

According to that example, this can be done by masking the signal before any threads are created, as threads inherit the parent thread's signal mask. That means we would most likely need to add a patch to main.

> In Micropython, threading is optional and experimental so I'm not sure about creating a dedicated Micropython thread for handling the event loop.

I'm not sure we need to actually create a new thread. Isn't opting in the main thread later, after any other non-MicroPython threads have already been spun off, enough?

amirgon commented 3 years ago

> That means we would most likely need to add a patch to main.

That's possible, but it would also pin the signal number. Currently lv_timer.py does not require a special patch to Micropython and leaves the signal number configurable by the user.

> Isn't opting in the main thread later, after any other non-MicroPython threads have already been spun off, enough?

Yes, if we masked the signal in main, then unmasking it in the main thread later should work.

amirgon commented 2 years ago

The problem was related to two independent event loops that were running simultaneously: one was the internal SDL driver event loop running on an SDL thread, and the other was the generic one from lv_utils.py running on a Micropython thread.

After fixing that (and a few other things) the problem seems to be gone.

The problem described above (signal triggered on another thread) is only a problem if the other thread is not a Micropython thread (for example, an SDL thread).

But as long as all threads are Micropython threads, it's legal to raise the event loop signal on any of them.
So currently the assumption is that all threads in the Micropython process are Micropython threads. If this assumption is wrong, the problem described above would need to be addressed.

Related commits: c8d9dd5d5d3fe4a6241d922463b3cdaec48f2d5e and the following commits.

Closing for now.

lvgl / lv_binding_micropython

SIGSEGV when running advanced_demo.py on the unix port #164
