
add PI_FUTEX-like semantics to sys_mutex() #15138

Open andrewboie opened 5 years ago

andrewboie commented 5 years ago

We have the infrastructure, but not the implementation to use atomic operations to avoid system calls for uncontended sys_mutexes. We could store the mutex owner (struct k_thread *) in the atomic_t being tracked by the kernel, along with low-order bit flags for mutexes that currently have waiters.
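A rough sketch of what that encoding might look like (illustrative only; `SYS_MUTEX_WAITERS` and the helpers below are made-up names, not existing Zephyr symbols):

```c
#include <zephyr/kernel.h>  /* current header path; older trees use <kernel.h> */

/* Hypothetical flag bit: set once the kernel has queued waiters on this mutex. */
#define SYS_MUTEX_WAITERS BIT(0)

/* val == 0 means unlocked; otherwise the value is the owning thread's address
 * with the two low-order bits free for flags, since k_thread objects are at
 * least 4-byte aligned.
 */
static inline struct k_thread *sys_mutex_owner(atomic_val_t val)
{
	return (struct k_thread *)(val & ~(atomic_val_t)0x3);
}

static inline bool sys_mutex_has_waiters(atomic_val_t val)
{
	return (val & SYS_MUTEX_WAITERS) != 0;
}
```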

andrewboie commented 5 years ago

Related to, but doesn't really intersect with #14493

andrewboie commented 5 years ago

@wentongwu this is similar to what you are doing for sys_sem if you want to take a look.

wentongwu commented 5 years ago

Yes, I will do this task during the v2.1.0 release period.

andrewboie commented 5 years ago

@wentongwu are you still planning on doing this for 2.1? @ceolin is free and looking for stuff to do

wentongwu commented 5 years ago

@ceolin @andrewboie this task is on my 2.1 plan, and I have some basic code for FUTEX_LOCK_PI and FUTEX_UNLOCK_PI for a single mutex per thread; I will do this after syst, which will be done soon. Supporting priority inheritance when a thread holds two or more mutexes simultaneously is for the next release. @ceolin would you like to take #17021?

andrewboie commented 5 years ago

> Supporting priority inheritance when a thread holds two or more mutexes simultaneously is for the next release. @ceolin would you like to take #17021?

Wait, what are you doing? We already have priority-inheriting mutexes; any contended sys_mutex is backed by a k_mutex already. All I want is to use atomic operations so that a syscall isn't necessary for an uncontended sys_mutex. What am I missing here?

wentongwu commented 5 years ago

@andrewboie I thought sys_mutex should be backed by pi_futex on the kernel side; ok, I understand what you mean. And for the priority-inheriting mutex, currently we only support a thread locking a single mutex at a time when multiple mutexes are shared between threads of different priorities.

andrewboie commented 5 years ago

> And for the priority-inheriting mutex, currently we only support a thread locking a single mutex at a time when multiple mutexes are shared between threads of different priorities.

I still don't know what you're talking about. k_mutex implements priority inheritance.

wentongwu commented 5 years ago

> > And for the priority-inheriting mutex, currently we only support a thread locking a single mutex at a time when multiple mutexes are shared between threads of different priorities.
>
> I still don't know what you're talking about. k_mutex implements priority inheritance.

@andrewboie I'm sorry for the confusion. Before the comment "any contended sys_mutex is backed by a k_mutex already. All I want is to use atomic operations so that a syscall isn't necessary for an uncontended sys_mutex", I thought sys_mutex should be re-implemented on top of futex as other OSes do, and that for sys_mutex we should also consider priority inheritance (which is why I used the words "priority inheriting" above); so, with that earlier idea in mind, we would implement FUTEX_LOCK_PI and FUTEX_UNLOCK_PI for the current Zephyr futex.

The passage below is from https://docs.zephyrproject.org/latest/reference/kernel/synchronization/mutexes.html; it states the limitation of the current k_mutex, and we can also find this limitation in the code. So for FUTEX_LOCK_PI and FUTEX_UNLOCK_PI of the Zephyr futex, my previous thought was to first support only a single sys_mutex per thread in the 2.1 release (matching the current k_mutex limitation), or to not consider priority inheritance in the 2.1 release at all, because overcoming the limitation for sys_mutex and k_mutex may take some time. That was my earlier idea, which may have confused you; I hope this helps.

> The kernel does not fully support priority inheritance when a thread holds two or more mutexes simultaneously. This situation can result in the thread’s priority not reverting to its original non-elevated priority when all mutexes have been released. It is recommended that a thread lock only a single mutex at a time when multiple mutexes are shared between threads of different priorities.

And over these two days I took some time to consider the idea "any contended sys_mutex is backed by a k_mutex already. All I want is to use atomic operations so that a syscall isn't necessary for an uncontended sys_mutex" carefully. I'm not sure it can be done without breaking the current k_mutex code, because I think we need to sync the state of the sys_mutex's atomic and the k_futex's lock_count. If I'm missing something, please correct me. Thanks.

andrewboie commented 5 years ago

> The kernel does not fully support priority inheritance when a thread holds two or more mutexes simultaneously. This situation can result in the thread’s priority not reverting to its original non-elevated priority when all mutexes have been released. It is recommended that a thread lock only a single mutex at a time when multiple mutexes are shared between threads of different priorities.

@wentongwu file a bug to fix this in k_mutex. Limitations of k_mutex are not in scope of this ticket.

> I'm not sure it can be done without breaking the current k_mutex code, because I think we need to sync the state of the sys_mutex's atomic and the k_futex's lock_count. If I'm missing something, please correct me. Thanks.

You've lost me. Why would you use k_futex? k_futex is backed by a wait_q which is not what we need. k_futex is for standard futex wake/wait semantics. You will still be using a sys_mutex type. sys_mutex is already tracked as a kernel object and is backed with a k_mutex. sys_mutex is the kernel-side implementation of PI futexes. We do not lump all these functionalities into a single system call like Linux does.
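For reference, a minimal example of those plain wake/wait semantics (illustrative only; the futex API's timeout type has changed across releases, and error handling is omitted):

```c
#include <zephyr/kernel.h>

static struct k_futex my_futex;   /* hypothetical futex shared with user mode */

void waiter(void)
{
	/* Block only if the value is still 0 when the kernel re-checks it. */
	k_futex_wait(&my_futex, 0, K_FOREVER);
}

void waker(void)
{
	atomic_set(&my_futex.val, 1);
	k_futex_wake(&my_futex, false);   /* wake a single waiter */
}
```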

Right now what we have is that sys_mutex_lock() and sys_mutex_unlock() unconditionally make the system calls z_sys_mutex_kernel_lock() and z_sys_mutex_kernel_unlock(). The changes you will make will be to these functions, and to the associated system calls, so that the system calls are not made unless the lock is contended.

- If the futex is unlocked, sys_mutex.val = 0.
- If the futex is locked, sys_mutex.val = the address of the owning k_thread that locked it.

Since all threads are 4-byte aligned, you can use the lowest 2 bits for flags. See the futex man page for more information.
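Something along these lines, in other words (a sketch only: the `*_fast` names are made up, the waiter-flag handshake and error paths are glossed over, and the exact syscall signatures may differ):

```c
/* Sketch of the intended fast path; z_sys_mutex_kernel_lock()/unlock() are
 * the existing syscalls mentioned above, everything else is illustrative.
 */
static inline int sys_mutex_lock_fast(struct sys_mutex *mutex, k_timeout_t timeout)
{
	atomic_val_t self = (atomic_val_t)k_current_get();

	/* Uncontended: claim ownership with a single atomic op, no syscall. */
	if (atomic_cas(&mutex->val, 0, self)) {
		return 0;
	}

	/* Contended: fall back to the kernel, which blocks on the backing k_mutex. */
	return z_sys_mutex_kernel_lock(mutex, timeout);
}

static inline int sys_mutex_unlock_fast(struct sys_mutex *mutex)
{
	atomic_val_t self = (atomic_val_t)k_current_get();

	/* We still own it and no flag bits are set: release without a syscall. */
	if (atomic_cas(&mutex->val, self, 0)) {
		return 0;
	}

	/* Waiters (or flags) present: let the kernel hand the mutex off. */
	return z_sys_mutex_kernel_unlock(mutex);
}
```

Note that the unlock CAS only succeeds when no flag bits are set, which is what forces the slow path whenever the kernel needs to hand the mutex to a waiter.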

andrewboie commented 4 years ago

Spent a little time looking at this, and I think we are going to need some infrastructure before this can be properly implemented.

The semantics of the atomic variable sys_mutex->val are as described in my previous comment above (0 when unlocked, otherwise the owning k_thread's address plus low-order flag bits).

The problem is that if a thread wants to grab an uncontended lock, it needs to atomically set the value to its own k_thread struct (equivalent to the TID in Linux's FUTEX_PI). The only way to do this is with k_current_get(), which is itself a system call. Having to make a system call even for uncontended mutexes defeats one of the main purposes of this feature in the first place.

So in order to do this, we need to make k_current_get() work without making a system call. Somehow, the value of _current (which is either _kernel.current or arch_curr_cpu()->current) needs to be in an area of memory that is readable from user mode.

I thought of a few ways to do this:

1) We reserve some memory in the stack object for thread-local data. Currently we use it for storing errno. We could also put the thread's k_tid_t there. The problem is that the address of this memory area also can't be determined without making a system call (which is why z_errno exists). On x86 I could just map a %fs or %gs segment on context switch for 64-bit and 32-bit respectively, but on other arches this is much harder and requires that a pointer to it be in an area of memory that is readable from user mode (sound familiar? we're back at square one). A rough shape for this option is sketched after this list.

2) Set up a special read-only memory page or MPU region that exposes some kernel data to user mode, which would include the TID and a pointer to the thread's local storage area. This has its own complications with respect to SMP (since there would need to be data for each CPU and user mode would have to know which set to read) and memory protection (some arches don't let you configure a region that is read-write to the kernel and read-only to user). Needs more thought, and its own GH issue.
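To make the two options above concrete, here is a purely hypothetical shape for each (none of these structs, fields, or the array exist today):

```c
#include <zephyr/kernel.h>

/* Option 1: extend the per-thread userspace data area that already holds
 * errno with the thread's own ID, written by the kernel at thread start.
 */
struct z_user_thread_data {
	int errno_var;   /* what z_errno() effectively points at today */
	k_tid_t tid;     /* new: lets user mode learn "who am I" without a syscall */
};

/* Option 2: a small per-CPU block of kernel data exposed through a page or
 * MPU region that is read-only to user mode, updated on context switch.
 */
struct z_user_kinfo {
	k_tid_t current_tid;  /* thread currently running on this CPU */
	void *tls_area;       /* pointer to that thread's local storage */
};

/* One entry per CPU; how user mode finds "its" entry is the open SMP question. */
extern struct z_user_kinfo z_user_kinfo[CONFIG_MP_NUM_CPUS];
```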

zephyrbot commented 9 months ago

Hi @dcpleung,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assign it to the right person.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@andrewboie you are also encouraged to help move this issue forward by providing additional information and confirming that this request/issue is still relevant to you.

Thanks!