Closed hiroyuki-sato closed 5 years ago
👍
@hiroyuki-sato
how ucs_spinlock_t
will be implemented on macOS?
I think implementing it over non-reentrant pthread_mutext_t
(i.e. simple mutex w/o PTHREAD_MUTEX_RECURSIVE
) is applicable here, wdyt?
Hello, @dmitrygx . Thank you for your comment. I'll create a PR later.
how ucs_spinlock_t will be implemented on macOS?
I and @keisukefukuda think two candidates.
pthread_mutex_lock
instead of pthread_spinlock
. We are worrying about reentrant. But Almost user doesn't use UCX hard on macOS.(Maybe). So It was the first choice. (commit)os_unfair_lock
. (https://github.com/hiroyuki-sato/ucx/issues/13). This is the second candidate. document links. Probably this lock has non-reentrant feature. (I hope)If it is easier and usable besides two choices, I want to use it.
BTW, FreeBSD seems support pthread_spinlock
. I need to think macOS only.
- Use
pthread_mutex_lock
instead ofpthread_spinlock
. We are worrying about reentrant.
pthread_mutex_t
should be non-reentrant by default
- Use
os_unfair_lock
. (hiroyuki-sato#13). This is the second candidate. document links. Probably this lock has non-reentrant feature. (I hope)
os_unfair_lock
is a replacement for OSSpinLock
- I think it should be non-reentrant.
BTW, FreeBSD seems support
pthread_spinlock
. I need to think macOS only.
We could add macOS-specific implementation under #ifdef __APPLE__
:
#ifdef __APPLE__
...
#else /* other platfroms - Linux and FreeBSD */
...
#endif
Also, we can consider something like (not tested):
int pthread_spin_init(pthread_spinlock_t *lock, int pshared) {
__asm__ __volatile__ ("" ::: "memory");
*lock = 0;
return 0;
}
int pthread_spin_destroy(pthread_spinlock_t *lock) {
return 0;
}
int pthread_spin_lock(pthread_spinlock_t *lock) {
while (1) {
if (__sync_bool_compare_and_swap(lock, 0, 1)) {
return 0;
}
sched_yield();
}
}
int pthread_spin_unlock(pthread_spinlock_t *lock) {
__asm__ __volatile__ ("" ::: "memory");
*lock = 0;
return 0;
}
Hello, @dmitrygx. Thank you for your help.
The following code succeed compile in my macOS.
(I changed tab
to space and add typedef
)
I'll use this code as pthread_spinlock on macOS.
#include <pthread.h>
typedef int pthread_spinlock_t;
int pthread_spin_init(pthread_spinlock_t *lock, int pshared) {
__asm__ __volatile__ ("" ::: "memory");
*lock = 0;
return 0;
}
int pthread_spin_destroy(pthread_spinlock_t *lock) {
return 0;
}
int pthread_spin_lock(pthread_spinlock_t *lock) {
while (1) {
if (__sync_bool_compare_and_swap(lock, 0, 1)) {
return 0;
}
sched_yield();
}
}
int pthread_spin_unlock(pthread_spinlock_t *lock) {
__asm__ __volatile__ ("" ::: "memory");
*lock = 0;
return 0;
}
int main(){
pthread_spinlock_t lock;
pthread_spin_init(&lock, 0);
pthread_spin_lock(&lock);
pthread_spin_unlock(&lock);
pthread_spin_destroy(&lock);
}
btw, pls use ucs_memory_cpu_fence()
instead of __asm__ __volatile__ ("" ::: "memory");
Hello, @dmitrygx. Thanks! I'll use this function on macOS specific branch until PR ready.
#include <pthread.h>
typedef int pthread_spinlock_t;
int pthread_spin_init(pthread_spinlock_t *lock, int pshared) {
ucs_memory_cpu_fence();
*lock = 0;
return 0;
}
int pthread_spin_destroy(pthread_spinlock_t *lock) {
return 0;
}
int pthread_spin_lock(pthread_spinlock_t *lock) {
while (1) {
if (__sync_bool_compare_and_swap(lock, 0, 1)) {
return 0;
}
sched_yield();
}
}
int pthread_spin_unlock(pthread_spinlock_t *lock) {
ucs_memory_cpu_fence();
*lock = 0;
return 0;
}
int main(){
pthread_spinlock_t lock;
pthread_spin_init(&lock, 0);
pthread_spin_lock(&lock);
pthread_spin_unlock(&lock);
pthread_spin_destroy(&lock);
}
also pls use architecture-independent ucs_atomic_cswap32
instead of __sync_bool_compare_and_swap
try-lock is also needed for ucs_spinlock
, here the implementation:
int pthread_spin_trylock(pthread_spinlock_t *lock)
{
return ucs_atomic_cswap32(lock, 0, 1) ? 0 : EBUSY;
}
int pthread_spin_lock(pthread_spinlock_t *lock)
{
while (1) {
if (!pthread_spin_trylock(lock, 0, 1)) {
return 0;
}
sched_yield();
}
}
btw, I'm still not sure that is an optimal implementation can we use it if no (pthread_spinlock_t & os_unfair_lock & OSSpinLock) are available on a system where we are running on
#ifdef HAVE_SPINLOCK
...
#elif defined(HAVE_SO_UNFAIR)
...
#elif defined(HAVE_OSSPINLOCK)
...
#else
/* naive implementation using atomic compare-swap operation */
...
#endif
Hello, @dmitrygx. Thank you for the comment.
can we use it if no (pthread_spinlock_t & os_unfair_lock & OSSpinLock) are available on a system where we are running on
Does this mean, that macOS 10.14(with os_unfair_lock) use os_unfair_lock (#elif defined(HAVE_SO_UNFAIR)
) part?
Does this mean, that macOS 10.14(with os_unfair_lock) use os_unfair_lock (
#elif defined(HAVE_SO_UNFAIR)
) part?
yep, we need to identify whether pthread_spinlock_t
/os_unfair_lock
/OSSpinLock
available or not at configure step - it can be done by means of AC_CHECK_DECLS
.
If nothing is available, we can use our naive implementation
configure.ac
file
something like can work:
AC_CHECK_DECLS(pthread_spinlock_t,
AC_DEFINE([HAVE_POSIX_SPINLOCK], 1, [POSIX Spinlock], [],
[[#include <pthread.h>]])
(not tested)
and the same for os_unfair_lock
and OSSpinLock
Hello, @dmitrygx. Thank you for your comment.
OSSpinLock
. Because It's already depreciated over two years before.We already implement pthread_spinlock
check function.
https://github.com/hiroyuki-sato/ucx/commit/67ae96a96698a70f9337da9a117394eb8b8940ba
So, It is easier to use os_unfair_lock
check.
But I'm thinking when I create this PR.
- I think we don't need to care
OSSpinLock
. Because It's already depreciated over two years before.
Ok, I agree
- I'm not sure that whether os_unfair_lock is an optimized solution for spinlock on macOS or not.
os_unfair_lock
is a not a spinlock (the Apple Developer doc says "... This function doesn't spin on contention, but instead waits in the kernel to be awoken by an unlock. ...
").
Could you double-check?
os_unfair_lock
isn't a spinlock. compare-swap
lock.compare-swap
is an optimal implementation yet.os_unfair_lock
. if it does not work well, switch to compare-swap
version). Is this correct?
Is this correct?
sounds like a good plan Thank you 👍
This is very nice library that our guys released some time ago: https://github.com/ARM-software/progress64
It has spin lock implementation and bunch of other useful locks and algorithms: https://github.com/ARM-software/progress64/blob/master/src/p64_spinlock.c
Hello, @shamisp. Thank you for your comment.
It seems a nice library. I want to know the possibility of timer usage too.
(macOS doesn't support timer_create
)
I got the following error. Maybe I can fix it easily.
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar: illegal option -- D
usage: ar -d [-TLsv] archive file ...
ar -m [-TLsv] archive file ...
ar -m [-abiTLsv] position archive file ...
ar -p [-TLsv] archive [file ...]
ar -q [-cTLsv] archive file ...
ar -r [-cuTLsv] archive file ...
ar -r [-abciuTLsv] position archive file ...
ar -t [-TLsv] archive [file ...]
ar -x [-ouTLsv] archive [file ...]
make: *** [libprogress64.a] Error 1
Hello, @dmitrygx.
What do you think the following change?
src/ucm/event/event.c
pthread_spin_init
use the second argument like the below.
pthread_spin_init(&ucm_kh_lock, PTHREAD_PROCESS_PRIVATE);
It needs to extend ucs_spin_init, because now it accepts one argument only. For example
ucs_spin_init(&ucm_kh_lock, UCS_SPIN_PROCESS_PRIVATE)
Best regards.
Hello @hiroyuki-sato
What do you think the following change?
IIRC, the default value of a process shared attribute is PTHREAD_PROCESS_PRIVATE
.
So, I guess we could just use ucs_spin_init(&ucm_kh_lock);
Once some UCX service needs PTHREAD_PROCESS_SHARED
, we can extend ucs_spin_init
to accept the process shared attribute.
What Oracle Multi Programming Guide says about spinlocks:
The pshared attribute has one of the following values:
PTHREAD_PROCESS_SHARED
Description: Permits a spin lock to be operated on by any thread that has access to the memory where the spin lock is allocated. Operation on the lock is permitted even if the lock is allocated in memory that is shared by multiple processes.
PTHREAD_PROCESS_PRIVATE
Description: Permits a spin lock to be operated upon only by threads created within the same process as the thread that initialized the spin lock. If threads of differing processes attempt to operate on such a spin lock, the behavior is undefined. The default value of the process-shared attribute is PTHREAD_PROCESS_PRIVATE.
Hello, @dmitrygx. Thanks! I got it. I created #3939.
It seems a nice library. I want to know the possibility of timer usage too. (macOS doesn't support
timer_create
)
- Does this library already production level(Does OpenDataPlane use this library?)
I don't know but @WonderfulVoid may help to answer this.
- Do you have any plan to replace to this library in the future?
I think the library was design to make easy to borrow ideas. Since most of the code inline it is easy to copy-n-past some of the functions.
- It seems that it requires modification if I use this library on macOS.
I got the following error. Maybe I can fix it easily.
I don't think it was tested on OSX
It seems a nice library. I want to know the possibility of timer usage too. (macOS doesn't support
timer_create
)
- Does this library already production level(Does OpenDataPlane use this library?)
No, I am pretty sure OpenDataPlane doesn't use this library. It was written after I stopped participating in ODP development.
Production level is a good question. Most data types have tests so I am pretty happy with the quality of the implementation. I was thinking of making a few more (incompatible) API changes but might have to start with some real version management.
I don't know but @WonderfulVoid may help to answer this.
- Do you have any plan to replace to this library in the future?
I think the library was design to make easy to borrow ideas. Since most of the code inline it is easy to copy-n-past some of the functions.
I use Progress64 in an internal project. The code just includes the relevant p64 header and source files for full inlining into the calling code. But you can of course build the library separately and link against it.
- It seems that it requires modification if I use this library on macOS.
I got the following error. Maybe I can fix it easily.
I don't think it was tested on OSX
I haven't successfully built on macOS. aligned_alloc() seems to be missing (using both GCC and clang). Do you know how get the linker to find aligned_alloc or any suitable replacement function?
I got the following error. Maybe I can fix it easily.
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar: illegal option -- D usage: ar -d [-TLsv] archive file ... ar -m [-TLsv] archive file ... ar -m [-abiTLsv] position archive file ... ar -p [-TLsv] archive [file ...] ar -q [-cTLsv] archive file ... ar -r [-cuTLsv] archive file ... ar -r [-abciuTLsv] position archive file ... ar -t [-TLsv] archive [file ...] ar -x [-ouTLsv] archive [file ...] make: *** [libprogress64.a] Error 1
The fix is to remove the 'D' option to ar in Makefile. -ARFLAGS = -rcD +#ARFLAGS = -rcD +#Deterministic option not supported by macOS +ARFLAGS = -rc
But then the linker complains on aligned_alloc missing(). However I don't think this function is required for simple locks like the basic spin lock.
Hello, @WonderfulVoid, @shamisp. Thank you for your comment.
I'll try to build it on macOS again. I'm not familiar about C11 much.
How about posix_memalign
?
from macOS man page.
POSIX_MEMALIGN(3) BSD Library Functions Manual POSIX_MEMALIGN(3)
NAME
posix_memalign -- aligned memory allocation
SYNOPSIS
#include <stdlib.h>
int
posix_memalign(void **memptr, size_t alignment, size_t size);
DESCRIPTION
The posix_memalign() function allocates size bytes of memory such that
the allocation's base address is an exact multiple of alignment, and
returns the allocation in the value pointed to by memptr.
The requested alignment must be a power of 2 at least as large as
sizeof(void *).
Hello, @WonderfulVoid, @shamisp. Thank you for your comment.
I'll try to build it on macOS again. I'm not familiar about C11 much. How about
posix_memalign
?
posix_memalign should work I assume. But aligned_alloc() is part of C11. The Makefile specifies -std=c99 but actually we do need C11 (e.g. the atomics support). Specifying -std=c11 (supported by both GCC and clang) doesn't make any difference though.
On macOS Mojave, the file /Library/Developer/CommandLineTools/usr/include/c++/v1/stdlib.h includes this line: void *aligned_alloc(size_t alignment, size_t size); // C11 But which library includes this symbol?
It looks like the macOS tool chain (Xcode) is not C11 compliant.
I will make the necessary changes to use posix_memalign() and upload a new version.
Hello, @WonderfulVoid. Thank you for your comment.
It looks like the macOS tool chain (Xcode) is not C11 compliant. I think so too.
I found this article. boost... https://stackoverflow.com/questions/52777209/compiler-cant-find-aligned-alloc-function
@WonderfulVoid Are you working on implementing a fall-back on posix_memalign()
? I'm willing to do so if you are not.
@WonderfulVoid Are you working on implementing a fall-back on
posix_memalign()
? I'm willing to do so if you are not.
I will do this. The progress64 project does not yet have a contributor's agreement so it will be faster if I write the code myself (or someone from Arm). Expect something tomorrow.
Great. 👍
posix_memalign
@WonderfulVoid Are you working on implementing a fall-back on
posix_memalign()
? I'm willing to do so if you are not.
I have Progress64 working on macOS now, with both GCC and Clang. I need to copy files to my local Linux machine and do some internal management before uploading the changes to Github. Too late for that now however, I will be back tomorrow.
@WonderfulVoid Thank you for your working!
https://github.com/ARM-software/progress64/commit/3ebdd9119d9cf1b2ef32fc788d8ed0a189b97620 Builds and runs on macOS (10.14.5) with both GCC and Clang, with and without (address and UB) sanitizers (found and fixed an unrelated bug!).
Hello, @WonderfulVoid. Thanks! I succeed Progress64 on macOS.
@shamisp, @dmitrygx Do you have any concern OpenUCX use PROGRESS64 library? If it's OK, I'll try to use this library.
I'm thinking to use spinlock
and timer
in PROGRESS64 on macOS.
Additionally, I'm looking for futex
alternative.
I updated the previous plan.
os_unfair_lock
isn't a spinlock. compare-swap
lock or spinlock
in PROGRESS64. compare-swap
and spinlock
in PROGRESS64 are an optimal implementation yet.os_unfair_lock
. if it does not work well, switch to compare-swap
(--enable-compare-swap
version) or spinlock
() --enable-progress64
)compare-swap
(or vice versa), I'll use it only.I'm thinking to use
spinlock
andtimer
in PROGRESS64 on macOS.
Note that the P64 timer support requires some thread to periodically call p64_timer_expire().
Additionally, I'm looking for
futex
alternative.
Futex is an OS service, something user space code can utilise. Linux is not going to provide an alternative service. What are you really after here? You want something like futex but on macOS?
Hello, @WonderfulVoid Thank you for your comment.
I'm thinking of using p64_timer
instead of timer_create
.
I'll take a look code later.
I'm not familiar futex
yet. I created issue #3959.
Maybe, It relates to PROGRESS64.
@hiroyuki-sato I don't have any concerns, it is BSD code so we have no license conflicts. You would have to deal with some of the C11 dependencies, but it should be fixable.
@yosefe It seems like glibc lock compiled with LSE disabled (disabled atomics), even on fedora 30. I think it actually makes sense to bring over progress64 spin lock (or the one from DPDK)
@yosefe I think It makes sense to enable c11 features (atomics, barriers) on the platforms that support c11.
@shamisp Thanks! I'll try to use PROGRESS64 and compare-swap spinlock on macOS. (I replied to him this message on twitter)
@hiroyuki-sato can we close this, since your ucs_spinlock PR was merged?
@yosefe, Thanks. PR #3931, #3939 was merged.
Hello, developers.
What do you think to use
ucs_spinlock
instead ofptread_spinlock_t
in the following files? I want to minimizeptread_spinlock
dependency because macOS doesn't havepthread_spinlock
. If this idea is good, I'll create PR later.Best regards.