openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org
Other
1.12k stars 421 forks source link

[Q] mimimizing pthread_spinlock dependency. #3928

Closed hiroyuki-sato closed 5 years ago

hiroyuki-sato commented 5 years ago

Hello, developers.

What do you think to use ucs_spinlock instead of ptread_spinlock_t in the following files? I want to minimize ptread_spinlock dependency because macOS doesn't have pthread_spinlock. If this idea is good, I'll create PR later.

Best regards.

src/ucm/malloc/malloc_hook.c
src/ucm/event/event.c
src/ucs/memory/rcache_int.h
src/ucs/time/timerq.h
src/ucs/debug/log.c
src/ucs/debug/debug.c
dmitrygx commented 5 years ago

👍 @hiroyuki-sato how ucs_spinlock_t will be implemented on macOS? I think implementing it over non-reentrant pthread_mutext_t (i.e. simple mutex w/o PTHREAD_MUTEX_RECURSIVE) is applicable here, wdyt?

hiroyuki-sato commented 5 years ago

Hello, @dmitrygx . Thank you for your comment. I'll create a PR later.

how ucs_spinlock_t will be implemented on macOS?

I and @keisukefukuda think two candidates.

If it is easier and usable besides two choices, I want to use it.

BTW, FreeBSD seems support pthread_spinlock. I need to think macOS only.

dmitrygx commented 5 years ago
  • Use pthread_mutex_lock instead of pthread_spinlock. We are worrying about reentrant.

pthread_mutex_t should be non-reentrant by default

os_unfair_lock is a replacement for OSSpinLock - I think it should be non-reentrant.

BTW, FreeBSD seems support pthread_spinlock. I need to think macOS only.

We could add macOS-specific implementation under #ifdef __APPLE__:

#ifdef __APPLE__
...
#else /* other platfroms - Linux and FreeBSD */
...
#endif

Also, we can consider something like (not tested):

int pthread_spin_init(pthread_spinlock_t *lock, int pshared) {
    __asm__ __volatile__ ("" ::: "memory");
    *lock = 0;
    return 0;
}

int pthread_spin_destroy(pthread_spinlock_t *lock) {
    return 0;
}

int pthread_spin_lock(pthread_spinlock_t *lock) {
    while (1) {
        if (__sync_bool_compare_and_swap(lock, 0, 1)) {
            return 0;
        }
        sched_yield();
    }
}

int pthread_spin_unlock(pthread_spinlock_t *lock) {
    __asm__ __volatile__ ("" ::: "memory");
    *lock = 0;
    return 0;
}
hiroyuki-sato commented 5 years ago

Hello, @dmitrygx. Thank you for your help.

The following code succeed compile in my macOS. (I changed tab to space and add typedef)

I'll use this code as pthread_spinlock on macOS.

#include <pthread.h>

typedef int pthread_spinlock_t;

int pthread_spin_init(pthread_spinlock_t *lock, int pshared) {
    __asm__ __volatile__ ("" ::: "memory");
    *lock = 0;
    return 0;
}

int pthread_spin_destroy(pthread_spinlock_t *lock) {
    return 0;
}

int pthread_spin_lock(pthread_spinlock_t *lock) {
    while (1) {
        if (__sync_bool_compare_and_swap(lock, 0, 1)) {
            return 0;
        }   
    sched_yield();
    }
}

int pthread_spin_unlock(pthread_spinlock_t *lock) {
    __asm__ __volatile__ ("" ::: "memory");
    *lock = 0;
    return 0;
}

int main(){
  pthread_spinlock_t lock;

  pthread_spin_init(&lock, 0);
  pthread_spin_lock(&lock);
  pthread_spin_unlock(&lock);
  pthread_spin_destroy(&lock);
}
dmitrygx commented 5 years ago

btw, pls use ucs_memory_cpu_fence() instead of __asm__ __volatile__ ("" ::: "memory");

hiroyuki-sato commented 5 years ago

Hello, @dmitrygx. Thanks! I'll use this function on macOS specific branch until PR ready.

#include <pthread.h>

typedef int pthread_spinlock_t;

int pthread_spin_init(pthread_spinlock_t *lock, int pshared) {
    ucs_memory_cpu_fence();
    *lock = 0;
    return 0;
}

int pthread_spin_destroy(pthread_spinlock_t *lock) {
    return 0;
}

int pthread_spin_lock(pthread_spinlock_t *lock) {
    while (1) {
        if (__sync_bool_compare_and_swap(lock, 0, 1)) {
            return 0;
        }   
    sched_yield();
    }
}

int pthread_spin_unlock(pthread_spinlock_t *lock) {
    ucs_memory_cpu_fence();
    *lock = 0;
    return 0;
}

int main(){
  pthread_spinlock_t lock;

  pthread_spin_init(&lock, 0);
  pthread_spin_lock(&lock);
  pthread_spin_unlock(&lock);
  pthread_spin_destroy(&lock);
}
dmitrygx commented 5 years ago

also pls use architecture-independent ucs_atomic_cswap32 instead of __sync_bool_compare_and_swap

try-lock is also needed for ucs_spinlock, here the implementation:

int pthread_spin_trylock(pthread_spinlock_t *lock)
{
    return ucs_atomic_cswap32(lock, 0, 1) ? 0 : EBUSY;
}
int pthread_spin_lock(pthread_spinlock_t *lock)
{
    while (1) {
        if (!pthread_spin_trylock(lock, 0, 1)) {
            return 0;
        }
        sched_yield();
    }
}
dmitrygx commented 5 years ago

btw, I'm still not sure that is an optimal implementation can we use it if no (pthread_spinlock_t & os_unfair_lock & OSSpinLock) are available on a system where we are running on

#ifdef HAVE_SPINLOCK
...
#elif defined(HAVE_SO_UNFAIR)
...
#elif defined(HAVE_OSSPINLOCK)
...
#else
/* naive implementation using atomic compare-swap operation */ 
...
#endif
hiroyuki-sato commented 5 years ago

Hello, @dmitrygx. Thank you for the comment.

can we use it if no (pthread_spinlock_t & os_unfair_lock & OSSpinLock) are available on a system where we are running on

Does this mean, that macOS 10.14(with os_unfair_lock) use os_unfair_lock (#elif defined(HAVE_SO_UNFAIR)) part?

dmitrygx commented 5 years ago

Does this mean, that macOS 10.14(with os_unfair_lock) use os_unfair_lock (#elif defined(HAVE_SO_UNFAIR)) part?

yep, we need to identify whether pthread_spinlock_t/os_unfair_lock/OSSpinLock available or not at configure step - it can be done by means of AC_CHECK_DECLS. If nothing is available, we can use our naive implementation

dmitrygx commented 5 years ago

configure.ac file

something like can work:

AC_CHECK_DECLS(pthread_spinlock_t,
               AC_DEFINE([HAVE_POSIX_SPINLOCK], 1, [POSIX Spinlock], [],
               [[#include <pthread.h>]])

(not tested)

and the same for os_unfair_lock and OSSpinLock

hiroyuki-sato commented 5 years ago

Hello, @dmitrygx. Thank you for your comment.

hiroyuki-sato commented 5 years ago

We already implement pthread_spinlock check function. https://github.com/hiroyuki-sato/ucx/commit/67ae96a96698a70f9337da9a117394eb8b8940ba

So, It is easier to use os_unfair_lock check.

But I'm thinking when I create this PR.

dmitrygx commented 5 years ago
  • I think we don't need to care OSSpinLock. Because It's already depreciated over two years before.

Ok, I agree

  • I'm not sure that whether os_unfair_lock is an optimized solution for spinlock on macOS or not.

os_unfair_lock is a not a spinlock (the Apple Developer doc says "... This function doesn't spin on contention, but instead waits in the kernel to be awoken by an unlock. ...").

hiroyuki-sato commented 5 years ago

Could you double-check?

Is this correct?

dmitrygx commented 5 years ago

Is this correct?

sounds like a good plan Thank you 👍

shamisp commented 5 years ago

This is very nice library that our guys released some time ago: https://github.com/ARM-software/progress64

It has spin lock implementation and bunch of other useful locks and algorithms: https://github.com/ARM-software/progress64/blob/master/src/p64_spinlock.c

hiroyuki-sato commented 5 years ago

Hello, @shamisp. Thank you for your comment.

It seems a nice library. I want to know the possibility of timer usage too. (macOS doesn't support timer_create)

I got the following error. Maybe I can fix it easily.

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar: illegal option -- D
usage:  ar -d [-TLsv] archive file ...
    ar -m [-TLsv] archive file ...
    ar -m [-abiTLsv] position archive file ...
    ar -p [-TLsv] archive [file ...]
    ar -q [-cTLsv] archive file ...
    ar -r [-cuTLsv] archive file ...
    ar -r [-abciuTLsv] position archive file ...
    ar -t [-TLsv] archive [file ...]
    ar -x [-ouTLsv] archive [file ...]
make: *** [libprogress64.a] Error 1
hiroyuki-sato commented 5 years ago

Hello, @dmitrygx.

What do you think the following change?

src/ucm/event/event.c

https://github.com/openucx/ucx/blob/658bcde0ecef12cf338ce9ab964b0eb1bbde0f28/src/ucm/event/event.c#L605

pthread_spin_init use the second argument like the below.

pthread_spin_init(&ucm_kh_lock, PTHREAD_PROCESS_PRIVATE);

It needs to extend ucs_spin_init, because now it accepts one argument only. For example

ucs_spin_init(&ucm_kh_lock, UCS_SPIN_PROCESS_PRIVATE)

Best regards.

dmitrygx commented 5 years ago

Hello @hiroyuki-sato

What do you think the following change?

IIRC, the default value of a process shared attribute is PTHREAD_PROCESS_PRIVATE. So, I guess we could just use ucs_spin_init(&ucm_kh_lock);

Once some UCX service needs PTHREAD_PROCESS_SHARED, we can extend ucs_spin_init to accept the process shared attribute.

dmitrygx commented 5 years ago

What Oracle Multi Programming Guide says about spinlocks:

The pshared attribute has one of the following values:

PTHREAD_PROCESS_SHARED

Description: Permits a spin lock to be operated on by any thread that has access to the memory where the spin lock is allocated. Operation on the lock is permitted even if the lock is allocated in memory that is shared by multiple processes.

PTHREAD_PROCESS_PRIVATE

Description: Permits a spin lock to be operated upon only by threads created within the same process as the thread that initialized the spin lock. If threads of differing processes attempt to operate on such a spin lock, the behavior is undefined. The default value of the process-shared attribute is PTHREAD_PROCESS_PRIVATE.
hiroyuki-sato commented 5 years ago

Hello, @dmitrygx. Thanks! I got it. I created #3939.

shamisp commented 5 years ago

It seems a nice library. I want to know the possibility of timer usage too. (macOS doesn't support timer_create)

  • Does this library already production level(Does OpenDataPlane use this library?)

I don't know but @WonderfulVoid may help to answer this.

  • Do you have any plan to replace to this library in the future?

I think the library was design to make easy to borrow ideas. Since most of the code inline it is easy to copy-n-past some of the functions.

  • It seems that it requires modification if I use this library on macOS.

I got the following error. Maybe I can fix it easily.

I don't think it was tested on OSX

WonderfulVoid commented 5 years ago

It seems a nice library. I want to know the possibility of timer usage too. (macOS doesn't support timer_create)

  • Does this library already production level(Does OpenDataPlane use this library?)

No, I am pretty sure OpenDataPlane doesn't use this library. It was written after I stopped participating in ODP development.

Production level is a good question. Most data types have tests so I am pretty happy with the quality of the implementation. I was thinking of making a few more (incompatible) API changes but might have to start with some real version management.

I don't know but @WonderfulVoid may help to answer this.

  • Do you have any plan to replace to this library in the future?

I think the library was design to make easy to borrow ideas. Since most of the code inline it is easy to copy-n-past some of the functions.

I use Progress64 in an internal project. The code just includes the relevant p64 header and source files for full inlining into the calling code. But you can of course build the library separately and link against it.

  • It seems that it requires modification if I use this library on macOS.

I got the following error. Maybe I can fix it easily.

I don't think it was tested on OSX

I haven't successfully built on macOS. aligned_alloc() seems to be missing (using both GCC and clang). Do you know how get the linker to find aligned_alloc or any suitable replacement function?

WonderfulVoid commented 5 years ago

I got the following error. Maybe I can fix it easily.

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar: illegal option -- D
usage:  ar -d [-TLsv] archive file ...
  ar -m [-TLsv] archive file ...
  ar -m [-abiTLsv] position archive file ...
  ar -p [-TLsv] archive [file ...]
  ar -q [-cTLsv] archive file ...
  ar -r [-cuTLsv] archive file ...
  ar -r [-abciuTLsv] position archive file ...
  ar -t [-TLsv] archive [file ...]
  ar -x [-ouTLsv] archive [file ...]
make: *** [libprogress64.a] Error 1

The fix is to remove the 'D' option to ar in Makefile. -ARFLAGS = -rcD +#ARFLAGS = -rcD +#Deterministic option not supported by macOS +ARFLAGS = -rc

But then the linker complains on aligned_alloc missing(). However I don't think this function is required for simple locks like the basic spin lock.

hiroyuki-sato commented 5 years ago

Hello, @WonderfulVoid, @shamisp. Thank you for your comment.

I'll try to build it on macOS again. I'm not familiar about C11 much. How about posix_memalign?

from macOS man page.


POSIX_MEMALIGN(3)        BSD Library Functions Manual        POSIX_MEMALIGN(3)

NAME
     posix_memalign -- aligned memory allocation

SYNOPSIS
     #include <stdlib.h>

     int
     posix_memalign(void **memptr, size_t alignment, size_t size);

DESCRIPTION
     The posix_memalign() function allocates size bytes of memory such that
     the allocation's base address is an exact multiple of alignment, and
     returns the allocation in the value pointed to by memptr.

     The requested alignment must be a power of 2 at least as large as
     sizeof(void *).
WonderfulVoid commented 5 years ago

Hello, @WonderfulVoid, @shamisp. Thank you for your comment.

I'll try to build it on macOS again. I'm not familiar about C11 much. How about posix_memalign?

posix_memalign should work I assume. But aligned_alloc() is part of C11. The Makefile specifies -std=c99 but actually we do need C11 (e.g. the atomics support). Specifying -std=c11 (supported by both GCC and clang) doesn't make any difference though.

On macOS Mojave, the file /Library/Developer/CommandLineTools/usr/include/c++/v1/stdlib.h includes this line: void *aligned_alloc(size_t alignment, size_t size); // C11 But which library includes this symbol?

It looks like the macOS tool chain (Xcode) is not C11 compliant.

I will make the necessary changes to use posix_memalign() and upload a new version.

hiroyuki-sato commented 5 years ago

Hello, @WonderfulVoid. Thank you for your comment.

It looks like the macOS tool chain (Xcode) is not C11 compliant. I think so too.

I found this article. boost... https://stackoverflow.com/questions/52777209/compiler-cant-find-aligned-alloc-function

keisukefukuda commented 5 years ago

@WonderfulVoid Are you working on implementing a fall-back on posix_memalign()? I'm willing to do so if you are not.

WonderfulVoid commented 5 years ago

@WonderfulVoid Are you working on implementing a fall-back on posix_memalign()? I'm willing to do so if you are not.

I will do this. The progress64 project does not yet have a contributor's agreement so it will be faster if I write the code myself (or someone from Arm). Expect something tomorrow.

keisukefukuda commented 5 years ago

Great. 👍

WonderfulVoid commented 5 years ago

posix_memalign

@WonderfulVoid Are you working on implementing a fall-back on posix_memalign()? I'm willing to do so if you are not.

I have Progress64 working on macOS now, with both GCC and Clang. I need to copy files to my local Linux machine and do some internal management before uploading the changes to Github. Too late for that now however, I will be back tomorrow.

hiroyuki-sato commented 5 years ago

@WonderfulVoid Thank you for your working!

WonderfulVoid commented 5 years ago

https://github.com/ARM-software/progress64/commit/3ebdd9119d9cf1b2ef32fc788d8ed0a189b97620 Builds and runs on macOS (10.14.5) with both GCC and Clang, with and without (address and UB) sanitizers (found and fixed an unrelated bug!).

hiroyuki-sato commented 5 years ago

Hello, @WonderfulVoid. Thanks! I succeed Progress64 on macOS.

@shamisp, @dmitrygx Do you have any concern OpenUCX use PROGRESS64 library? If it's OK, I'll try to use this library.

I'm thinking to use spinlock and timer in PROGRESS64 on macOS. Additionally, I'm looking for futex alternative.

I updated the previous plan.

WonderfulVoid commented 5 years ago

I'm thinking to use spinlock and timer in PROGRESS64 on macOS.

Note that the P64 timer support requires some thread to periodically call p64_timer_expire().

Additionally, I'm looking for futex alternative.

Futex is an OS service, something user space code can utilise. Linux is not going to provide an alternative service. What are you really after here? You want something like futex but on macOS?

hiroyuki-sato commented 5 years ago

Hello, @WonderfulVoid Thank you for your comment.

I'm thinking of using p64_timer instead of timer_create. I'll take a look code later.

I'm not familiar futex yet. I created issue #3959. Maybe, It relates to PROGRESS64.

shamisp commented 5 years ago

@hiroyuki-sato I don't have any concerns, it is BSD code so we have no license conflicts. You would have to deal with some of the C11 dependencies, but it should be fixable.

shamisp commented 5 years ago

@yosefe It seems like glibc lock compiled with LSE disabled (disabled atomics), even on fedora 30. I think it actually makes sense to bring over progress64 spin lock (or the one from DPDK)

shamisp commented 5 years ago

@yosefe I think It makes sense to enable c11 features (atomics, barriers) on the platforms that support c11.

hiroyuki-sato commented 5 years ago

@shamisp Thanks! I'll try to use PROGRESS64 and compare-swap spinlock on macOS. (I replied to him this message on twitter)

yosefe commented 5 years ago

@hiroyuki-sato can we close this, since your ucs_spinlock PR was merged?

hiroyuki-sato commented 5 years ago

@yosefe, Thanks. PR #3931, #3939 was merged.