openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org
Other
1.11k stars 417 forks source link

Does not compile with musl libc and gcc 6.3.0 #1568

Open stormmq opened 7 years ago

stormmq commented 7 years ago

Here are the reasons; some are minor, but it'd be nice if this library worked with musl libc:-

/usr/local/Cellar/musl-cross/0.9.5/libexec/x86_64-linux-musl/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
  ^~~~~~~
mmap/replace.c:24:49: error: 'PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP' undeclared here (not in a function)
 static pthread_mutex_t ucm_mmap_get_orig_lock = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;
                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mmap/replace.c:25:54: error: initialization makes pointer from integer without a cast [-Werror=int-conversion]
 static pthread_t volatile ucm_mmap_get_orig_thread = -1;
                                                      ^
util/log.c: In function 'ucm_log_vsnprintf':
util/log.c:149:20: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
                 ps = strerror_r(eno, pb, endb - pb);
                    ^
mmap/replace.c: In function 'ucm_orig_mmap':
mmap/replace.c:50:38: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
             ucm_mmap_get_orig_thread = -1; \
                                      ^
mmap/replace.c:97:1: note: in expansion of macro 'UCM_DEFINE_MM_FUNC'
 UCM_DEFINE_MM_FUNC(mmap,   void*, MAP_FAILED, void*, size_t, int, int, int, off_t)
 ^~~~~~~~~~~~~~~~~~
mmap/replace.c: In function 'ucm_orig_munmap':
mmap/replace.c:50:38: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
             ucm_mmap_get_orig_thread = -1; \
                                      ^
mmap/replace.c:98:1: note: in expansion of macro 'UCM_DEFINE_MM_FUNC'
 UCM_DEFINE_MM_FUNC(munmap, int,   -1,         void*, size_t)
 ^~~~~~~~~~~~~~~~~~
In file included from malloc/malloc_hook.c:26:0:
/Volumes/Source/GitHub/lemonrock/rdma-core/workspace/ucx-sys/compile.conf.d/temporary/src/ucs/type/spinlock.h: In function 'ucs_spinlock_init':
/Volumes/Source/GitHub/lemonrock/rdma-core/workspace/ucx-sys/compile.conf.d/temporary/src/ucs/type/spinlock.h:34:17: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
     lock->owner = 0xfffffffful;
                 ^
/Volumes/Source/GitHub/lemonrock/rdma-core/workspace/ucx-sys/compile.conf.d/temporary/src/ucs/type/spinlock.h: In function 'ucs_spin_unlock':
/Volumes/Source/GitHub/lemonrock/rdma-core/workspace/ucx-sys/compile.conf.d/temporary/src/ucs/type/spinlock.h:91:21: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
         lock->owner = 0xfffffffful;
                     ^
mmap/replace.c: In function 'ucm_orig_mremap':
mmap/replace.c:50:38: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
             ucm_mmap_get_orig_thread = -1; \
                                      ^
mmap/replace.c:99:1: note: in expansion of macro 'UCM_DEFINE_MM_FUNC'
 UCM_DEFINE_MM_FUNC(mremap, void*, MAP_FAILED, void*, size_t, size_t, int)
 ^~~~~~~~~~~~~~~~~~
mmap/replace.c: In function 'ucm_orig_shmat':
mmap/replace.c:50:38: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
             ucm_mmap_get_orig_thread = -1; \
                                      ^
mmap/replace.c:100:1: note: in expansion of macro 'UCM_DEFINE_MM_FUNC'
 UCM_DEFINE_MM_FUNC(shmat,  void*, MAP_FAILED, int, const void*, int)
 ^~~~~~~~~~~~~~~~~~
mmap/replace.c: In function 'ucm_orig_shmdt':
mmap/replace.c:50:38: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
             ucm_mmap_get_orig_thread = -1; \
                                      ^
mmap/replace.c:101:1: note: in expansion of macro 'UCM_DEFINE_MM_FUNC'
 UCM_DEFINE_MM_FUNC(shmdt,  int,   -1,         const void*)
 ^~~~~~~~~~~~~~~~~~
mmap/replace.c: In function 'ucm_orig_sbrk':
mmap/replace.c:50:38: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
             ucm_mmap_get_orig_thread = -1; \
                                      ^
mmap/replace.c:102:1: note: in expansion of macro 'UCM_DEFINE_MM_FUNC'
 UCM_DEFINE_MM_FUNC(sbrk,   void*, MAP_FAILED, intptr_t)
 ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
In file included from /Volumes/Source/GitHub/lemonrock/rdma-core/workspace/ucx-sys/compile.conf.d/temporary/src/ucs/sys/sys.h:25:0,
                 from malloc/malloc_hook.c:30:
/usr/local/Cellar/musl-cross/0.9.5/libexec/x86_64-linux-musl/include/sys/fcntl.h: At top level:
/usr/local/Cellar/musl-cross/0.9.5/libexec/x86_64-linux-musl/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
  ^~~~~~~
make[2]: *** [mmap/libucm_la-replace.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
malloc/malloc_hook.c: In function 'ucm_malloc_install':
malloc/malloc_hook.c:622:9: error: implicit declaration of function 'malloc_trim' [-Werror=implicit-function-declaration]
         malloc_trim(0);
         ^~~~~~~~~~~
cc1: all warnings being treated as errors
cc1: all warnings being treated as errors
make[2]: *** [util/libucm_la-reloc.lo] Error 1
make[2]: *** [util/libucm_la-log.lo] Error 1
cc1: all warnings being treated as errors
make[2]: *** [malloc/libucm_la-malloc_hook.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

The most serious looks to be the use of malloc_trim - I'm not sure that non-standard function is supported by musl libc.

shamisp commented 7 years ago

Thanks for the report @yosefe - do you want to take this one ? It also can go as a bug fix for v1.2

yosefe commented 7 years ago

@shamisp i don't think it's critical for v1.2

shamisp commented 7 years ago

My point - if this is some simple fix it can go to v1.2

raphaelcohn commented 7 years ago

If it's helpful, I've started working on some local changes to get UCX to compile against a statically-linked musl (I'm intending to port UCX into Libertine Linux, a secure, in-memory and small Linux distro for server clusters which is managed entirely through source control; it's still a work in progress).

I've managed to get a little more insight into the causes of failures that are musl-specific, as opposed to GCC 6+ specific.

1 Non-POSIX header locations 1.1 src/signal.h should be signal.h

src/tools/profile/read_profile.c:#include <sys/signal.h>
src/ucs/config/global_opts.c:#include <sys/signal.h>

1.2 src/fcntl.h should be fcntl.h

src/tools/profile/read_profile.c:#include <sys/fcntl.h>
src/ucm/util/reloc.c:#include <sys/fcntl.h>
src/ucs/sys/sys.h:#include <sys/fcntl.h>

2 glibc-only 2.1 malloc_trim

src/ucm/malloc/malloc_hook.c:        malloc_trim(0);
test/gtest/ucm/malloc_hook.cc:        malloc_trim(0);
test/gtest/ucm/malloc_hook.cc:    malloc_trim(0);
test/gtest/ucm/malloc_hook.cc:    malloc_trim(0);
test/mpi/test_memhooks.c:    malloc_trim(0);

This function can be guarded using a #ifdef __GLIBC construction. Interestingly, src/ucm/malloc/malloc_hook.c gains a very slight performance gain as an if clause is no longer needed...

2.2 Non-portable pthread macros

./src/ucm/mmap/replace.c:static pthread_mutex_t ucm_mmap_get_orig_lock = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;

In musl, this can be replaced with {{PTHREAD_MUTEX_RECURSIVE}}. This may also be the case for glibc, but it won't make the code portable to non-Linux libc libraries, Android or uclibc (but that's nearly dead).

2.3 Non-portable use of signal stuff

async/signal.c: In function 'ucs_async_signal_sys_timer_create':
async/signal.c:113:32: error: 'SIGEV_THREAD_ID' undeclared (first use in this function)
     ev.sigev_notify          = SIGEV_THREAD_ID;
                                ^~~~~~~~~~~~~~~
async/signal.c:113:32: note: each undeclared identifier is reported only once for each function it appears in
async/signal.c:116:7: error: 'struct sigevent' has no member named '_sigev_un'; did you mean 'sigev_value'?
     ev._sigev_un._tid        = tid; /* target thread */

I'm not sure what the correct answer is to this at the moment. Mucking with signals when threads are involved usually ends in tears...

raphaelcohn commented 7 years ago

Actually, 2.2 is possibly wrong - see https://issues.asterisk.org/jira/browse/ASTERISK-24154 .

raphaelcohn commented 7 years ago

I've used the following in async/signal.c an attempt to compile-past 2.3 above:-

#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
struct ksigevent
{
    union sigval sigev_value;
    int sigev_signo;
    int sigev_notify;
    int sigev_tid;
};

int _timer_create(clockid_t clk, struct ksigevent *restrict evp, timer_t *restrict res)
{
    int timerid;
    if (syscall(SYS_timer_create, clk, &evp, &timerid) < 0)
    {
        timerid = -1;
    }
    if (timerid < 0)
    {
        return -1;
    }
    *res = (void *)(intptr_t)timerid;
}

static ucs_status_t
ucs_async_signal_sys_timer_create(int uid, pid_t tid, timer_t *sys_timer_id)
{
    struct ksigevent ev;
    timer_t timer;
    int ret;

    ucs_trace_func("tid=%d", tid);

    /* Create timer signal */
    memset(&ev, 0, sizeof(ev));

    ev.sigev_notify          = 4; /* SIGEV_THREAD_ID */
    ev.sigev_signo           = ucs_global_opts.async_signo;
    ev.sigev_value.sival_int = uid; /* user parameter to timer */
    ev.sigev_tid             = tid; /* target thread */
    ret = _timer_create(CLOCK_REALTIME, &ev, &timer);
    if (ret < 0) {
        ucs_error("failed to create an interval timer: %m");
        return UCS_ERR_INVALID_PARAM;
    }

    *sys_timer_id = timer;
    return UCS_OK;
}

(I'm not 100% about the line *res = (void *)(intptr_t)timerid). This leads to another failure - see next comment.

raphaelcohn commented 7 years ago

debug/debug.c assumes the presence of the header execinfo.h. This doesn't exist on musl - it's a weird glibc-ism - as this header is often part of a separate libunwind package.

Sorting this out results in a missing problem with cpu_set_t...

raphaelcohn commented 7 years ago

Because the build does not pass -D_GNU_SOURCE. Obviously, this namespace specifier is always set on glibc... but not other libc builds.

raphaelcohn commented 7 years ago

And with that, I can get the build to compile and statically link against musl. Of course, there are a very large list of warnings left, but I'll put that in another issue.

raphaelcohn commented 7 years ago

For the record, the changes to debug.c involve making static backtrace_h ucs_debug_backtrace_create(void) return NULL, void ucs_debug_print_backtrace(FILE *stream, int strip) do nothing and removing #include <execinfo.h>.

raphaelcohn commented 7 years ago

I could submit a PR with my changes but they're really hacky - they're more to flush out all the reasons compilation fails.

shamisp commented 7 years ago

@raphaelcohn you can submit WIP PR and continue polishing this. As for now, in order to merge a PR we need signed CLA http://www.openucx.org/wp-content/uploads/2016/01/ucx-individual-contributor-agrement-v1.3.pdf