Open xkszltl opened 3 years ago
@alex--m FYI
The log says "uct_completion_callback_t has no status argument",
so there may be a mismatch somewhere:
I'm on it. Some details on the problem: Change-ID 8da6a5be2e changed UCT's API:
After: typedef void (*uct_completion_callback_t)(uct_completion_t *self);
Before: typedef void (*uct_completion_callback_t)(uct_completion_t *self,
ucs_status_t status);
For this reason, I added a check which means the code now looks like this:
static void ucg_builtin_step_am_zcopy_comp_step_check_cb(uct_completion_t *self
#ifdef HAVE_UCT_COMP_CB_STATUS_ARG
, ucs_status_t status)
#else
)
#endif
{
...
Looks like the check went wrong. I'm looking into why this happened, and I'll fix it ASAP.
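For anyone following along, here is a minimal, self-contained sketch of the conditional-signature pattern described above. It deliberately does not include the UCX headers: `ucs_status_t`, `UCS_OK` and `struct uct_completion` are simplified stand-ins, and `COMP_CB_STATUS_PARAM`, `COMP_CB_INVOKE`, `example_comp_cb` and `example_run` are hypothetical names made up for illustration, not UCG code. Only `HAVE_UCT_COMP_CB_STATUS_ARG` and the two typedefs are taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins (assumptions) so the sketch compiles without UCX headers. */
typedef int ucs_status_t;
#define UCS_OK 0

typedef struct uct_completion uct_completion_t;

/* Build with -DHAVE_UCT_COMP_CB_STATUS_ARG to model the "Before" API. */
#ifdef HAVE_UCT_COMP_CB_STATUS_ARG
typedef void (*uct_completion_callback_t)(uct_completion_t *self,
                                          ucs_status_t status);
#  define COMP_CB_STATUS_PARAM             , ucs_status_t status
#  define COMP_CB_INVOKE(cb, self, status) (cb)((self), (status))
#else
typedef void (*uct_completion_callback_t)(uct_completion_t *self);
#  define COMP_CB_STATUS_PARAM
#  define COMP_CB_INVOKE(cb, self, status) (cb)(self)
#endif

/* Simplified completion object: a callback plus a countdown. */
struct uct_completion {
    uct_completion_callback_t func;
    int                       count;
};

/* One definition that matches whichever typedef is active. */
static void example_comp_cb(uct_completion_t *self COMP_CB_STATUS_PARAM)
{
#ifdef HAVE_UCT_COMP_CB_STATUS_ARG
    (void)status; /* unused in this sketch */
#endif
    self->count--;
}

/* Fires the callback twice and returns the remaining count. */
static int example_run(void)
{
    uct_completion_t comp = { example_comp_cb, 2 };

    assert(comp.func != NULL);
    COMP_CB_INVOKE(comp.func, &comp, UCS_OK);
    COMP_CB_INVOKE(comp.func, &comp, UCS_OK);
    return comp.count;
}
```

Calling `example_run()` from a `main()` should behave the same with and without `-DHAVE_UCT_COMP_CB_STATUS_ARG`. The point is only that the macro has to agree with the header that is actually included, otherwise the definition and the typedef diverge.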
I checked the v1.9.0 tag (cd9efd3d80ec3e5df4eadd005b9cc38f6324919e) and it has the "Before" version, so I get "uct_completion_callback_t has a status argument"
and UCG's master build succeeds. The config.log there shows:
configure:18654: result: uct_completion_callback_t has a status argument
configure:18705: Building with Groups and collective operations support (UCG)
I also tested with UCX master, which has the "After" version, and the config.log shows:
conftest.c: In function 'main':
conftest.c:40:36: error: too many arguments to function 'func'
func(NULL, UCS_OK);
^~~~
configure:18660: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "ucx"
| #define PACKAGE_TARNAME "ucx"
| #define PACKAGE_VERSION "1.10"
| #define PACKAGE_STRING "ucx 1.10"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define PACKAGE "ucx"
| #define VERSION "1.10"
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define STDC_HEADERS 1
| #define restrict __restrict
| #define HAVE_DECL_STRERROR_R 1
| #define HAVE_STRERROR_R 1
| #define STRERROR_R_CHAR_P 1
| #define UCX_CONFIGURE_FLAGS "--enable-gtest --enable-examples --with-valgrind --enable-profiling --enable-frame-pointer --enable-stats --enable-memtrack --enable-fault-injection --enable-debug-data --enable-mt --prefix=/home/alex/workspace/ucx/build --enable-ucg"
| #define UCX_MODULE_SUBDIR "ucx"
| /* end confdefs.h. */
| #include "uct/api/uct.h"
| int
| main ()
| {
| uct_completion_callback_t func = NULL;
| func(NULL, UCS_OK);
| ;
| return 0;
| }
configure:18667: result: uct_completion_callback_t has no status argument
configure:18712: Building with Groups and collective operations support (UCG)
@xkszltl - if I understood correctly, you're using the v1.9.0 tag. Could you please post the line from your config.log referring to uct_completion_callback_t ?
The error was found in a docker build (so it's clean and reproducible). Interestingly, I didn't hit the issue when I tried the script locally on my dev machine:
configure:17486: checking if ln -s supports --relative
configure:17489: result: yes
configure:17908: checking for dot
configure:17924: found /usr/bin/dot
configure:17935: result: yes
configure:18007: ccache /opt/rh/devtoolset-9/root/usr/bin/gcc -c -O3 -fdebug-prefix-map='/media/Scratch/tmp.QR7Pb7ZPGh'='/usr/local/src' -g -Isrc/ conftest.c >&5
configure:18007: $? = 0
configure:18008: result: uct_completion_callback_t has a status argument
configure:18059: Building with Groups and collective operations support (UCG)
configure:20156: checking for size_t
configure:20156: ccache /opt/rh/devtoolset-9/root/usr/bin/gcc -c -O3 -fdebug-prefix-map='/media/Scratch/tmp.QR7Pb7ZPGh'='/usr/local/src' -g conftest.c >&5
configure:20156: $? = 0
configure:20156: ccache /opt/rh/devtoolset-9/root/usr/bin/gcc -c -O3 -fdebug-prefix-map='/media/Scratch/tmp.QR7Pb7ZPGh'='/usr/local/src' -g conftest.c >&5
conftest.c: In function 'main':
conftest.c:74:21: error: expected expression before ')' token
74 | if (sizeof ((size_t)))
| ^
configure:20156: $? = 1
In the bad docker build I found:
configure:17847: ccache /opt/rh/devtoolset-9/root/usr/bin/gcc -c -O3 -fdebug-prefix-map='/tmp/scratch'='/usr/local/src' -g -Isrc/ conftest.c >&5
In file included from conftest.c:35:
src/uct/api/uct.h:16:10: fatal error: uct/api/version.h: No such file or directory
16 | #include <uct/api/version.h>
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
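I also found a line saying config.status:1614: creating src/uct/api/version.h
so I guess it's generated.
In the local build, because I had ucg installed previously, the check probably found the header through a -I/usr/local/include
supplied elsewhere.
To me this sounds like an ordering problem caused by a missing dependency: the check appears to run before version.h is generated, so it fails because of the missing header rather than because of the callback signature.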
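One possible safety net, sketched here only as an idea: a C11 compile-time guard that compares what configure decided with the typedef the compiler actually sees, so a mis-detected probe fails with a clear message instead of a signature error deep in the build. The types below are stand-ins so the snippet compiles on its own; `CB_HAS_STATUS_ARG`, `CONFIGURE_SAYS_STATUS_ARG` and `cb_with_status_t` are hypothetical names, and in real code `uct_completion_callback_t` would come from uct/api/uct.h.

```c
#include <assert.h> /* static_assert (C11) */

/* Stand-ins (assumptions): the real definitions live in the UCX headers. */
typedef int ucs_status_t;
typedef struct uct_completion uct_completion_t;

/* Models the "After" API quoted in this thread. */
typedef void (*uct_completion_callback_t)(uct_completion_t *self);

/* The "Before" signature, used here only to exercise the macro. */
typedef void (*cb_with_status_t)(uct_completion_t *self, ucs_status_t status);

/* 1 if cb_type takes (uct_completion_t *, ucs_status_t), otherwise 0. */
#define CB_HAS_STATUS_ARG(cb_type)                          \
    _Generic((cb_type)0,                                    \
             void (*)(uct_completion_t *, ucs_status_t): 1, \
             default: 0)

/* What the configure probe concluded, as a 0/1 value. */
#ifdef HAVE_UCT_COMP_CB_STATUS_ARG
#  define CONFIGURE_SAYS_STATUS_ARG 1
#else
#  define CONFIGURE_SAYS_STATUS_ARG 0
#endif

/* Fails the build early if configure and the header disagree. */
static_assert(CB_HAS_STATUS_ARG(uct_completion_callback_t) ==
                  CONFIGURE_SAYS_STATUS_ARG,
              "configure result does not match uct_completion_callback_t");
```

This cannot replace the `#ifdef`, since `_Generic` is evaluated after preprocessing and so cannot choose the function signature. It only turns a wrong probe result into an obvious error, and it does not address the ordering problem itself.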
Describe the bug
Version: ucx 1.9.0 + ucg master
Setup and versions