Describe the bug
The hello-world Thallium RPC example doesn't work in a heterogeneous environment (macOS + Linux). See hello-world. I modified the source to use the 'sockets' provider instead of 'tcp'. I am posting this here because the error messages come from Mercury (and possibly libfabric).
Run the server on macOS:
~/hello-thallium $ ./server
Server running at address ofi+sockets://10.50.58.248:39517
# [80739.928023] mercury->addr: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:2431
# na_ofi_addr_map_insert(): fi_av_insert() failed, inserted: 0
# [80739.928109] mercury->addr: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:2320
# na_ofi_addr_key_lookup(): Could not insert new address
# [80739.928120] mercury->addr: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:4756
# na_ofi_cq_process_recv_unexpected_event(): Could not lookup address
# [80739.928128] mercury->msg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:4680
# na_ofi_cq_process_event(): Could not process unexpected recv event
# [80739.928156] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:3917
# hg_core_progress_na(): Could not make progress on NA (NA_PROTOCOL_ERROR)
# [80739.928167] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:3809
# hg_core_poll_wait(): hg_core_progress_na() failed
# [80739.928173] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:3708
# hg_core_progress(): Could not make blocking progress on context
# [80739.928180] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:5077
# HG_Core_progress(): Could not make progress
# [80739.928208] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury.c:2074
# HG_Progress(): Could not make progress on context (HG_PROTOCOL_ERROR)
[critical] unexpected return code (12: HG_PROTOCOL_ERROR) from HG_Progress()
Assertion failed: (0), function __margo_hg_progress_fn, file margo-core.c, line 1659.
zsh: abort ./server
and the client on Linux:
$ ./client ofi+sockets://10.50.58.248:39517
I get the same output with the client on macOS and the server on Linux.
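The address the server prints, and that the client receives on its command line, follows Mercury's <plugin>+<protocol>://<host>:<port> form. So ofi+sockets://10.50.58.248:39517 names the ofi (libfabric) plugin with the sockets provider. When comparing runs across the two machines, it may help to confirm that both sides agree on each piece. The following is a minimal sketch of a hypothetical helper for that check. The names parse_hg_address and HgAddress are illustrative and are not part of Mercury or Thallium. It assumes an IPv4 address or hostname and does not cover every address form Mercury accepts.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not a Mercury/Thallium API): splits an address string
// of the form "<plugin>+<protocol>://<host>:<port>" into its parts.
struct HgAddress {
    std::string plugin;    // NA plugin, e.g. "ofi"
    std::string protocol;  // libfabric provider, e.g. "sockets"
    std::string host;      // IPv4 address or hostname (IPv6 not handled)
    int port = 0;          // listening port
};

std::optional<HgAddress> parse_hg_address(const std::string& addr) {
    // Split "<plugin>+<protocol>" from "<host>:<port>" at "://".
    const auto scheme_end = addr.find("://");
    if (scheme_end == std::string::npos) return std::nullopt;
    const std::string scheme = addr.substr(0, scheme_end);
    const std::string rest = addr.substr(scheme_end + 3);

    // The scheme must contain a '+' separating plugin and protocol.
    const auto plus = scheme.find('+');
    if (plus == std::string::npos) return std::nullopt;

    // Use the last ':' so that only the port is split off the host.
    const auto colon = rest.rfind(':');
    if (colon == std::string::npos || colon + 1 == rest.size()) return std::nullopt;

    // Port must be 1-5 decimal digits and fit in 16 bits.
    const std::string port_str = rest.substr(colon + 1);
    if (port_str.size() > 5) return std::nullopt;
    for (char c : port_str) {
        if (c < '0' || c > '9') return std::nullopt;
    }

    HgAddress out;
    out.plugin = scheme.substr(0, plus);
    out.protocol = scheme.substr(plus + 1);
    out.host = rest.substr(0, colon);
    out.port = std::stoi(port_str);
    if (out.port > 65535) return std::nullopt;
    return out;
}
```

For the address above, the helper yields plugin "ofi", protocol "sockets", host "10.50.58.248" and port 39517. Mercury performs its own parsing internally; this only checks the string that is copied between machines.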
To Reproduce
Steps to reproduce the behavior:
On macOS, Spack installs argobots@1.1, which crashes the server with a segmentation fault, so use argobots@main on both Linux and macOS with this command.
Compile
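#include <iostream>
#include <mercury.h>  // declares HG_Set_log_level()
#include <thallium.hpp>

namespace tl = thallium;

void hello(const tl::request& req) {
    std::cout << "Hello World!" << std::endl;
}

int main(int argc, char** argv) {
    // Enable Mercury debug logging
    HG_Set_log_level("debug");
    // "sockets" selects the libfabric sockets provider (the original example uses "tcp")
    tl::engine myEngine("sockets", THALLIUM_SERVER_MODE);
    myEngine.define("hello", hello).disable_response();
    std::cout << "Server running at address " << myEngine.self() << std::endl;
}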
Platforms:
macOS: Monterey 12.5.1 on M1 with clang-13.1.6
Linux: Ubuntu 22.04 with GCC 11.2.0
Here is the output of spack spec mochi-thallium on each platform.