Open cheungsuifai opened 2 months ago
Hi, the assertion failed because either the server or the client does not have RDMA support enabled. To enable it, export the environment variable GRPC_PLATFORM_TYPE=RDMA_BP on both the client and the server side.
Let me know if you still cannot run the test successfully.
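As a minimal sketch of the advice above: the variable has to be exported in the shell that launches the server and also in the shell that launches the client. The helper name `check_rdma_env` is hypothetical and not part of grpc-rdma; it only confirms what the process will inherit.

```shell
# Run this in BOTH shells: the one that starts the server
# and the one that starts the client.
export GRPC_PLATFORM_TYPE=RDMA_BP

# Hypothetical helper (not part of grpc-rdma): checks that the
# variable is set to the value both sides are expected to share.
check_rdma_env() {
  if [ "${GRPC_PLATFORM_TYPE:-}" = "RDMA_BP" ]; then
    echo "ok"
  else
    echo "GRPC_PLATFORM_TYPE is '${GRPC_PLATFORM_TYPE:-unset}', expected RDMA_BP"
    return 1
  fi
}

check_rdma_env
```

If the check fails on either machine, that side is probably not running in RDMA mode, which would be consistent with the mismatch the assertion reports.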
I've built grpc-rdma with the following procedure:
`
# get repo
git clone --recurse-submodules --depth 1 --shallow-submodules \
    https://github.com/pwrliang/grpc-rdma.git

# make and install abseil individually
mkdir /home/derek/abseil-cpp
cmake -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_INSTALL_PREFIX=/home/derek/abseil-cpp \
    ../..
make -j4
make install

# make and install grpc
cmake ../.. \
    -DgRPC_INSTALL=ON \
    -DgRPC_BUILD_TESTS=OFF \
    -DCMAKE_INSTALL_PREFIX=/home/derek/grpc-rdma/ \
    -DgRPC_ABSL_PROVIDER=package \
    -Dabsl_DIR=/home/derek/abseil-cpp/lib64/cmake/absl \
    -DBUILD_SHARED_LIBS=ON
make -j4
make install
`
After that I tried to build the test, and it seems to have succeeded:
`
cd /grpc-rdma/examples/cpp/test
mkdir build
cd build
cmake -Dabsl_DIR=/home/derek/abseil-cpp/lib64/cmake/absl ..
make -j4
ll
total 5464
-rw-r--r--  1 root root   15301 Sep 9 15:14 CMakeCache.txt
drwxr-xr-x 10 root root    4096 Sep 9 15:15 CMakeFiles
-rw-r--r--  1 root root    1664 Sep 9 15:14 cmake_install.cmake
-rwxr-xr-x  1 root root  703312 Sep 9 15:15 greeter_async_client
-rwxr-xr-x  1 root root  722296 Sep 9 15:15 greeter_async_client2
-rwxr-xr-x  1 root root  872344 Sep 9 15:15 greeter_async_server
-rwxr-xr-x  1 root root  693584 Sep 9 15:15 greeter_client
-rwxr-xr-x  1 root root  692624 Sep 9 15:15 greeter_server
-rw-r--r--  1 root root   19859 Sep 9 15:15 helloworld.grpc.pb.cc
-rw-r--r--  1 root root   79054 Sep 9 15:15 helloworld.grpc.pb.h
-rw-r--r--  1 root root  116744 Sep 9 15:15 helloworld.pb.cc
-rw-r--r--  1 root root  111977 Sep 9 15:15 helloworld.pb.h
-rw-r--r--  1 root root 1520994 Sep 9 15:15 libhw_grpc_proto.a
-rw-r--r--  1 root root   14095 Sep 9 15:14 Makefile
`
But I cannot start the server in the test:
`
export GRPC_RDMA_DEVICE_NAME=mlx5_1
export GRPC_VERBOSITY="DEBUG"
export GRPC_TRACE="rdma_sr_bp,rdma_sr_bp_debug"
export GRPC_VERBOSITY="DEBUG"
export GRPC_PLATFORM_TYPE="RDMA_BP"
./build/greeter_server 50051
I0909 16:38:53.464019826 103488 iomgr_internal.cc:46] Select RDMA Busy Polling mode
D0909 16:38:53.464226542 103488 ev_posix.cc:185] Using polling engine: epollex_rdma_bp
D0909 16:38:53.464594375 103488 lb_policy_registry.cc:42] registering LB policy factory for "grpclb"
D0909 16:38:53.464620599 103488 lb_policy_registry.cc:42] registering LB policy factory for "priority_experimental"
D0909 16:38:53.464627598 103488 lb_policy_registry.cc:42] registering LB policy factory for "weighted_target_experimental"
D0909 16:38:53.464632799 103488 lb_policy_registry.cc:42] registering LB policy factory for "pick_first"
D0909 16:38:53.464637549 103488 lb_policy_registry.cc:42] registering LB policy factory for "round_robin"
D0909 16:38:53.464642326 103488 dns_resolver_ares.cc:499] Using ares dns resolver
D0909 16:38:53.464916670 103488 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0909 16:38:53.464939693 103488 lb_policy_registry.cc:42] registering LB policy factory for "cds_experimental"
D0909 16:38:53.464946646 103488 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_impl_experimental"
D0909 16:38:53.464952654 103488 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_resolver_experimental"
D0909 16:38:53.464958286 103488 lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_manager_experimental"
E0909 16:38:53.464978332 103488 trace.cc:65] Unknown trace var: 'rdma_sr_bp'
E0909 16:38:53.464982544 103488 trace.cc:65] Unknown trace var: 'rdma_sr_bp_debug'
I0909 16:38:53.465230078 103488 server_builder.cc:333] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0909 16:38:53.465822792 103488 socket_utils_common_posix.cc:353] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
Server listening on 0.0.0.0:50051
I0909 16:38:54.098025912 103493 tcp_server_posix.cc:249] accept connection from ipv4:172.23.13.43:57853, fd = 8, client count = 1
I0909 16:38:54.197431080 103493 rdma_bp_posix.cc:763] Take a Pair 0x7efcfc0018b0, peer ipv4:172.23.13.43:57853
I0909 16:38:54.207020445 103493 rdma_bp_posix.cc:775] Exchanging data finished, fd 8
I0909 16:38:54.207041215 103493 pair.cc:145] Connecting Pair 0x7efcfc0018b0
E0909 16:38:54.207048189 103493 pair.cc:148] assertion failed: peer.addr.tag == self.addr_.tag
`
Thank you in advance.
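One way to narrow this down, sketched under assumptions: the server log above prints `Select RDMA Busy Polling mode` at startup, so a client started the same way would presumably print the same line when `GRPC_VERBOSITY` is set to `DEBUG`. The function name `rdma_mode_selected` and the file name `client.log` are hypothetical, and the client arguments are left as a placeholder because they are not shown in this thread.

```shell
# Returns 0 if the given log file contains the RDMA busy-polling banner
# seen in the server output above, and a non-zero status otherwise.
# Hypothetical helper, not part of grpc-rdma.
rdma_mode_selected() {
  grep -q "Select RDMA Busy Polling mode" "$1"
}

# Hypothetical usage (client arguments intentionally left as a placeholder):
#   GRPC_VERBOSITY=DEBUG ./build/greeter_client ... 2> client.log
#   rdma_mode_selected client.log && echo "client selected RDMA_BP mode"
```

If the banner is missing from the client log, the client side is the likely place where GRPC_PLATFORM_TYPE was not exported.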