tikv / grpc-rs

The gRPC library for Rust built on C Core library and futures
Apache License 2.0

The gRPC C++ code is built with massive parallelism, triggering OOM #576

Closed by purew 2 years ago

purew commented 2 years ago

Describe the bug

In my CI setup on CircleCI, the project is allowed to use roughly 4-8 CPU cores. However, the underlying hardware appears to have 36 cores, which the gRPC C++ CMake build somehow detects. The C++ code is then built with 36 workers (note the -j36 in the log below), and the build is killed by what look like OOM errors.

Before building my Rust project, which depends on grpc-rs, I tried setting several environment variables, such as

env NPROC=4 \
    NUM_JOBS=1 \
    MAKEFLAGS="--jobs 2" \
    cargo build ...

but none of these seem to be picked up by the C++ CMake project.

Expected behavior

Ideally, I would like to set the number of build workers explicitly, because CircleCI reports 36 cores as available even though the project is really only allowed to use 4 of them.

System information

Additional context

The logs from my CircleCI run:

  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_impl.cc.o
  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc.o
  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc.o
  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy_registry.cc.o
  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/local_subchannel_pool.cc.o
  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/proxy_mapper_registry.cc.o
  [ 35%] Building CXX object CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/resolver/binder/binder_resolver.cc.o
  make[5]: Leaving directory '/root/project/target/debug/build/grpcio-sys-f330d267d84e9097/out/build'
  make[4]: Leaving directory '/root/project/target/debug/build/grpcio-sys-f330d267d84e9097/out/build'
  make[3]: Leaving directory '/root/project/target/debug/build/grpcio-sys-f330d267d84e9097/out/build'
  make[2]: Leaving directory '/root/project/target/debug/build/grpcio-sys-f330d267d84e9097/out/build'

  --- stderr
  CMake Warning at third_party/abseil-cpp/CMakeLists.txt:74 (message):
    A future Abseil release will default ABSL_PROPAGATE_CXX_STD to ON for CMake
    3.8 and up.  We recommend enabling this option to ensure your project still
    builds correctly.

  CMake Warning at cmake/protobuf.cmake:51 (message):
    gRPC_PROTOBUF_PROVIDER is "module" but PROTOBUF_ROOT_DIR is wrong
  Call Stack (most recent call first):
    CMakeLists.txt:312 (include)

  CMake Warning:
    Manually-specified variables were not used by the project:

      CMAKE_ASM_COMPILER
      CMAKE_ASM_FLAGS

  make[2]: warning: -j36 forced in submake: resetting jobserver mode.
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:115: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/client_channel.cc.o] Error 1
  make[5]: *** Waiting for unfinished jobs....
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:128: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/client_channel_channelz.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:180: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/dynamic_filters.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:323: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb_channel_secure.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:102: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/channel_connectivity.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:284: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/grpclb/client_load_reporting_filter.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:219: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/http_connect_handshaker.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:89: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/backup_poller.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:297: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:388: CMakeFiles/grpc.dir/src/core/ext/filters/client_channel/lb_policy/ring_hash/ring_hash.cc.o] Error 1
  c++: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  make[5]: *** [CMakeFiles/grpc.dir/build.make:63: CMakeFiles/grpc.dir/src/core/ext/filters/census/grpc_context.cc.o] Error 1
  make[4]: *** [CMakeFiles/Makefile2:1064: CMakeFiles/grpc.dir/all] Error 2
  make[3]: *** [CMakeFiles/Makefile2:1071: CMakeFiles/grpc.dir/rule] Error 2
  make[2]: *** [Makefile:229: grpc] Error 2
  thread 'main' panicked at '
  command did not execute successfully, got: exit status: 2

BusyJay commented 2 years ago

How about cargo build -j 4? I think the parallelism is controlled by NUM_JOBS, which cmake-rs reads. Cargo overwrites the NUM_JOBS variable for build scripts, so setting it manually won't work.
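
To illustrate the mechanism, here is a minimal sketch of how a build script could turn NUM_JOBS into a job count for a CMake build. This is not the actual cmake-rs or grpcio-sys code: the helper names jobs_from, num_jobs, and cmake_build_args are hypothetical, and the sketch assumes CMake 3.12 or newer, where cmake --build accepts --parallel.

```rust
use std::env;

/// Parse a job count from an optional string.
/// Falls back to `fallback` when the value is missing, not a number, or zero.
/// (Hypothetical helper, not part of cmake-rs.)
fn jobs_from(value: Option<&str>, fallback: usize) -> usize {
    value
        .and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(fallback)
}

/// Read NUM_JOBS, which Cargo sets for build scripts based on `-j`.
/// Because Cargo sets it itself, exporting NUM_JOBS before `cargo build`
/// does not reach the build script.
fn num_jobs(fallback: usize) -> usize {
    jobs_from(env::var("NUM_JOBS").ok().as_deref(), fallback)
}

/// Assemble the arguments for `cmake --build <dir> --parallel <jobs>`.
/// (Hypothetical helper; assumes CMake >= 3.12.)
fn cmake_build_args(build_dir: &str, jobs: usize) -> Vec<String> {
    vec![
        "--build".to_string(),
        build_dir.to_string(),
        "--parallel".to_string(),
        jobs.to_string(),
    ]
}
```

In a build.rs, cmake_build_args("out/build", num_jobs(1)) would yield the arguments to pass to std::process::Command::new("cmake"). With cargo build -j 4, Cargo sets NUM_JOBS=4 for the build script, so the native build would be limited to 4 workers rather than the 36 cores the host reports.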

purew commented 2 years ago

That worked, thanks!