google / fleetbench

Benchmarking suite for Google workloads
Apache License 2.0

Build fleetbench failed as no permission with Bazel 5.4.0 #6

Closed esharkwang closed 1 year ago

esharkwang commented 1 year ago

Hi,

I want to build fleetbench for aarch64. However, the build fails with a permission error.

Here is the build log. I had already set the permissions on the fleetbench folder to 777.

  bazel run -c opt fleetbench/swissmap:hot_swissmap_benchmark --verbose_failures
  2023/01/09 03:40:53 Downloading https://releases.bazel.build/5.4.0/release/bazel-5.4.0-linux-arm64...
  Extracting Bazel installation...
  Starting local Bazel server and connecting to it...
  INFO: Analyzed target //fleetbench/swissmap:hot_swissmap_benchmark (65 packages loaded, 836 targets configured).
  INFO: Found 1 target...
  ERROR: /home/nvidia/walter/fleetbench/fleetbench/BUILD:15:11: Compiling fleetbench/benchmark_main.cc failed: (Exit 1): gcc failed: error executing command
    (cd /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/sandbox/linux-sandbox/15/execroot/com_google_fleetbench && \
    exec env - \
      PATH=/root/.cache/bazelisk/downloads/bazelbuild/bazel-5.4.0-linux-arm64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
      PWD=/proc/self/cwd \
    /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.d '-frandom-seed=bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.o' -DBENCHMARK_STATIC_DEFINE -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_benchmark -iquote bazel-out/aarch64-opt/bin/external/com_google_benchmark -Ibazel-out/aarch64-opt/bin/external/com_google_benchmark/_virtual_includes/benchmark '-std=c++17' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c fleetbench/benchmark_main.cc -o bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.o)
  Configuration: a0b0f0a2e12d5d8ebd5c1e57a8b5134db01aaef167d6db5c638a140b29cfa08a
  Execution platform: @local_config_platform//:host

  Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
  gcc: error: fleetbench/benchmark_main.cc: Permission denied
  gcc: fatal error: no input files
  compilation terminated.
  Target //fleetbench/swissmap:hot_swissmap_benchmark failed to build
  INFO: Elapsed time: 17.432s, Critical Path: 1.09s
  INFO: 170 processes: 166 internal, 4 linux-sandbox.
  FAILED: Build did NOT complete successfully
  FAILED: Build did NOT complete successfully

  root@nvidia:/home/nvidia/walter/fleetbench# bazel version
  Bazelisk version: v1.13.2
  Build label: 5.4.0
  Build target: bazel-out/aarch64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
  Build time: Thu Dec 15 16:14:11 2022 (1671120851)
  Build timestamp: 1671120851
  Build timestamp as int: 1671120851

I did some research and found that the error is caused by a looping symbolic link: the link does not point to the correct source file but to itself. Am I missing some build options or configuration?
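For anyone who wants to check for the same symptom, here is a minimal sketch of how self-referencing or dangling links could be listed. The function name `find_broken_links` is made up for illustration and is not part of Bazel or fleetbench; it assumes the sandbox directory was retained with --sandbox_debug.

```shell
# Hypothetical helper: print every symbolic link under a directory that
# cannot be resolved (a dangling link, or a link that loops back to itself).
find_broken_links() {
  # -type l selects symbolic links only.
  find "$1" -type l | while IFS= read -r link; do
    # `[ -e ]` follows the link, so it fails for dangling links and loops.
    if [ ! -e "$link" ]; then
      # readlink prints the immediate target stored in the link.
      printf '%s -> %s\n' "$link" "$(readlink "$link")"
    fi
  done
}

# Example, using the sandbox path from the log above:
#   find_broken_links /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/sandbox/linux-sandbox/15/execroot/com_google_fleetbench
```

A link that points to itself shows up with its own name as the target, which matches the behaviour described above.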

esharkwang commented 1 year ago

After some searching, I added --spawn_strategy=local so that the build uses the local source tree directly. The permission error no longer appears, but the tcmalloc dependency now fails to compile for ARMv8 (aarch64).

  bazel build -c opt fleetbench/tcmalloc:all --spawn_strategy=local --sandbox_debug
  INFO: Analyzed 8 targets (0 packages loaded, 0 targets configured).
  INFO: Found 8 targets...
  ERROR: /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/external/com_google_tcmalloc/tcmalloc/BUILD:297:11: Compiling tcmalloc/global_stats.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 32 arguments skipped)
  In file included from external/com_google_tcmalloc/tcmalloc/cpu_cache.h:39,
                   from external/com_google_tcmalloc/tcmalloc/global_stats.cc:21:
  external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h: In function 'int tcmalloc::tcmalloc_internal::subtle::percpu::TcmallocSlab_Internal_Push(typename tcmalloc::tcmalloc_internal::subtle::percpu::TcmallocSlab::Slabs, size_t, void, tcmalloc::tcmalloc_internal::subtle::percpu::Shift, tcmalloc::tcmalloc_internal::subtle::percpu::OverflowHandler, void*, size_t)':
  external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:667:9: error: expected ':' or '::' before '[' token
    667 |       : [end_ptr] "=&r"(end_ptr), [cpu_id] "=&r"(cpu_id),
        |         ^
  INFO: Elapsed time: 2.486s, Critical Path: 1.92s
  INFO: 39 processes: 37 internal, 2 local.

I checked the code at external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:667. It is inside an inline asm block. I am not sure why it fails, since this code should have been verified before. Are any special options needed for aarch64 compilation?

#if TCMALLOC_INTERNAL_PERCPU_USE_RSEQ_ASM_GOTO
  "b.le %l[overflow_label]\n"
#else
  "b.le 5f\n"
// Important! code below this must not affect any flags (i.e.: ccle)
// If so, the above code needs to explicitly set a ccle return value.
#endif
  "str %[item], [%[region_start], %[current], LSL #3]\n"
  "add %w[current], %w[current], #1\n"
  "strh %w[current], [%[region_start], %[size_class_lsl3]]\n"
  // Commit
  "5:\n"
  : [end_ptr] "=&r"(end_ptr), [cpu_id] "=&r"(cpu_id),
    [current] "=&r"(current), [end] "=&r"(end),
    [region_start] "=&r"(region_start)

liyuying0000 commented 1 year ago

Hi, @esharkwang,

I'm able to reproduce the same error on an Nvidia Jetson Xavier AGX machine. It turns out this is likely a dependency issue and unrelated to the Fleetbench code itself. There are some incompatibilities between the internal and external versions. I am actively looking into it and speaking with the TCMalloc team as well.

In the meantime, could you try building with a different compiler or compiler version? For example: CC=clang bazel run -c opt fleetbench/swissmap:hot_swissmap_benchmark

I will keep you posted once I have any update.

liyuying0000 commented 1 year ago

Hi, @esharkwang,

Unfortunately, this is a long-standing issue when building with Bazel 5.4.0 on aarch64 with a GCC version < 10, and that combination is unsupported at the moment.
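As a rough way to tell whether a machine falls into the affected range, the GCC major version can be compared against 10. This is only a sketch: the helper names `gcc_major` and `gcc_new_enough` are hypothetical, and the threshold of 10 is simply the one stated above.

```shell
# Hypothetical helpers, not part of fleetbench or Bazel.

# Print the major component of a version string such as "9.4.0" or "11".
gcc_major() {
  printf '%s\n' "${1%%.*}"
}

# Succeed if the given GCC version is at least 10 (the threshold stated above).
gcc_new_enough() {
  [ "$(gcc_major "$1")" -ge 10 ]
}

# Example (requires gcc on PATH):
#   gcc_new_enough "$(gcc -dumpversion)" || echo "GCC is older than 10" >&2
```

Taking the version string as an argument keeps the check usable for whichever compiler the build actually picks up, for example the one named by CC.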

esharkwang commented 1 year ago

Hi @liyuying0000, thanks for the comments. Does the fleetbench code support GCC 11? If so, I think I could upgrade the GCC version used with Bazel 5.4.0 as a workaround. Is that possible?

esharkwang commented 1 year ago

@liyuying0000 I tried Bazel 6.0.0 with a workaround for the dependency issue, and I also upgraded GCC to version 11. Now I can build the binary for aarch64. I will post a summary of how to work around the issue later.

liyuying0000 commented 1 year ago

Hi, @esharkwang, thanks for the updates. I'm so glad it worked out! It would be appreciated if you could share the workaround.

esharkwang commented 1 year ago

Hi, @liyuying0000 ,

Here are my steps to work around the issue.

  1. First, update the bazel_skylib reference, as suggested in the Bazel discussion group. Here is a sample:

       http_archive(
           name = "bazel_skylib",
           sha256 = "74d544d96f4a5bb630d465ca8bbcfe231e3594e5aae57e1edbf17a6eb3ca2506",
           urls = [
               "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
               "https://github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
           ],
       )

  2. Tell Bazelisk to use Bazel 6.0.0 through an environment variable:

       export USE_BAZEL_VERSION=6.0.0

  3. Install GCC 11 and make it the default compiler:

       apt install -y build-essential
       apt install -y gcc-11 g++-11 cpp-11
       update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 --slave /usr/bin/g++ g++ /usr/bin/g++-11 --slave /usr/bin/gcov gcov /usr/bin/gcov-11

  4. Then use Bazel to compile and run the application. This works on aarch64 Ubuntu 20.04.
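
As a possible alternative to exporting USE_BAZEL_VERSION in every shell (step 2), Bazelisk also reads a .bazelversion file from the workspace. The sketch below writes such a file; `pin_bazel_version` is a hypothetical helper name, and this variant was not what was tested in this thread.

```shell
# Hypothetical helper: pin the Bazel version for a workspace by writing
# a .bazelversion file, which Bazelisk consults when choosing a release.
pin_bazel_version() {
  # $1 = workspace directory, $2 = Bazel version (for example 6.0.0)
  printf '%s\n' "$2" > "$1/.bazelversion"
}

# Example, run from the fleetbench checkout:
#   pin_bazel_version . 6.0.0
```

Note that a USE_BAZEL_VERSION value set in the environment takes precedence over the file, so unset the variable if the file should decide.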
liyuying0000 commented 1 year ago

Thanks so much for your workaround! @esharkwang