Closed: esharkwang closed this issue 1 year ago
After some searching, I added --spawn_strategy=local so that the build uses the local source code directly. The permission error no longer appears, but the build now fails when compiling the dependency code for ARMv8 (aarch64).
bazel build -c opt fleetbench/tcmalloc:all --spawn_strategy=local --sandbox_debug
INFO: Analyzed 8 targets (0 packages loaded, 0 targets configured).
INFO: Found 8 targets...
ERROR: /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/external/com_google_tcmalloc/tcmalloc/BUILD:297:11: Compiling tcmalloc/global_stats.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 32 arguments skipped)
In file included from external/com_google_tcmalloc/tcmalloc/cpu_cache.h:39,
from external/com_google_tcmalloc/tcmalloc/global_stats.cc:21:
external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h: In function 'int tcmalloc::tcmalloc_internal::subtle::percpu::TcmallocSlab_Internal_Push(typename tcmalloc::tcmalloc_internal::subtle::percpu::TcmallocSlab
I checked the code at external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:66. It is actually an inline asm block. I am not sure why it would fail, since it should have been verified before. Are there any special options for aarch64 compilation?
"b.le %l[overflow_label]\n"
"b.le 5f\n"
// Important! code below this must not affect any flags (i.e.: ccle)
// If so, the above code needs to explicitly set a ccle return value.
"str %[item], [%[region_start], %[current], LSL #3]\n"
"add %w[current], %w[current], #1\n"
"strh %w[current], [%[region_start], %[size_class_lsl3]]\n"
// Commit
"5:\n"
: [end_ptr] "=&r"(end_ptr), [cpu_id] "=&r"(cpu_id),
[current] "=&r"(current), [end] "=&r"(end),
[region_start] "=&r"(region_start)
Hi, @esharkwang,
I'm able to reproduce the same error on an Nvidia Jetson Xavier AGX machine. It turns out this is likely a dependency issue and unrelated to the Fleetbench code itself. There are some incompatibilities between the internal and external versions. I am actively looking into it and talking with the TCMalloc team as well.
In the meantime, could you try building with a different compiler or compiler version? For example:
CC=clang bazel run -c opt fleetbench/swissmap:hot_swissmap_benchmark
I will keep you posted once I have an update.
Hi, @esharkwang,
Unfortunately, this is a long-standing issue when building with Bazel 5.4.0 on aarch64 with a GCC version < 10, and that configuration is unsupported at the moment.
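For anyone who wants to check up front whether a machine falls into the configuration described above, here is a minimal sketch. It assumes the threshold from this comment (aarch64 with a GCC major version below 10) and that the version string comes from `gcc -dumpversion`. The function names are hypothetical helpers, not part of Fleetbench or Bazel.

```python
import platform
import re
import subprocess


def gcc_major_version(version_output: str) -> int:
    """Extract the major version from `gcc -dumpversion`-style output ('9.4.0', '11')."""
    match = re.match(r"\s*(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized GCC version string: {version_output!r}")
    return int(match.group(1))


def is_affected(machine: str, version_output: str, min_major: int = 10) -> bool:
    """Return True for aarch64 with GCC older than min_major.

    The default threshold of 10 is an assumption taken from the comment above,
    not from an official support matrix.
    """
    on_arm64 = machine in ("aarch64", "arm64")
    return on_arm64 and gcc_major_version(version_output) < min_major


def local_machine_is_affected() -> bool:
    """Query the local toolchain; requires `gcc` to be on PATH."""
    result = subprocess.run(
        ["gcc", "-dumpversion"], capture_output=True, text=True, check=True
    )
    return is_affected(platform.machine(), result.stdout)
```

Calling `local_machine_is_affected()` before starting the build gives a quick yes/no answer. The pure `is_affected` function can be used on version strings collected from other machines.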
Hi @liyuying0000, thanks for the comments. Does the Fleetbench code support GCC 11? If so, I think I could upgrade the GCC version used with Bazel 5.4.0 as a workaround. Is that possible?
@liyuying0000 I tried Bazel 6.0.0 with a workaround for the dependency issue, and I also upgraded GCC to version 11. Now I can build the binary for aarch64. I will post a summary of how to work around the issue later.
Hi, @esharkwang, thanks for the update. I'm so glad it worked out! It would be appreciated if you could share the workaround.
Hi, @liyuying0000 ,
Here are my steps to work around the issue.
Thanks so much for your workaround! @esharkwang
Hi,
I want to build an aarch64 version of Fleetbench. However, the build failed with a permission error.
Here is the build log. I had set the permissions of the fleetbench folder to 777.
bazel run -c opt fleetbench/swissmap:hot_swissmap_benchmark --verbose_failures
2023/01/09 03:40:53 Downloading https://releases.bazel.build/5.4.0/release/bazel-5.4.0-linux-arm64...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analyzed target //fleetbench/swissmap:hot_swissmap_benchmark (65 packages loaded, 836 targets configured).
INFO: Found 1 target...
ERROR: /home/nvidia/walter/fleetbench/fleetbench/BUILD:15:11: Compiling fleetbench/benchmark_main.cc failed: (Exit 1): gcc failed: error executing command
(cd /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/sandbox/linux-sandbox/15/execroot/com_google_fleetbench && \
exec env - \
PATH=/root/.cache/bazelisk/downloads/bazelbuild/bazel-5.4.0-linux-arm64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
PWD=/proc/self/cwd \
/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.d '-frandom-seed=bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.o' -DBENCHMARK_STATIC_DEFINE -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_benchmark -iquote bazel-out/aarch64-opt/bin/external/com_google_benchmark -Ibazel-out/aarch64-opt/bin/external/com_google_benchmark/_virtual_includes/benchmark '-std=c++17' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c fleetbench/benchmark_main.cc -o bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.o)
Configuration: a0b0f0a2e12d5d8ebd5c1e57a8b5134db01aaef167d6db5c638a140b29cfa08a
Execution platform: @local_config_platform//:host
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
gcc: error: fleetbench/benchmark_main.cc: Permission denied
gcc: fatal error: no input files
compilation terminated.
Target //fleetbench/swissmap:hot_swissmap_benchmark failed to build
INFO: Elapsed time: 17.432s, Critical Path: 1.09s
INFO: 170 processes: 166 internal, 4 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
root@nvidia:/home/nvidia/walter/fleetbench# bazel version
Bazelisk version: v1.13.2
Build label: 5.4.0
Build target: bazel-out/aarch64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 15 16:14:11 2022 (1671120851)
Build timestamp: 1671120851
Build timestamp as int: 1671120851
I did some research and found that it was caused by a looping symbolic link. The link didn't point to the correct source file; it pointed to itself. Did I miss some build options or configuration?
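To confirm a self-referencing link like the one described above, the following sketch scans a directory tree for symlinks that fail to resolve because of a loop. It is not tied to Bazel. The helper names are hypothetical, and the assumption is that it is pointed at a sandbox execroot retained with --sandbox_debug (as the log suggests) or at the source checkout.

```python
import errno
import os
import tempfile
from typing import List


def is_looping_symlink(path: str) -> bool:
    """Return True if path is a symlink whose resolution fails with ELOOP."""
    if not os.path.islink(path):
        return False
    try:
        os.stat(path)  # follows symlinks; a cycle raises OSError(ELOOP)
    except OSError as exc:
        return exc.errno == errno.ELOOP
    return False


def find_looping_symlinks(root: str) -> List[str]:
    """Walk root (without following links) and collect looping symlinks."""
    loops = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidate = os.path.join(dirpath, name)
            if is_looping_symlink(candidate):
                loops.append(candidate)
    return sorted(loops)


if __name__ == "__main__":
    # Small demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        loop = os.path.join(tmp, "benchmark_main.cc")
        os.symlink("benchmark_main.cc", loop)  # link that points to itself
        for path in find_looping_symlinks(tmp):
            print("looping symlink:", path)
```

A link reported here cannot be opened by the compiler, whatever the permissions on the surrounding folder. Whether such a link accounts for the "Permission denied" message in the log above is not established by this check.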