llvm / llvm-zorg

Unclear buildbot failure email from clang-cuda-l4 #211

Closed: jayfoad closed this issue 3 months ago

jayfoad commented 3 months ago

I got this buildbot failure email:

The Buildbot has detected a new failure on builder clang-cuda-l4 while building llvm.

Full details are available at:
    https://lab.llvm.org/buildbot/#/builders/101/builds/364

Worker for this Build: cuda-l4-0
Blamelist:
    Jay Foad <jay.foad@amd.com>,
    Shengchen Kan <shengchen.kan@intel.com>,
    Stephen Tozer <stephen.tozer@sony.com>

BUILD FAILED: failed '/buildbot/cuda-build --jobs=' (failure)

Step 3 (annotate) failure: '/buildbot/cuda-build --jobs=' (failure)
...
  NV_LIBCUBLAS_PACKAGE_NAME=libcublas-12-2
  NV_LIBCUBLAS_VERSION=12.2.5.6-1
  NV_LIBCUSPARSE_VERSION=12.1.2.141-1
  NV_LIBNCCL_PACKAGE=libnccl2=2.18.5-1+cuda12.2
  NV_LIBNCCL_PACKAGE_NAME=libnccl2
  NV_LIBNCCL_PACKAGE_VERSION=2.18.5-1
  NV_LIBNPP_PACKAGE=libnpp-12-2=12.2.1.4-1
  NV_LIBNPP_VERSION=12.2.1.4-1
  NV_NVTX_VERSION=12.2.140-1
  PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/buildbot
  PWD=/buildbot/cuda-l4-0/work/cuda-l4-0/clang-cuda-l4/build
  SHLVL=1
  TERM=dumb
  WORK_DIR=/buildbot/cuda-l4-0/work
  _=/usr/local/bin/buildbot-worker
 using PTY: False
++ echo @@@HALT_ON_FAILURE@@@
++ readlink -f ..
+ buildbot_dir=/buildbot/cuda-l4-0/work/cuda-l4-0/clang-cuda-l4
+ revision=919c547130cfd1cd75ccf148cbf2334b27b2f37f
+ GPU_ARCH=sm_89
+ CUDA_TEST_JOBS=1
+ build_base=/buildbot/cuda-l4-0/work/clang-cuda-l4
+ mkdir -p /buildbot/cuda-l4-0/work/clang-cuda-l4
+ build_dir=/buildbot/cuda-l4-0/work/clang-cuda-l4/build
+ libc_build_dir=/buildbot/cuda-l4-0/work/clang-cuda-l4/build-libc
+ clang_dir=/buildbot/cuda-l4-0/work/clang-cuda-l4/clang
+ testsuite_dir=/buildbot/cuda-l4-0/work/clang-cuda-l4/llvm-test-suite
+ llvm_src_dir=/buildbot/cuda-l4-0/work/clang-cuda-l4/llvm
+ ext_dir=/buildbot/cuda-l4-0/work/clang-cuda-l4/external
+ inner_pid=342838
+ do_build_and_test
+ trap 'handle_termination $inner_pid' TERM
+ wait 342838
+ fetch_prebuilt_clang 919c547130cfd1cd75ccf148cbf2334b27b2f37f /buildbot/cuda-l4-0/work/clang-cuda-l4/clang
+ local revision=919c547130cfd1cd75ccf148cbf2334b27b2f37f
+ local destdir=/buildbot/cuda-l4-0/work/clang-cuda-l4/clang
+ local 'timeout=10 minutes'
++ date -ud '10 minutes' +%s
+ local endtime=1718876716
++ storage_location llvm-919c547130cfd1cd75ccf148cbf2334b27b2f37f
++ local file=llvm-919c547130cfd1cd75ccf148cbf2334b27b2f37f
++ local default_storage_prefix=gs://cudabot-gce-artifacts/
++ echo gs://cudabot-gce-artifacts/llvm-919c547130cfd1cd75ccf148cbf2334b27b2f37f
+ local snapshot=gs://cudabot-gce-artifacts/llvm-919c547130cfd1cd75ccf148cbf2334b27b2f37f
+ step 'Waiting for LLVM & Clang snapshot to be built. '
+ local 'name=Waiting for LLVM & Clang snapshot to be built. '
+ local summary=
+ echo '@@@BUILD_STEP Waiting for LLVM & Clang snapshot to be built. @@@'
+ step_summary_clear

Sincerely,
LLVM Buildbot

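The email does not explain why the build failed: the log excerpt it quotes stops at the "Waiting for LLVM & Clang snapshot to be built" step, before any error is shown. Looking at the full logs, I see things like: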

FAIL: test-suite :: External/CUDA/cmath-cuda-11.8-c++11-libc++.test (5 of 12)
******************** TEST 'test-suite :: External/CUDA/cmath-cuda-11.8-c++11-libc++.test' FAILED ********************
/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/cmath-cuda-11.8-c++11-libc++.test.out --redirect-input /dev/null --summary /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/cmath-cuda-11.8-c++11-libc++.test.time /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/cmath-cuda-11.8-c++11-libc++
cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA ; /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/cmath-cuda-11.8-c++11-libc++.test.out cmath.reference_output-cuda-11.8-c++11-libc++
+ cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA
+ /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/cmath-cuda-11.8-c++11-libc++.test.out cmath.reference_output-cuda-11.8-c++11-libc++
/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target: Comparison failed, textual difference between 'C' and 'S'

and

Failed Tests (8):
  test-suite :: External/CUDA/algorithm-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/assert-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/axpy-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/cmath-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/complex-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/math_h-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/new-cuda-11.8-c++11-libc++.test
  test-suite :: External/CUDA/printf-cuda-11.8-c++11-libc++.test
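
To illustrate the kind of summary that would have made the email actionable, here is a minimal Python sketch that pulls the failing test names out of a log. It assumes the log contains a lit-style `Failed Tests (N):` header followed by indented test names, as in the excerpt above. The function name `extract_failed_tests` is hypothetical, and this is not how Buildbot or llvm-zorg actually composes its emails.

```python
import re

# Matches the summary header that appears in the log, e.g. "Failed Tests (8):".
FAILED_HEADER = re.compile(r"^Failed Tests \(\d+\):\s*$")


def extract_failed_tests(log_text):
    """Return the test names listed under a 'Failed Tests (N):' header.

    Hypothetical helper: it assumes each entry is an indented line directly
    below the header, and that the first blank or non-indented line ends
    the list.
    """
    failed = []
    in_block = False
    for line in log_text.splitlines():
        if FAILED_HEADER.match(line):
            in_block = True
            continue
        if in_block:
            if line.startswith(" ") and line.strip():
                failed.append(line.strip())
            else:
                in_block = False
    return failed
```

Run on the `Failed Tests (8):` excerpt above, this should return the eight `External/CUDA/...` test names, which is much closer to what a reader of the failure email needs than the environment dump.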
slydiman commented 3 months ago

Fixed by this patch: https://github.com/llvm/llvm-zorg/pull/209

gkistanova commented 3 months ago

Thanks for reporting this, Jay! This has been fixed. Feel free to reopen if you still see this.