Display walltime benchmarks with subnanosecond precision

the8472 commented 1 week ago

With modern CPUs running at more than one cycle per nanosecond the current precision is insufficient to resolve differences worth several cycles per iteration.

Granted, walltime benchmarks often are noisy but occasionally, especially when no allocations are involved, the difference really is just a few cycles.

example results when benchmarking 1-4 serialized ADD instructions and an empty bench body

running 4 tests
test add  ... bench:           0.24 ns/iter (+/- 0.00)
test add2 ... bench:           0.48 ns/iter (+/- 0.01)
test add3 ... bench:           0.72 ns/iter (+/- 0.01)
test add4 ... bench:           0.96 ns/iter (+/- 0.01)
test empty ... bench:           0.24 ns/iter (+/- 0.00)

rustbot commented 1 week ago

r? @jhpratt

rustbot has assigned @jhpratt. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rust-log-analyzer commented 1 week ago

The job x86_64-gnu-llvm-17 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

```plain #16 exporting to docker image format #16 sending tarball 29.4s done #16 DONE 41.7s ##[endgroup] Setting extra environment values for docker: --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/ [CI_JOB_NAME=x86_64-gnu-llvm-17] --- sccache: Starting the server... ##[group]Configure the build configure: processing command line configure: configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-17', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling'] configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-17/bin/llvm-config configure: llvm.link-shared := True configure: rust.thin-lto-import-instr-limit := 10 configure: change-id := 99999999 --- Downloaded boml v0.3.1 Compiling boml v0.3.1 Compiling y v0.1.0 (/checkout/compiler/rustc_codegen_gcc/build_system) Finished `release` profile [optimized] target(s) in 3.46s Running `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-codegen/x86_64-unknown-linux-gnu/release/y test --use-system-gcc --use-backend gcc --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/cg_gcc --release --mini-tests --std-tests` Using system GCC [BUILD] example [AOT] mini_core_hello_world /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/cg_gcc/mini_core_hello_world abc --- ---- [run-make] tests/run-make/libtest-padding stdout ---- error: make failed status: exit status: 2 command: cd "/checkout/tests/run-make/libtest-padding" && env -u CARGO_MAKEFLAGS -u MAKEFLAGS -u MFLAGS -u RUSTFLAGS AR="ar" CC="cc -ffunction-sections -fdata-sections -fPIC -m64" CXX="c++ -ffunction-sections -fdata-sections -fPIC -m64" HOST_RPATH_DIR="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" LD_LIB_PATH_ENVVAR="LD_LIBRARY_PATH" LLVM_BIN_DIR="/usr/lib/llvm-17/bin" LLVM_COMPONENTS="aarch64 aarch64asmparser aarch64codegen aarch64desc aarch64disassembler aarch64info aarch64utils aggressiveinstcombine all all-targets amdgpu amdgpuasmparser amdgpucodegen amdgpudesc amdgpudisassembler amdgpuinfo amdgputargetmca amdgpuutils analysis arm armasmparser armcodegen armdesc armdisassembler arminfo armutils asmparser asmprinter avr avrasmparser avrcodegen avrdesc avrdisassembler avrinfo binaryformat bitreader bitstreamreader bitwriter bpf bpfasmparser bpfcodegen bpfdesc bpfdisassembler bpfinfo cfguard codegen codegentypes core coroutines coverage debuginfobtf debuginfocodeview debuginfodwarf debuginfogsym debuginfologicalview debuginfomsf debuginfopdb demangle dlltooldriver dwarflinker dwarflinkerparallel dwp engine executionengine extensions filecheck frontendhlsl frontendopenacc frontendopenmp fuzzercli fuzzmutate globalisel hexagon hexagonasmparser hexagoncodegen hexagondesc hexagondisassembler hexagoninfo instcombine instrumentation interfacestub interpreter ipo irprinter irreader jitlink lanai lanaiasmparser lanaicodegen lanaidesc lanaidisassembler lanaiinfo libdriver lineeditor linker loongarch loongarchasmparser loongarchcodegen loongarchdesc loongarchdisassembler loongarchinfo lto m68k m68kasmparser m68kcodegen m68kdesc m68kdisassembler m68kinfo mc mca mcdisassembler mcjit mcparser mips mipsasmparser mipscodegen mipsdesc mipsdisassembler mipsinfo mirparser msp430 msp430asmparser msp430codegen msp430desc msp430disassembler msp430info native nativecodegen nvptx nvptxcodegen nvptxdesc nvptxinfo objcarcopts objcopy object objectyaml option orcjit orcshared orctargetprocess passes perfjitevents powerpc powerpcasmparser powerpccodegen powerpcdesc powerpcdisassembler powerpcinfo profiledata remarks riscv riscvasmparser riscvcodegen riscvdesc riscvdisassembler riscvinfo riscvtargetmca runtimedyld scalaropts selectiondag sparc sparcasmparser sparccodegen sparcdesc sparcdisassembler sparcinfo support symbolize systemz systemzasmparser systemzcodegen systemzdesc systemzdisassembler systemzinfo tablegen target targetparser textapi transformutils ve veasmparser vecodegen vectorize vedesc vedisassembler veinfo webassembly webassemblyasmparser webassemblycodegen webassemblydesc webassemblydisassembler webassemblyinfo webassemblyutils windowsdriver windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86targetmca xcore xcorecodegen xcoredesc xcoredisassembler xcoreinfo xray xtensa xtensaasmparser xtensacodegen xtensadesc xtensadisassembler xtensainfo" LLVM_FILECHECK="/usr/lib/llvm-17/bin/FileCheck" NODE="/usr/bin/node" PYTHON="/usr/bin/python3" RUSTC="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" RUSTDOC="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc" RUST_BUILD_STAGE="stage2-x86_64-unknown-linux-gnu" S="/checkout" TARGET="x86_64-unknown-linux-gnu" TARGET_RPATH_DIR="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" TMPDIR="/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding" "make" --- stdout ------------------------------- LD_LIBRARY_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding:/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/lib" '/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding -L /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding -Ainternal_features --test tests.rs LD_LIBRARY_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding:/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/lib" /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding/tests --test-threads=1 | sed 's%[0-9,]\{1,\} ns/iter (+/- [0-9,]\{1,\})%?? ns/iter (+/- ??)%' | sed 's%finished in [0-9\.]\{1,\}%finished in ??%' > "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding"/test.stdout diff -u --strip-trailing-cr "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding"/test.stdout test.stdout LD_LIBRARY_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding:/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/lib" /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding/tests --test-threads=1 --bench | sed 's%[0-9,]\{1,\} ns/iter (+/- [0-9,]\{1,\})%?? ns/iter (+/- ??)%' | sed 's%finished in [0-9\.]\{1,\}%finished in ??%' > "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding"/bench.stdout diff -u --strip-trailing-cr "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding"/bench.stdout bench.stdout --- /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/libtest-padding/libtest-padding/bench.stdout 2024-05-05 21:54:37.328212865 +0000 +++ bench.stdout 2024-05-05 21:14:15.977246047 +0000 @@ -2,8 +2,8 @@ test short_test_name ... ignored test this_is_a_really_long_test_name ... ignored test this_is_a_really_long_test_name ... ignored -test short_bench_name ... bench: 4.63 ns/iter (+/- 1.05) -test this_is_a_really_long_bench_name ... bench: 4.79 ns/iter (+/- 0.89) +test short_bench_name ... bench: ?? ns/iter (+/- ??) +test this_is_a_really_long_bench_name ... bench: ?? ns/iter (+/- ??) test result: ok. 0 passed; 0 failed; 2 ignored; 2 measured; 0 filtered out; finished in ??s ------------------------------------------ --- stderr ------------------------------- make: *** [Makefile:12: all] Error 1 ```

rustbot commented 1 week ago

Some changes occurred in run-make tests.

cc @jieyouxu

jhpratt commented 1 week ago

Do you know if there's a reason that it's always calculated in nanoseconds? Being most familiar with criterion (as are many others), I suspect that displaying ms, µs, ns, and ps as appropriate is reasonable.

the8472 commented 1 week ago

Switching units when doing a before/after comparison would makes thing more difficult to eyeball.

jhpratt commented 1 week ago

@bors r+

bors commented 1 week ago

:pushpin: Commit 2a7c42f93c2e3e3c07a8cae6453e4c7626e83b51 has been approved by jhpratt

It is now in the queue for this repository.

bors commented 1 week ago

:hourglass: Testing commit 2a7c42f93c2e3e3c07a8cae6453e4c7626e83b51 with merge e93f34210120d1fc6a0195d527eee828a36e57b1...

bors commented 1 week ago

:sunny: Test successful - checks-actions Approved by: jhpratt Pushing e93f34210120d1fc6a0195d527eee828a36e57b1 to master...

rust-timer commented 1 week ago

Finished benchmarking commit (e93f34210120d1fc6a0195d527eee828a36e57b1): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment. | | mean | range | count | |:----------------------------------:|:-----:|:--------------:|:-----:| | Regressions ❌
(primary) | - | - | 0 | | Regressions ❌
(secondary) | - | - | 0 | | Improvements ✅
(primary) | -2.7% | [-2.7%, -2.7%] | 1 | | Improvements ✅
(secondary) | - | - | 0 | | All ❌✅ (primary) | -2.7% | [-2.7%, -2.7%] | 1 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 674.951s -> 674.435s (-0.08%) Artifact size: 315.92 MiB -> 315.80 MiB (-0.04%)

rust-lang / rust