sharkdp / hyperfine

A command-line benchmarking tool
Apache License 2.0
22.26k stars 358 forks

max-time-per-run #576

Open christianhujer opened 2 years ago

christianhujer commented 2 years ago

For benchmarking, we already have -P, --parameter-scan, and its more flexible counterpart -L, --parameter-list, and that's great. As the example shows, we can use this to run hyperfine like this:

hyperfine -p 'make clean' -P threads 1 8 'make -j {threads}'

I've now found a use case where it would be useful for hyperfine to limit benchmarking runs based on elapsed time.

hyperfine -L dir 'C,Rust,bash' -L N 10,20,30,40 'make -C {dir} fibonacci-recursive-benchmark N={N}'

For benchmarks where performance varies greatly, such as between C and bash, it could occasionally be useful to present results as "aborted (took too long)". A --max-time-per-run <TIME> argument, for example --max-time-per-run 2s, would automatically terminate a run, and the benchmark it belongs to, as soon as that run takes longer than the limit. The values could be output as .
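To make the requested behavior concrete, here is a minimal Python sketch of a per-run time limit. It is not hyperfine code: `benchmark_with_limit`, `report`, `FAST`, and `SLOW` are hypothetical names, and timing from Python includes process-spawn overhead that a real benchmarking tool would handle more carefully.

```python
import subprocess
import sys
import time

# Hypothetical stand-ins for a fast and a slow benchmark variant.
FAST = [sys.executable, "-c", "pass"]
SLOW = [sys.executable, "-c", "import time; time.sleep(5)"]


def benchmark_with_limit(cmd, runs=10, max_time_per_run=2.0):
    """Time `cmd` up to `runs` times; stop at the first run that exceeds
    `max_time_per_run` seconds.

    Returns (times, aborted): wall-clock times of the completed runs and a
    flag saying whether the benchmark was cut short.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # once the timeout has elapsed.
            subprocess.run(
                cmd,
                timeout=max_time_per_run,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            return times, True
        times.append(time.perf_counter() - start)
    return times, False


def report(cmd, **kwargs):
    """Render a one-line result, using the wording proposed in this issue."""
    times, aborted = benchmark_with_limit(cmd, **kwargs)
    if aborted:
        return "aborted (took too long)"
    return f"{sum(times) / len(times):.3f} s mean over {len(times)} runs"
```

With these definitions, `report(SLOW, max_time_per_run=0.5)` yields the "aborted" marker, while `report(FAST)` yields a mean time.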

sharkdp commented 2 years ago

Thank you for your request.

I think the implementation of such a feature would require a LOT of special cases downstream to properly handle the absence of values.

Have you considered using something like timeout to limit the time (in combination with hyperfine's --ignore-failure option to ignore the non-zero exit code)? That would not show something like "aborted" or "infinity", but the command will run into the time limit and the result will show that:

▶ hyperfine --ignore-failure -L time 1,5 'timeout 2 sleep {time}' 
Benchmark 1: timeout 2 sleep 1
  Time (mean ± σ):      1.002 s ±  0.000 s    [User: 0.001 s, System: 0.002 s]
  Range (min … max):    1.001 s …  1.002 s    10 runs

Benchmark 2: timeout 2 sleep 5
  Time (mean ± σ):      2.001 s ±  0.000 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    2.001 s …  2.002 s    10 runs

  Warning: Ignoring non-zero exit code.

Summary
  'timeout 2 sleep 1' ran
    2.00 ± 0.00 times faster than 'timeout 2 sleep 5'

In general: why would you be interested in including benchmarks that would potentially run into a time limit?

sharkdp commented 2 years ago

see also: #106

christianhujer commented 2 years ago

To answer the question "why would I be interested in including benchmarks that would potentially run into a time limit?":

I am benchmarking a matrix of languages and programs automatically. From my Makefile:

ALL:=$(patsubst %/,%,$(filter-out \
    asm-m68k-amiga-gasm/ \
    asm-m68k-amiga-masm/ \
    asm-m68k-amiga2-masm/ \
    Carbon/ \
    Concurnas/ \
    Logo/ \
    , $(wildcard */)))

.PHONY: hyperfine-roundtrip
hyperfine-roundtrip: hyperfine-roundtrip.csv
hyperfine-roundtrip.csv:
    hyperfine --export-csv hyperfine-roundtrip.csv -L variant $(shell echo $(ALL) | sed -e 's/ /,/g') -p 'make -C {variant} clean' 'make -sC {variant}'

I think you can see how well hyperfine works for this case. ❤️

Before hyperfine, my Makefile looked like this:

.PHONY: time-%
time-%:
    @for ((i = 0; i < 10; i++)); do
    @$(MAKE) -s -C $* clean 2>&1
    @start=$$(date -u +'%s%N')
    @$(MAKE) -s -C $* >/dev/null 2>&1
    @end=$$(date -u +'%s%N')
    @echo '$*,'$$(($$end - $$start))
    @done

time.csv:
    echo 'Language,time (ns)' >$@
    $(MAKE) -s time >>$@

clean::
    $(RM) time.csv

time-processed.csv: time.csv
    sqlite3 >>$@ <<END
    .mode csv
    .import time.csv times
    select "Language", ((1.0 * sum("time (ns)") - max("time (ns)") - min("time (ns)")) / (count("time (ns)") - 2.0)) / 1000000 as "Time (ms)" from times group by "Language" order by "Time (ms)";
    END
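The SQL query above computes a trimmed mean per language: it drops the single largest and single smallest sample, averages the rest, and converts nanoseconds to milliseconds. The same arithmetic, as a small Python sketch (the function name and the sample values in the usage note are made up for illustration):

```python
def trimmed_mean_ms(times_ns):
    """Mean of `times_ns` in milliseconds after dropping one minimum and
    one maximum, mirroring (sum - max - min) / (count - 2) / 1000000."""
    if len(times_ns) < 3:
        raise ValueError("need at least 3 samples to drop min and max")
    total = sum(times_ns) - max(times_ns) - min(times_ns)
    return total / (len(times_ns) - 2) / 1_000_000
```

For the hypothetical samples `[1_000_000, 2_000_000, 3_000_000, 100_000_000]`, the outlier of 100 ms and the minimum of 1 ms are dropped, leaving a trimmed mean of 2.5 ms.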

Using timeout as a wrapper will work from a functional perspective. However, the measurement would then no longer cover just the target program, but timeout plus the target program. One would also have to determine the overhead of timeout itself and subtract it: first benchmark true, then benchmark timeout true, and take the difference between the two. That's why having this feature in hyperfine itself would be great.
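The subtraction described above can be sketched in Python. This is only an illustration under assumptions: `mean_wall_time`, `wrapper_overhead`, `corrected_time`, and `NOOP` are hypothetical names, and `NOOP` uses the Python interpreter as a portable stand-in for `true`.

```python
import subprocess
import sys
import time
from statistics import mean

# Portable stand-in for `true`: a command that does nothing and exits 0.
NOOP = [sys.executable, "-c", "pass"]


def mean_wall_time(cmd, runs=10):
    """Mean wall-clock time in seconds of running `cmd` `runs` times."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return mean(samples)


def wrapper_overhead(wrapper, baseline=NOOP, runs=10):
    """Estimate the wrapper's cost as mean(wrapper + baseline) - mean(baseline)."""
    return mean_wall_time(wrapper + baseline, runs) - mean_wall_time(baseline, runs)


def corrected_time(measured_with_wrapper, overhead):
    """Subtract the estimated wrapper overhead from a wrapped measurement."""
    return measured_with_wrapper - overhead
```

On a system with coreutils, `wrapper_overhead(["timeout", "2s"], ["true"])` would estimate the cost of the timeout wrapper. Because the estimate is a difference of two noisy means, it can come out close to zero or even slightly negative when the overhead is small.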

For a lot of purposes, timeout will work fine, this feature is not essential. It only matters where some of the results will be so low/fast that the time it takes to run timeout (I guess 3-6ms) makes a significant difference.

(I'm measuring roundtrip times of programming languages, and they can range from a few ms in Perl or Assembler to many seconds in Flix. It also heavily depends on the problem statement.)

sharkdp commented 1 year ago

> For a lot of purposes, timeout will work fine, this feature is not essential. It only matters where some of the results will be so low/fast that the time it takes to run timeout (I guess 3-6ms) makes a significant difference.

Don't guess - measure :smile:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fd` | 12.2 ± 0.9 | 10.5 | 14.7 | 1.00 |
| `timeout 2s fd` | 12.9 ± 0.8 | 11.1 | 15.7 | 1.06 ± 0.10 |

The overhead seems to be below 1 ms.
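As a quick check of the numbers in the table, the following Python sketch recomputes the overhead and the relative speed from the reported means and standard deviations. It assumes standard first-order error propagation for independent measurements; whether hyperfine uses exactly this formula internally is an assumption here, and the function names are illustrative.

```python
import math


def difference(mean_a, sd_a, mean_b, sd_b):
    """Return (b - a, sd) for independent measurements a and b."""
    return mean_b - mean_a, math.sqrt(sd_a**2 + sd_b**2)


def ratio(mean_a, sd_a, mean_b, sd_b):
    """Return (b / a, sd) using first-order error propagation."""
    r = mean_b / mean_a
    return r, r * math.sqrt((sd_a / mean_a) ** 2 + (sd_b / mean_b) ** 2)


# Values in milliseconds, taken from the table above.
fd = (12.2, 0.9)
timeout_fd = (12.9, 0.8)

overhead, overhead_sd = difference(*fd, *timeout_fd)
rel, rel_sd = ratio(*fd, *timeout_fd)

print(f"overhead: {overhead:.1f} ± {overhead_sd:.1f} ms")  # overhead: 0.7 ± 1.2 ms
print(f"relative: {rel:.2f} ± {rel_sd:.2f}")               # relative: 1.06 ± 0.10
```

The difference of the means is 0.7 ms, which is smaller than the run-to-run noise of either measurement, so the overhead is only weakly resolved by this benchmark.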