knurling-rs / defmt

Efficient, deferred formatting for logging on embedded systems
https://defmt.ferrous-systems.com/
Apache License 2.0

Feature request for defmt-test: #[bench] #555

Open Sh3Rm4n opened 3 years ago

Sh3Rm4n commented 3 years ago

Support some kind of benchmarking. I'm imagining something as simple as running a monotonic timer, which starts at function entry and stops at function exit; during test execution the difference of the tick counts could then be printed.

This would involve some setup code, and I see that std's #[bench] attribute is not stable, so I don't know whether it is more useful to include a new #[bench] attribute or to just re-use the #[test] attribute, with the execution time calculated when building under the bench profile or when an extra flag is passed.

The DWT (Data Watchpoint and Trace unit), for example, could be set up manually in #[init], or maybe even "automatically" / hidden from the user inside the proc-macro-expanded main function.

This would be an easy "flip the switch" solution for measuring the execution time of a tested function (though it might not be the best way to measure, as some setup code is needed most of the time to get a test up and running, and that code would end up in the measurement as well).
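For reference, the manual version could look roughly like the sketch below, using the cortex-m crate's DWT API (just a sketch; function_under_test is a placeholder, and the exact reader method name depends on the cortex-m version):

use cortex_m::peripheral::DWT;

#[init]
fn init() {
    // the cycle counter only counts while tracing is enabled
    let mut core = cortex_m::Peripherals::take().unwrap();
    core.DCB.enable_trace();
    core.DWT.enable_cycle_counter();
}

// then, inside a test:
let start = DWT::cycle_count();
function_under_test();
defmt::info!("took {=u32} cycles", DWT::cycle_count().wrapping_sub(start));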

I never seriously benchmarked anything via the std-lib benchmark harness, so I'm not too sure about the ergonomics.

This is definitely low priority and can all be done manually right now. But it came to my mind tonight and I wanted to share it. What do you think?

japaric commented 3 years ago

Definitely an interesting idea!

The #[bench] functionality in the standard test runner runs the code multiple times (hundreds or tens of thousands of times) to get a more precise measurement. This is needed because the code runs on an OS and competes with other processes, so each execution of the function takes a different amount of (wall clock) time. (Using perf counters instead of wall clock time would eliminate most of those issues and serve as a better metric for "amount of work done", but that's a different conversation.) #[bench] has logic to run the code "enough times" to get a statistically significant sample, from which it then computes the mean and standard deviation.

I don't think we need any/most of that logic in defmt-test because unit tests run on bare metal (effectively as a single process) -- and the same would apply to "benchmarks". IME, running code several times pretty much returns the same number of clock cycles (as reported by DWT.CYCCNT).

Given that measurements are reliable in bare metal code, what may be interesting to add here is to assert that the function under test executes within some specified amount of time. Example syntax:

// panics if function execution takes longer than `1_000_000` clock cycles
#[bench(upper_bound_execution_time = 1_000_000)] 
fn do_some_math(black_boxed_input: f32) {
    // do some trigonometric math here
}
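A rough sketch of what such an attribute could check internally (assuming a running DWT cycle counter and a black_box helper; none of this is existing defmt-test API):

let input = black_box(1.0f32);
let start = DWT::cycle_count();
do_some_math(input);
let cycles = DWT::cycle_count().wrapping_sub(start);
// fail the test if the measured time exceeds the declared bound
defmt::assert!(cycles <= 1_000_000, "took {=u32} cycles", cycles);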

IMO, in a test suite this feels more helpful than a benchmark that just prints some results, because it asserts something. I can see the use case for only printing benchmark results too, but that could as well be a binary crate that does:

fn main() -> ! {
    setup();

    // times the closure and prints the result to the console
    benchmark::function(|| {
        // do stuff here
    });

    // .. more benchmarks here ..

    exit()
}
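Here benchmark is a hypothetical crate; its function helper could be as small as the following sketch (again assuming the DWT cycle counter was enabled in setup()):

use cortex_m::peripheral::DWT;

// time a closure with the cycle counter and log the result
pub fn function(f: impl FnOnce()) {
    let start = DWT::cycle_count();
    f();
    defmt::info!("took {=u32} cycles", DWT::cycle_count().wrapping_sub(start));
}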

I never seriously benchmarked anything via the std-lib benchmark harness, so I'm not too sure about the ergonomics.

Something you can do with #[bench] and the standard test runner is filter out benchmarks by passing a command-line argument. We don't support filtering tests in defmt-test (yet?)

jonas-schievink commented 3 years ago

It might make sense to mirror something like criterion here, which plugs into Cargo's bench targets just like the built-in benchmark runner. That way, you get a correctly configured Cargo profile by default and can use cargo bench to run them.

Sh3Rm4n commented 3 years ago

Given that measurements are reliable in bare metal code, what may be interesting to add here is to assert that the function under test executes within some specified amount of time.

Yeah, this is definitely a use case I was thinking about as well, but I forgot to mention it in the OP. It would be great for catching regressions in execution time caused by code changes in the tested code paths.

japaric commented 2 years ago

[RFC] benchmarking on embedded systems

I think it may be a good time to revisit this idea given some recent developments. Some things that came to mind while re-reading this thread:

now that asm! is en route to stabilization, we can implement black_box out of tree on stable, like this:

use core::arch::asm;

fn black_box<T>(dummy: T) -> T {
    // an empty asm block that receives the address of `dummy` makes the
    // compiler assume the value is observed, so it cannot optimize it away
    unsafe { asm!("/* {0} */", in(reg) &dummy as *const T) }
    dummy
}

(black_box is important to prevent the compiler from optimizing away the work you want to measure)
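for illustration, a hedged usage sketch where the loop stands in for the work being measured:

// black_box the input so the loop bound isn't const-folded, and black_box
// the result so the whole loop isn't removed as dead code
let n = black_box(1_000u32);
let mut acc = 0u32;
for i in 0..n {
    acc = acc.wrapping_add(i);
}
black_box(acc);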

regarding statistics and other mentions of criterion, what may be interesting is measuring how execution time changes based on the input. for example, if you want to time memclr for different slice sizes you may write this (don't mind the syntax much; it's just a sketch):

#[defmt::bench]
#[scratch([0; 256])]
#[inputs([0, 4, 16, 64, 256])]
fn memclr(scratch: &mut [u8; 256], #[input] size: usize) {
    scratch.iter_mut().take(size).for_each(|byte| *byte = 0);
}

(less important: in this example, scratch and size are black-boxed before the function call, and scratch is black-boxed after the function call too; the measurement is done behind the scenes. it doesn't have to be that way, though; we could make black-boxing the responsibility of the benchmark's author)
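hand-written, the behavior described above could look roughly like this (not the actual expansion; assumes the black_box above and the cortex-m DWT cycle counter):

let mut scratch = [0u8; 256];
for size in [0usize, 4, 16, 64, 256] {
    // black-box the inputs so the compiler can't specialize on them
    let scratch = black_box(&mut scratch);
    let size = black_box(size);
    let start = DWT::cycle_count();
    memclr(scratch, size);
    let cycles = DWT::cycle_count().wrapping_sub(start);
    // black-box the buffer again so the writes aren't considered dead
    black_box(scratch);
    defmt::info!("memclr: input={=usize} time={=u32}", size, cycles);
}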

now, you could do an analysis like a linear regression to compute the slope and the offset and things like that; maybe also produce a graph. what I would really like is to not put such logic in probe-run. instead what I would like to see is probe-run outputting the relevant data so that the analysis can be done out of band. how? using the JSON output format; the schema currently contains a data field which is a formatted string. I think we could put another JSON string there with something along the lines of:

[
  {
    "name": "memclr",
    "measurements": [
      { "input": 0, "time": 20 },
      { "input": 4, "time": 40 },
      // ..
    ]
  },
  // ..
]

then one could do something like cargo bench | third-party-bench-formatter to produce either colored text to the console, or an HTML report. (+)
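as a sketch of what such an adapter could do with that payload (serde / serde_json and the field names from the schema above are assumed):

use serde::Deserialize;

#[derive(Deserialize)]
struct Bench {
    name: String,
    measurements: Vec<Measurement>,
}

#[derive(Deserialize)]
struct Measurement {
    input: u64,
    time: u64,
}

// least-squares fit of time = slope * input + offset for each benchmark
// (needs at least two distinct input values per benchmark)
fn analyze(json: &str) -> serde_json::Result<()> {
    let benches: Vec<Bench> = serde_json::from_str(json)?;
    for b in &benches {
        let n = b.measurements.len() as f64;
        let (sx, sy, sxx, sxy) = b.measurements.iter().fold(
            (0.0, 0.0, 0.0, 0.0),
            |(sx, sy, sxx, sxy), m| {
                let (x, y) = (m.input as f64, m.time as f64);
                (sx + x, sy + y, sxx + x * x, sxy + x * y)
            },
        );
        let slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        let offset = (sy - slope * sx) / n;
        println!("{}: {:.2} cycles per input unit, {:.0} cycles fixed cost", b.name, slope, offset);
    }
    Ok(())
}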

what I don't know is how one would support both non-JSON and JSON output in probe-run in a way that they both look sensible. I think I would opt for only supporting JSON output in this case.

the other annoying bit of requiring JSON output for cargo bench is that the runner would need to be probe-run --json $flags only in that case, but there's no way to specify a different Cargo runner for binaries vs tests vs benchmarks via .cargo/config.toml. I guess one could write a cargo xtask for that.

(+) in fact, one could do something similar with cargo test: use the JSON output and some third-party adapter to turn it into either human-readable output or JUnit XML format, etc.