mirage / bechamel

Agnostic benchmark in OCaml (proof-of-concept)
MIT License

CI95 always has exact interval #42

Closed · edwintorok closed this issue 10 months ago

edwintorok commented 1 year ago

I tried using a non-zero 'bootstrap' argument, but that resulted in exact intervals, which can't be right; the measurement results aren't that accurate...

e.g.

{ monotonic-clock per run = 356.061045 (confidence:
                                                  356.061045 to 356.061045);
                                                  r² = Some 0.992947 }

Looking at the code, it would seem it computes the OLS result multiple times with the same input, which unsurprisingly always yields the same output. There is some code there to randomize an index array, but 'indices' is completely unused.

IIUC, bootstrap is meant to draw N random samples (with replacement) from the entire set of measurements; repeating this process 'trials' times would lead to slightly different OLS results, out of which you pick the bounds of the confidence interval. The code is missing the 'sampling' part.
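For illustration, here is a minimal standalone sketch of that procedure; 'bootstrap_ci', its 'ols' parameter, and the helper names are hypothetical, not bechamel's actual code:

    (* Minimal bootstrap sketch: [ols] stands in for the real estimator.
       Illustrative only, not bechamel's actual code. *)
    let bootstrap_ci ~trials ~confidence ols measurements =
      let n = Array.length measurements in
      let estimates =
        Array.init trials (fun _ ->
            (* resample with replacement, then re-run the estimator *)
            let resample =
              Array.init n (fun _ -> measurements.(Random.int n))
            in
            ols resample)
      in
      Array.sort Float.compare estimates;
      (* percentile interval: e.g. the 2.5th/97.5th percentiles for CI95 *)
      let cut = int_of_float (float_of_int trials *. (1. -. confidence) /. 2.) in
      (estimates.(cut), estimates.(trials - 1 - cut))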

'ransac.ml' has a 'random_partition' function which seems close to what we need, though. For reference, the current code in 'analyze.ml':

        let bootstrap_fails = ref 0 in
        let indices = Array.make (Array.length m) 0 in
        let bootstrap_coeffs = Array.init p (fun _ -> Array.make trials 0.0) in
        for i = 0 to trials - 1 do
          random_indices_in_place indices ~max:(Array.length m);
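          (* bug: [indices] is randomized here but never used below *)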
          let matrix, vector = make_lr_inputs ~responder ~predictors m in
          match Linear_algebra.ols ~in_place:true matrix vector with
          | Ok bt_ols_result ->
              for p = 0 to p - 1 do
                bootstrap_coeffs.(p).(i) <- bt_ols_result.(p)
              done

Perhaps something like this is needed:

diff --git a/lib/analyze.ml b/lib/analyze.ml
index 71f5c201..04e41df1 100644
--- a/lib/analyze.ml
+++ b/lib/analyze.ml
@@ -95,7 +95,8 @@ module OLS = struct
         let bootstrap_coeffs = Array.init p (fun _ -> Array.make trials 0.0) in
         for i = 0 to trials - 1 do
           random_indices_in_place indices ~max:(Array.length m);
-          let matrix, vector = make_lr_inputs ~responder ~predictors m in
+          let m' = Array.mapi (fun i _ -> m.(indices.(i))) m in
+          let matrix, vector = make_lr_inputs ~responder ~predictors m' in
           match Linear_algebra.ols ~in_place:true matrix vector with
           | Ok bt_ols_result ->
               for p = 0 to p - 1 do

Which now prints some plausible-looking values:

{ monotonic-clock per run = 315.996865 (confidence:
                                                  318.428739 to 313.286506);
                                                  r² = Some 0.995618 }

Although it would probably be best to avoid allocating a new array on every iteration and instead pass the indices to 'make_lr_inputs' as an optional argument?
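Roughly like this, assuming 'make_lr_inputs' builds its rows by iterating over 'm' with accessor functions; the body is illustrative, not bechamel's actual implementation, but the indirection through 'indices' is the point:

    (* Illustrative sketch: assumes [responder] and [predictors] are accessor
       functions over a measurement; bechamel's real [make_lr_inputs] differs. *)
    let make_lr_inputs ?indices ~responder ~predictors m =
      let nth i =
        match indices with
        | None -> m.(i)           (* measurements in original order *)
        | Some idx -> m.(idx.(i)) (* bootstrap resample, no intermediate copy *)
      in
      let n = Array.length m in
      let matrix =
        Array.init n (fun i -> Array.map (fun f -> f (nth i)) predictors)
      in
      let vector = Array.init n (fun i -> responder (nth i)) in
      (matrix, vector)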

lindig commented 11 months ago

I think this needs attention, as the issue undermines the library's purpose of measuring performance. Somewhat unrelated: I would like to see the unit of the monotonic clock better documented.

dinosaure commented 11 months ago

I can confirm your observation as well as your patch :+1: Thank you for taking the time to find where the error was. I have just proposed a fix here: #45. Can you confirm that this fix solves the original problem? For my part, I've run a few tests with the sqrt example and I'm getting good results.

dinosaure commented 11 months ago

Somewhat unrelated: I would like to see the unit of the monotonic clock better documented.

@lindig: Actually, we can document which clock we use on which system. For instance, on Linux we use CLOCK_MONOTONIC, which does not count time the system spends suspended (unlike CLOCK_BOOTTIME). If you want, we can improve the documentation about that, WDYT?

EDIT: note that you can introduce your own metric, which can be another monotonic clock. That's the initial goal of bechamel: instead of imposing metrics that may not fit the user, we prefer to let you create your own metrics (such as using bechamel-perf & perf, Linux only) to get results that suit you better. A sketch of the plumbing follows.
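For instance, a user-defined metric is a module implementing the measure interface, which is then registered. A rough sketch, assuming the make/load/unload/label/unit/get shape and the Measure.register/Measure.instance calls used by the built-in toolkit and bechamel-perf; verify against the current API, and note that a wall clock is only chosen here to show the plumbing:

    (* Rough sketch of a custom metric. The interface shape is assumed from
       bechamel's built-in measures; check the current API before copying. *)
    module Wall_clock = struct
      type witness = unit

      let make () = ()
      let load () = ()
      let unload () = ()
      let label () = "wall-clock"
      let unit () = "ns"

      (* Unix.gettimeofday returns seconds as a float; report nanoseconds. *)
      let get () = Unix.gettimeofday () *. 1e9
    end

    let wall_clock = Bechamel.Measure.register (module Wall_clock)
    let instance = Bechamel.Measure.instance (module Wall_clock) wall_clock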

lindig commented 11 months ago

All I was thinking about was presenting the unit (ns, us) in the resulting tables. Otherwise there are just big numbers that could be anything.

dinosaure commented 11 months ago

Ah yes, currently you can choose the unit you want depending on the metric and decide how to show results. bechamel currently has 3 outputs (and you can implement one more if you want): 1) a CLI output with notty, 2) an HTML one, which is my favorite (you can see an example here), 3) and a basic one which can be "easy" to manipulate (for instance, to produce a JSON output).

We can definitely improve them and, actually, as you said, the HTML output does not show units, but yeah, some mechanisms exist to show the units of metrics :+1:

dinosaure commented 10 months ago

@edwintorok a gentle ping to check my PR 🙂

edwintorok commented 10 months ago

Sorry, the original notification must've gotten lost in my inbox, taking a look now.