mirage / bechamel

Agnostic benchmark in OCaml (proof-of-concept)
MIT License
44 stars 15 forks source link

Benchmarking memory footprint of data structures. #49

Open tiuno opened 3 months ago

tiuno commented 3 months ago

There's a measure that I want to do and I haven't managed to achieve this using Bechamel. The end goal is to measure the memory footprint of values using Obj.reachable_words.

Suppose that we want to compare the memory footprint of different implementations of semi-persisent lists.

Let's consider that we have a module type SP_LIST and different implementations: module A : SP_LIST and module B : SP_LIST.

Then we have a generic test signature with a make_list function that will allocate a value, let's call it v, that will be passed to the function bench, which will be the basis of the test. I would like to compare the foot print of v before and after the call of bench.

module type TEST = functor (Impl : SP_LIST) -> sig
  type t

  val make_list : unit -> t
  val bench : int -> t -> unit
end

My idea was to try to make a test with resources for modules A and B and then, use the alloc and free functions to register the values of interest in a table that is read by the measurement module and use Ob.reachable_words.

However, this doesn't work. When the test is set up with multiple allocations the alloc and free functions are called exactly once for each run of n executions of bench. So the measured values keep decreasing as n grows. This is because the footprint of execution n doesn't add up to the execution n+1 of bench.

I would like to make a feature request, that I'm willing to implement, to add such measurement features. Although I don't know which design would fit properly with Bechamel's mindset.

dinosaure commented 3 months ago

The basic idea of bechamel is to execute a given function N times and collect metrics during its N executions. Let's take the time metric and a function fn. If we execute fn 1 time, we'll record how long the function took to execute (say 10ms). This assumes that we have this metric before execution and after execution. All we need to do is perform a subtraction afterwards. We'll assume that if we execute our function twice, we'll expect to get 20ms. The idea of the benchmark could be described in code in this way:

let benchmark fn =
  let metrics = Array.make 3000 0.0 in
  let run = ref 1 in
  for idx = 0 to Array.length metrics - 1 do
    let _N = !run in
    let a = gettimeofday () in
    for _ = 0 to _N do
       fn ()
    done;
    let b = gettimeofday () in
    metrics.(idx) <- b -. a;
    run := !run + 1
  done

Bechamel then tries to get results that should plot a curve (between our N and our subtractions) corresponding to an affine function: f(N) = T * N + beta where T should be the time spent executing our function (in our example, 10ms).

This is the fundamental idea behind bechamel. There are one thing to note: The metric is information which necessarily comes from the "system" (time, number of words allocated by OCaml, number of cache-misses, etc.). So the metric doesn't come from an OCaml value for example (as we might want with Obj.reachable_word) but from what the system can tell us before and after our function is executed.

alloc and free were functions added (#25) after the fact because the function may need a resource (like a file-descriptor). The aim of these resources is not to participate in the metrics; on the contrary, the aim is to ensure that they have as little influence as possible on the metrics (to obtain 'fair' values). If we go back to our example code, in the case of having multiple resources, these are allocated before a metric value is obtained and afterwards.

--- a.ml    2024-03-26 17:27:12.085161748 +0100
+++ b.ml    2024-03-26 17:27:47.475366040 +0100
@@ -3,11 +3,13 @@
   let run = ref 1 in
   for idx = 0 to Array.length metrics - 1 do
     let _N = !run in
+    let resources = Array.make _N alloc in
     let a = gettimeofday () in
     for _ = 0 to _N do
        fn ()
     done;
     let b = gettimeofday () in
+    Array.iter free resources;
     metrics.(idx) <- b -. a;
     run := !run + 1
   done

From what I understand you want to have a mix and match between the alloc/free API and the metrics. Your example is not as trivial as all that. With bechamel, you can see whether a function allocates (with minor_allocated and/or major_allocated) but it's difficult to know how a value changes depending on how it's used (whether it grows or not). If your value is a global (like a system resource), however, and you'd like to know how it evolves after calling fn, there is a way, but I'd really like you to reconsider what seems relevant as a metric first :+1: .

tiuno commented 3 months ago

@dinosaure thank you for your explanation of how Bechamel takes measures. I do understand your point about Bechamel being oriented to measure system resources rather than the dynamics of values.

Indeed, it is likely that the alloc/free API is not the right one. However, could we imagine a completely new kind of test of type unit -> 'a that return a value of interest on which we can take measurements instead of unit? I understand that this may lead to very high consumption of memory and possibly result in out of memory errors.

tiuno commented 3 months ago

To illustrate my idea, this is what I have in mind implemented in a module.

open Bechamel

type register = { mutable total : int; mutable clear : bool }

let register = { total = 0; clear = true }

module type TEST = sig
  type t

  val make_list : unit -> t
  val bench : int -> t -> unit
end

module type WORDS_TEST = functor (_ : TEST) -> sig
  include TEST
end

module MakeWordsCase : WORDS_TEST =
functor
  (Test : TEST)
  ->
  struct
    include Test

    let bench n l =
      Sys.opaque_identity
        (let before = Obj.(repr l |> reachable_words) in
         bench n l;
         register.total <-
           register.total + (Obj.(repr l |> reachable_words) - before))
  end

module Words : S.MEASURE = struct
  type witness = unit

  let load () = ()
  let unload () = ()
  let make () = ()

  let get () =
    if register.clear then (
      register.clear <- false;
      0.)
    else (
      register.clear <- true;
      let m = register.total in
      register.total <- 0;
      m |> float_of_int)

  let label () = "reachable-words"
  let unit () = "words"
end

module Extension = struct
  type 'w t = 'w Measure.measure

  let words = Measure.register (module Words)
end

module Instance = struct
  let words = Measure.instance (module Words) Extension.words
end

This works as expected. The drawback of this approach is that I have to take reachable words measurements separately as the bench function of the MakeWordsCase has some overhead.