tov / dssl2

A data structures student language, version 2
MIT License
9 stars 5 forks source link

Add benchmarking library. #16

Closed stamourv closed 4 years ago

stamourv commented 4 years ago

With the goal of doing some live benchmarking. Current plan: compare dictionary implementations in the dictionaries lecture.

The change to the printer is to allow printing plots to the REPL.

stamourv commented 4 years ago

Have you had a chance to take a look? I'd like to use this this coming quarter.

tov commented 4 years ago

I will merge this, but I'd like to spend a couple hours playing with it first.

stamourv commented 4 years ago

Sure. I'll email you the file I'm planning to use.

tov commented 4 years ago

Example: benchmarks.rkt.txt

stamourv commented 4 years ago

Nag.

tov commented 4 years ago

I spent a few hours on this and pushed something I mostly like. You use it like this:

#lang dssl2

import benchmark

##
## Functions to measure
##

def fib_rec(n):
    if n < 2:
        return n
    else:
        return fib_rec(n - 1) + fib_rec(n - 2)

def fib_iter(n):
    def loop(i, a, b):
        if i <= 0:
            return a
        else:
            return loop(i - 1, b, a + b)
    return loop(n, 0, 1)

# Plot both functions applied to 0, 2, 4, ..., 28.
plot_benchmarks('Fibonacci',
    [fib_iter, fib_rec],
    [2 * i for i in range(15)])

# The scales are too different! Let's see what fib_rec looks
# like on a log scale:
log_plot_benchmarks('Fibonacci',
    [fib_rec],
    [2 * i for i in range(15)])

# We can plot the fast version on larger inputs:
plot_benchmarks('Fibonacci',
    [fib_iter],
    [1000 * i for i in range(30)])

The result looks like this:

some benchmark plots

It prints some progress information since it can be hard to tell what’s going on when the benchmarks take awhile.

Does this work for you?

stamourv commented 4 years ago

Your interface is indeed nicer, and the progress info is a nice addition, but unfortunately it doesn't work for my use case.

If you look at the example I sent you, the pattern is: run some code, measure, run some code, measure, etc. As far as I can tell, your interface only supports: measure, measure, measure, etc. I've found the former pattern useful for being able to get measurements with high n without measuring all the time (and so in a reasonable amount of time).

I've spent quite a bit of time thinking about designs for benchmarking libraries (see the racket benchmark lib), and the tradeoffs between interface simplicity and flexibility are definitely one of the hard parts (which I won't claim to have solved). Giving the client control of the loop (and providing a sampling function) is not a pretty as an all-encompassing benchmark function, but it's a lot more flexible.

Since the quarter is getting really close, we need a resolution ASAP. Ideally today. I'm already telling PMs and students to hold off installing DSSL2 for now. Here's what I'm proposing:

stamourv commented 4 years ago

Alternatively, if you just merge the changes to the printer, I should be able to do the rest in 3rd-party code. But again, I'm really hoping we can resolve this ASAP.

tov commented 4 years ago

I’m not sure if I understand the workflow distinction you’re making. Is it that you want the API to separate measurement from plotting? If so, it does that, though I didn’t show you how.

In particular, it’s factored into separate libraries for plotting and benchmarking. The plotting library (src) is in Racket in order to depend on Racket’s plotting API. The benchmarking library (src) built on top of the plotting library, in DSSL2. To separate timing from plotting, use benchmark_functions to do the actual timing, getting back a representation of the timings. Then you can pass that to plot whenever:

let fib_timings = benchmark_functions([fib_iter, fib_rec], [5 * i for i in range(6)])

# …

plot('Fibonacci', fib_timings, 'n', 'time (ms)')

Does that help?

stamourv commented 4 years ago

I don't think that helps.

The issue is that, as far as I can tell, the measurement loop is entirely controlled by the library. The client passes in a function to measure, and the library calls it n times on different inputs. The example I sent you runs some code in between each measurement (to do more insertions). It's possible because, with the initial interface, the client controls the loop, and can run that code in between taking measurements. Separating measurements from plotting, while nice, doesn't help with that.

Maybe there's something else in your design that would make that possible, but I haven't seen it. Why don't you try rewriting my example with your interface? That's what I tried this morning, and I couldn't.

In the meantime, could you push the changes to the printer? Those should be fine regardless of the top-level interface, and as I said, worst case I can get the benchmarking part working in 3rd party code. Not so with the printing, which requires changes to the DSSL2 internals.

tov commented 4 years ago

Oh, I see what you’re doing now. Here your benchmark function using my API:

import plot

# …

def benchmark(d, n_iterations, n_insertions_per_iteration):
    let insert_data = [None; n_iterations]
    let lookup_data = [None; n_iterations]
    for i in n_iterations:
        print("running iteration %p of %p\n", i+1, n_iterations)
        for j in n_insertions_per_iteration - 1:
            d.put(d.len(), True)
        def do_insert(): d.put(d.len(), True)
        def do_lookup(): d.get(0)
        insert_data[i] = [i, (time: do_insert()).cpu]
        lookup_data[i] = [i, (time: do_lookup()).cpu]
    let title = '%s Benchmarks'.format(d.__class__)
    let data = [['insert', insert_data], ['lookup', lookup_data]]
    return plot(title, data, 'iters', 'time (ms)')

This works because the time: form lets you time anything you like, and then the plot function from import plot lets you plot whatever data you like.

The functions in the benchmark library are very thin wrappers around time: and plot, and you can combine them other ways if you like.

Also, for what it’s worth, the library doesn’t call the function N times—N is just a parameter that it passes to the function being measured. For example:

#lang dssl2

import benchmark

def f(n):
    println('\nn = %p\n', n)

# Calls `f` four times:
plot_benchmarks('f', [f], [10, 100, 100, 1000000])    

The output looks like this:

(Yes, I will push the printer changes.)

tov commented 4 years ago

By push you also mean release with a new version number? Or just get it on master?

stamourv commented 4 years ago

I see, so just skipping the benchmark library and going straight for plot. I just tried it and it works.

I'm good with this version. Please release all of this whenever you can. Other PR is good too. New version number would probably good, but I forget what that entails.

Also: you may want to check the dependencies. I think the new ones (plot, snip?) may not be listed, and I got warnings when I built.

tov commented 4 years ago

Okay, will do!

I noticed a bug (I think) in your vector dictionary’s put method. In particular, it assumes that the key being inserted is new. This is true for your benchmarks, I suppose, but probably not true in general. Here’s my fix (which also gets rid of some variables):

    def put(self, k, v):
        if self.len() == 0:
            self._contents = [assoc(k, v)]
            return
        let index = self.binary_search(k)
        if index:
            self._contents[index].value = v
            return
        let new_length = self.len() + 1
        let old_contents = self._contents
        self._contents = [None; new_length]
        let offset = 0
        for i in range(new_length):
            if i + 1 == new_length or \
                    (offset == 0 and old_contents[i].key > k):
                self._contents[i] = assoc(k, v)
                offset = 1
            else:
                self._contents[i] = old_contents[i - offset]
tov commented 4 years ago

Oh, also, your binary search indexes out of bounds when given an empty vector. Your put method avoids this by special-casing the empty vector, but you can fix the binary search by checking for the not-found case first:

    def binary_search(self, key) -> OrC(nat?, False):
        let vec = self._contents
        def bin(lo, hi):
            if lo > hi:
                return False
            let mid = (lo + hi) / 2
            if vec[mid].key == key:
                return mid
            elif vec[mid].key > key:
                bin(lo, mid - 1)
            else:
                bin(mid + 1, hi)
        bin(0, vec.len() - 1)

Do you find this closed-interval binary search clearer than the closed-open–interval one I use?:

    def binary_search(self, key) -> OrC(nat?, False):
        let vec = self._contents
        def bin(lo, hi):
            if lo >= hi:
                return False
            let mid = (lo + hi) / 2
            if vec[mid].key == key:
                return mid
            elif vec[mid].key > key:
                bin(lo, mid)
            else:
                bin(mid + 1, hi)
        bin(0, vec.len())

(Have we talked about this before?)

tov commented 4 years ago

By the way, the reason I was playing with that was that I wanted to see whether it was possible to write the vector dictionary put method in a more symmetrical way using a comprehension. And indeed it is!:

    def put(self, k, v):
        let index = self.binary_search(k)
        if index:
            self._contents[index].value = v
            return
        def old(i):
            if i < 0:            assoc(-inf, None)
            elif i < len(self):  self._contents[i]
            else:                assoc(inf, None)
        def new(i):
            let prev = old(i - 1)
            let curr = old(i)
            if k > curr.key:     curr
            elif k > prev.key:   assoc(k, v)
            else:                prev
        self._contents = [ new(i) for i in range(len(self) + 1) ]
tov commented 4 years ago

Also: you may want to check the dependencies. I think the new ones (plot, snip?) may not be listed, and I got warnings when I built.

Huh, I can’t seem to reproduce, but I added "plot" and "snip" to info.rkt anyway, and it doesn’t seem to hurt.