nvzqz / divan

Fast and simple benchmarking for Rust projects
https://nikolaivazquez.com/blog/divan/
Apache License 2.0

Add statistically-significant improvement reporting #48

Open · TheLostLambda opened 6 months ago

TheLostLambda commented 6 months ago

Similar to what criterion does, but I think a useful starting point would just be reporting a ±% change in times between runs (if the two runs are determined to differ significantly, given the variance of each)!

I imagine this is somewhat blocked on writing out the previous benchmark results somewhere they can be referenced first!
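To make that concrete, here's a rough sketch of the idea (not Divan's actual API or output; the per-sample timings, the struct, and the 1.96 large-sample threshold are all just illustrative assumptions): compute the ±% change in mean time between two runs and suppress it when the difference is within the runs' combined variability, using a Welch-style check with a normal approximation.

```rust
/// Summary statistics for one run's per-sample times (nanoseconds).
struct RunStats {
    mean: f64,
    var: f64, // sample variance
    n: usize,
}

impl RunStats {
    fn from_samples(samples: &[f64]) -> Self {
        let n = samples.len();
        let mean = samples.iter().sum::<f64>() / n as f64;
        let var =
            samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n as f64 - 1.0);
        Self { mean, var, n }
    }
}

/// Returns Some(percent change) if the difference in means looks
/// distinguishable from noise, None otherwise. Uses a Welch-style
/// statistic with a large-sample normal approximation (|z| > 1.96 ≈ 95%).
fn percent_change(prev: &RunStats, curr: &RunStats) -> Option<f64> {
    let diff = curr.mean - prev.mean;
    let se = (prev.var / prev.n as f64 + curr.var / curr.n as f64).sqrt();
    if se == 0.0 || (diff / se).abs() < 1.96 {
        return None; // within noise; don't report a change
    }
    Some(100.0 * diff / prev.mean)
}

fn main() {
    let prev = RunStats::from_samples(&[102.0, 99.0, 101.0, 100.0, 98.0]);
    let curr = RunStats::from_samples(&[91.0, 89.0, 90.0, 92.0, 88.0]);
    match percent_change(&prev, &curr) {
        Some(p) => println!("{p:+.1}% vs previous run"),
        None => println!("no significant change"),
    }
}
```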

nvzqz commented 3 months ago

This is a bit of a nuanced issue. Currently, benchmarks don't record any statistics beyond min/max/median/mean time. But I would very much like to do proper statistical analysis across benchmark runs to determine whether a difference is distinguishable from random noise (i.e. statistically significant).

The approach the Stabilizer folks took produced a normal distribution of results, but as an easy-to-pick-up userspace tool, Divan doesn't have the luxury of being an LLVM plugin. That said, benchmark timings tend to follow a log-normal distribution, so perhaps the same stats can be made to work from that?
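One way that could work (a sketch under the log-normal assumption, not a settled design): if per-sample times are roughly log-normal, their logarithms are roughly normal, so the usual mean/variance machinery applies on the log scale, and the difference of mean log-times exponentiates back into a ratio of geometric means, i.e. a ±% change.

```rust
/// Under a log-normal model, compare two runs on the log scale: the
/// logs are approximately normal, so mean/variance-based tests apply,
/// and exp(mean(log curr) - mean(log prev)) is the ratio of geometric means.
fn geometric_change(prev_ns: &[f64], curr_ns: &[f64]) -> (f64, f64) {
    let mean_log = |xs: &[f64]| xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64;
    let var_log = |xs: &[f64], m: f64| {
        xs.iter().map(|x| (x.ln() - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
    };

    let (m_prev, m_curr) = (mean_log(prev_ns), mean_log(curr_ns));
    let se = (var_log(prev_ns, m_prev) / prev_ns.len() as f64
        + var_log(curr_ns, m_curr) / curr_ns.len() as f64)
        .sqrt();

    // Test statistic on the log scale (same Welch-style form as above),
    // plus the percent change implied by the geometric-mean ratio.
    let z = (m_curr - m_prev) / se;
    let pct = 100.0 * ((m_curr - m_prev).exp() - 1.0);
    (z, pct)
}

fn main() {
    let prev = [105.0, 98.0, 110.0, 95.0, 102.0];
    let curr = [88.0, 92.0, 85.0, 90.0, 87.0];
    let (z, pct) = geometric_change(&prev, &curr);
    println!("z = {z:.2}, change = {pct:+.1}% (significant if |z| > ~2)");
}
```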


In line with the intent of this issue, my plan once #10/#42 are complete is to report a ±% change from the previous run, using information previously recorded in target/divan.
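For reference, a minimal sketch of that flow (the file path, the TSV format, and the benchmark names here are hypothetical placeholders, not what #10/#42 will actually specify): record each benchmark's mean time under target/divan, then on the next run load it and print the ±% change.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Hypothetical on-disk format: one "<benchmark name>\t<mean ns>" line per entry.
fn load_previous(path: &Path) -> HashMap<String, f64> {
    fs::read_to_string(path)
        .unwrap_or_default()
        .lines()
        .filter_map(|line| {
            let (name, mean) = line.split_once('\t')?;
            Some((name.to_owned(), mean.parse().ok()?))
        })
        .collect()
}

fn save_current(path: &Path, results: &HashMap<String, f64>) -> std::io::Result<()> {
    let body: String = results
        .iter()
        .map(|(name, mean)| format!("{name}\t{mean}\n"))
        .collect();
    fs::create_dir_all(path.parent().unwrap())?;
    fs::write(path, body)
}

fn main() -> std::io::Result<()> {
    let path = Path::new("target/divan/previous.tsv"); // hypothetical location
    let previous = load_previous(path);

    // Pretend these are this run's measured mean times, in ns.
    let current = HashMap::from([("parse".to_owned(), 91.0), ("render".to_owned(), 240.0)]);

    for (name, mean) in &current {
        match previous.get(name) {
            Some(prev) => {
                println!("{name}: {:+.1}% vs previous run", 100.0 * (mean - prev) / prev)
            }
            None => println!("{name}: no previous data"),
        }
    }
    save_current(path, &current)
}
```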