Open · AndreasMadsen opened this issue 6 years ago
Every time a request is successfully completed, autocannon increases the counter
variable by 1 in https://github.com/mcollina/autocannon/blob/master/lib/run.js#L210-L218.
Every second, the counter
value is sampled and reset to 0 in https://github.com/mcollina/autocannon/blob/master/lib/run.js#L111-L119.
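For context, a minimal sketch of that mechanism (the names are illustrative, not autocannon's actual internals):

```js
// Sketch of the counter/sampling mechanism described above.
// Names are illustrative, not autocannon's actual internals.
let counter = 0
const samples = []

function onRequestCompleted () {
  counter++ // one increment per successfully completed request
}

// every second: sample the counter and reset it
setInterval(() => {
  samples.push(counter) // a zero is recorded if nothing happened that second
  counter = 0
}, 1000)
```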
The sampled value goes in an instance of https://www.npmjs.com/package/hdr-histogram-js, which provides all the calculations. This is probably not the best data structure for this, as we do not have a lot of samples to deal with.
The number of samples for the req/sec and throughput is equivalent to the number of seconds the benchmark has run.
> The sampled value goes in an instance of https://www.npmjs.com/package/hdr-histogram-js, which provides all the calculations. This is probably not the best data structure for this, as we do not have a lot of samples to deal with.
If the purpose is constant memory, see Welford's online algorithm.
> The number of samples for the req/sec and throughput is equivalent to the number of seconds the benchmark has run.
Hmm, this is a bit odd, especially if nothing happened in that second. I also don't understand how it doesn't contradict your first statement: "Every time a request is successfully completed, autocannon increases the counter variable by 1".
Can you show me where the final mean and standard deviation values are calculated? Maybe I can backtrack it from there.
Is there a module you would recommend to calculate online variance, mean, min and max? It would not fix the sample issue.
Mean and standard deviation are calculated by the hdr histogram based on the recorded values (samples). As I am interested in the mean and stddev of the number of requests that happen in a second, I count the number of requests in every given second and then use that value as my sample. If none happens in a given second, that’s a zero to me.
This module is what I use to create the end results: https://github.com/thekemkid/hdr-histogram-percentiles-obj/blob/master/index.js
> Is there a module you would recommend to calculate online variance, mean, min and max? It would not fix the sample issue.
I don't think it has been implemented in a module. But you can steal the one from clinic-doctor: https://github.com/clinicjs/node-clinic-doctor/blob/master/analysis/guess-interval.js#L145
I really like that algorithm as it is quite numerically stable, uses constant memory, and is easy to implement. Theoretically, it is not as stable as a two-pass algorithm, but for most purposes it is fine. Theoretically, it also uses more flops, but in practice that is easily offset by its constant memory, which makes it fit into CPU registers.
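For reference, a self-contained sketch of Welford's online algorithm (not the clinic-doctor code itself, just the same idea, extended with the min/max asked about above):

```js
// Welford's online algorithm: mean and variance in one pass, constant memory.
// Illustrative sketch, not the clinic-doctor implementation.
class OnlineStats {
  constructor () {
    this.n = 0
    this.mean = 0
    this.m2 = 0 // running sum of squared deviations from the mean
    this.min = Infinity
    this.max = -Infinity
  }

  add (x) {
    this.n++
    const delta = x - this.mean
    this.mean += delta / this.n
    this.m2 += delta * (x - this.mean)
    if (x < this.min) this.min = x
    if (x > this.max) this.max = x
  }

  get variance () { // sample variance (n - 1 denominator)
    return this.n > 1 ? this.m2 / (this.n - 1) : 0
  }

  get stddev () {
    return Math.sqrt(this.variance)
  }
}
```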
> Mean and standard deviation are calculated by the hdr histogram based on the recorded values (samples). As I am interested in the mean and stddev of the number of requests that happen in a second, I count the number of requests in every given second and then use that value as my sample. If none happens in a given second, that’s a zero to me.
I see. I thought long and hard about it, and ... it is okay. It threw me off a bit because you will artificially decrease the standard deviation when you decrease the sample resolution, which is the opposite of most intuition. However, the standard error will compensate for that in the end, so it is fine.
You are breaking a bunch of independence assumptions because a new request can't be started before another has completed. Supposedly, you could fix that with some fancy Poisson-variant of a Gamma distribution, but I think that ends up creating just as many new assumptions that are likely to break as well.
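A toy simulation of that compensation, assuming independent per-interval counts (the numbers are made up): aggregating ten 100 ms counts into one 1 s sample shrinks the standard deviation of the rate by roughly √10, but with 10× fewer samples the standard error of the mean rate comes out the same.

```js
// Toy demo: the stddev of the rate depends on the sample resolution,
// but the standard error of the mean rate does not.
function stats (xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (xs.length - 1)
  return { mean, stddev: Math.sqrt(variance), se: Math.sqrt(variance / xs.length) }
}

// simulate 30 seconds of per-100ms request counts
const fine = Array.from({ length: 300 }, () => Math.round(50 + 10 * (Math.random() - 0.5)))

// per-100ms samples, scaled to req/s
const fineRates = fine.map(c => c * 10)

// per-second samples: sum each block of 10, already in req/s
const coarseRates = []
for (let i = 0; i < fine.length; i += 10) {
  coarseRates.push(fine.slice(i, i + 10).reduce((a, b) => a + b, 0))
}

console.log(stats(fineRates))   // larger stddev, 300 samples
console.log(stats(coarseRates)) // ~√10 smaller stddev, 30 samples, similar se
```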
I know about the broken assumptions. However, I’m not sure how to better express those. Do you think it would be better to use percentiles instead of stddev? See https://github.com/mcollina/autocannon/issues/138.
> I know about the broken assumptions. However, I’m not sure how to better express those.
I wouldn't think too hard about it, as I don't see any good solutions.
> Do you think it would be better to use percentiles instead of stddev? See #138.
A standard deviation doesn't assume anything about the distribution, so it is not invalid in that sense. However, it is a pretty useless summary to present. Not because the data might not be normally distributed, but because on its own it says nothing about the variation/deviation of the mean. At the very least, you also need to know the number of samples it was estimated from.
Think of the standard deviation as an intermediate value that is good to keep around and easy to do math with. It is excellent for that purpose but it is not a good final value to present. (Actually, the variance is easier to do math with but they are more or less the same thing).
In conclusion, I would definitely show the empirical 2.5% and 97.5% percentiles instead.
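For illustration, here is one way to compute those empirical percentiles from the per-second samples in plain JavaScript (the hdr histogram can already answer percentile queries; this just shows what is meant):

```js
// Empirical percentile via sorting and linear interpolation.
function percentile (samples, p) {
  const sorted = samples.slice().sort((a, b) => a - b)
  const pos = (p / 100) * (sorted.length - 1)
  const lo = Math.floor(pos)
  const hi = Math.ceil(pos)
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo)
}

const reqPerSec = [980, 1020, 995, 1003, 0, 1011, 990] // hypothetical per-second samples
console.log(percentile(reqPerSec, 2.5), percentile(reqPerSec, 97.5))
```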
As for what distribution it actually is, I would need to understand how the latency is estimated to give an educated guess, but most likely it is a gamma distribution that is close to a normal distribution. But really, just plot the histogram. For the advanced user, there is also the Kolmogorov–Smirnov test.
For my NodeConf EU talk, I actually used gamma distributed data, as that is what typically exists in benchmarks. And you know what, everything works out okay :)
In any case, I would really recommend that you include the number of observations the mean and variance are estimated from. While autocannon may run for 30 seconds, it is not obvious it is also 30 observations. You could also have sampled every 100ms, in which case you would have 300 observations.
Seems like we should also make the number/frequency of observations configurable, defaulting to 1 per second.
> Seems like we should also make the number/frequency of observations configurable, defaulting to 1 per second.
That would be great for statistical significance. Just remember to scale the output unit so it remains n/s.
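In other words, each sampled count would presumably be divided by the interval length before being recorded. A sketch, reusing the hypothetical `counter`/`samples` names from earlier in the thread (not the actual autocannon code):

```js
// Sketch: keep the reported unit in requests per second, regardless of
// the (hypothetical) configurable sampling interval.
let counter = 0 // incremented once per completed request, as sketched above
const samples = []
const sampleIntervalMs = 100 // e.g. 10 observations per second

setInterval(() => {
  samples.push(counter * (1000 / sampleIntervalMs)) // scale the count to n/s
  counter = 0
}, sampleIntervalMs)
```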
Hey, I was wondering: is this enhancement still desired by the autocannon team?
Sure, if you would like to work on it!
I made some progress on the enhancement, but I have a few questions about how you all want it to turn out, specifically:
Could you make suggestions? Pick what makes sense to you, we can check it during review.
My suggestions are:
Go for it!
To compute a confidence interval or determine if there is a statistical difference between two versions, at least three pieces of information are required: the mean, the standard deviation (or variance), and the number of observations they were estimated from.
The number of observations is not really clear from the output.
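For reference, a sketch of the computation those three numbers feed into (a normal approximation; for a small number of observations the t-distribution would be more appropriate):

```js
// 95% confidence interval for the mean rate, normal approximation.
function confidenceInterval (mean, stddev, n, z = 1.96) {
  const se = stddev / Math.sqrt(n) // standard error of the mean
  return [mean - z * se, mean + z * se]
}

// e.g. mean = 1000 req/s, stddev = 50, estimated from 30 one-second samples
console.log(confidenceInterval(1000, 50, 30)) // ≈ [982.1, 1017.9]
```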
In a previous conversation @mcollina wrote:
It is not really clear to me how that data is aggregated, or how it relates to 30 samples.