Open dapperdrop opened 4 years ago
Oh yes! That would be awesome.
As you allude to, priors could be generated through our sizing process. Anything is better than no prior. Also, I'm not sure how relevant older data is, considering seasonality... perhaps the 30 days we use in our sizing calculator is sufficient here too?
I don't think we need to size experiments in advance with Bayesian inference, but I think we'd still need it for establishing the prior.
Anything is better than no prior.
There is no such thing as Bayesian with “no prior”.
At a bare minimum there is an “uninformed prior” (which is a bit of a misnomer imho), but you can’t take the prior out of the equation (or out of the philosophy, for that matter).
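To make the point concrete, here is a minimal sketch of a Beta-Binomial update with a flat Beta(1, 1) prior. The counts are hypothetical and not taken from any experiment in this thread. Even the flat prior nudges the posterior mean away from the raw conversion rate, which illustrates that the prior is still part of the calculation.

```r
# Hypothetical observed data for one variant (not from this thread).
conversions <- 12
subjects    <- 100

# Beta(1, 1) is the flat prior on a conversion rate.
prior_alpha <- 1
prior_beta  <- 1

# Conjugate update: a Beta prior plus binomial data gives a Beta posterior.
post_alpha <- prior_alpha + conversions
post_beta  <- prior_beta + (subjects - conversions)

posterior_mean <- post_alpha / (post_alpha + post_beta)  # (12 + 1) / (100 + 2)
raw_rate       <- conversions / subjects                 # 12 / 100

# 95% credible interval from the posterior quantiles.
credible_interval <- qbeta(c(0.025, 0.975), post_alpha, post_beta)
```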
Another question is how do we deal with sizing
Could you explain what you mean by "sizing"? I am not familiar with this term.
We haven't documented this, but before running experiments at Mint Metrics, we calculate the traffic we need for a minimum detectable effect using some helper functions. e.g.:
> estimateDurationQuery(
+ app_id = "site_name",
+ trigger_clause = "page_urlpath like '/products/%'",
+ conversion_clause = "page_urlpath = '/order/thank-you/'",
+ delta = -0.07,
+ recipes = 2,
+ stat_power = 0.8
+ )
[1] "Days to run: 31.786299299664"
  subjects conversions  base_cvr target_cvr
1    47862        5221 0.1090845  0.1014485
It gives us a baseline conversion rate for users who would typically have been exposed over the last 30 days. Perhaps this baseline conversion rate will be useful as a prior?
Kind of similar to this calculator: https://www.evanmiller.org/ab-testing/sample-size.html
The actual calculation is performed here in our code: https://github.com/mint-metrics/mojito-r-analytics/blob/master/mojito-functions/experiment_sizing.R#L40
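For anyone who wants to reproduce the idea without our helpers, here is a rough sketch using base R's `power.prop.test()`. It is not the code from `experiment_sizing.R`; the two-sided test and the 5% significance level are assumptions, and the inputs are copied from the console output above, so the result may not match the helper exactly.

```r
# Figures copied from the estimateDurationQuery() output above.
subjects      <- 47862      # users exposed over the lookback window
base_cvr      <- 0.1090845
delta         <- -0.07      # relative change we want to be able to detect
recipes       <- 2
stat_power    <- 0.8
lookback_days <- 30

target_cvr <- base_cvr * (1 + delta)

# Assumption: two-sided test at the 5% significance level.
sizing <- power.prop.test(p1 = base_cvr, p2 = target_cvr,
                          power = stat_power, sig.level = 0.05)

# Subjects needed per recipe, then days needed at the historical traffic rate.
n_per_recipe   <- ceiling(sizing$n)
daily_subjects <- subjects / lookback_days
days_to_run    <- n_per_recipe * recipes / daily_subjects
```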
stat_power = 0.8
I assume this refers to the "desired statistical power"? In a Frequentist paradigm, power is needed to control for the type-II error (false negative) rate. Conversely, in a Bayesian paradigm, there are (afaik) no type-II error rate guarantees.
I don't think we need to size experiments in advance with Bayesian inference
Indeed there is no need. Sizing (or power) is needed to make guarantees about error rates that Bayesian inference does not consider.
(ftr: imho this is a limitation of the Bayesian approach, not a strength.)
@lukasvermeer @kingo55
Not sure if my thinking is correct, but could we take an approach that leverages the advantages of both paradigms?
I.e. Frequentist to determine a target sample size / test duration to reduce type-II errors and Bayesian (with strong priors) to reduce type-I errors and make results easier to disseminate?
I've seen some other CRO agencies use 'hybrid' approaches, albeit not as simplistic as this, so this train of thought may be completely off.
Bayesian (with strong priors) to reduce type-I errors
While a Bayesian approach might empirically reduce type-I errors (when evaluated against some simulated data using a Frequentist lens), there are no guarantees about error rates (type-I or type-II).
I really have no idea how one would get the best of both worlds.
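The "simulated data using a Frequentist lens" check could look something like the A/A simulation below. This is only a sketch: the traffic, the conversion rate, the 95% decision threshold, and both priors are made-up values. Whatever rates it prints are empirical properties of this particular setup, not guarantees.

```r
set.seed(42)

n_sims    <- 1000   # number of simulated A/A experiments
n_per_arm <- 1000   # subjects per recipe
true_cvr  <- 0.10   # both arms share this rate, so any "win" is a false positive
threshold <- 0.95   # declare B the winner when P(B > A) exceeds this
n_draws   <- 2000   # posterior draws used to estimate P(B > A)

# Hypothetical priors: flat Beta(1, 1) versus a strong prior centred on 10%.
priors <- list(flat = c(1, 1), strong = c(100, 900))

false_positive_rate <- sapply(priors, function(prior) {
  wins <- replicate(n_sims, {
    conv_a <- rbinom(1, n_per_arm, true_cvr)
    conv_b <- rbinom(1, n_per_arm, true_cvr)
    # Draw from each arm's Beta posterior and estimate P(B > A).
    draws_a <- rbeta(n_draws, prior[1] + conv_a, prior[2] + n_per_arm - conv_a)
    draws_b <- rbeta(n_draws, prior[1] + conv_b, prior[2] + n_per_arm - conv_b)
    mean(draws_b > draws_a) > threshold
  })
  mean(wins)  # share of A/A experiments that wrongly declared a winner
})

print(false_positive_rate)
```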
@kingo55 should we consider fleshing out Bayesian analytics again? It would be interesting to develop some functionality to run side-by-side with the Frequentist reports we run, to see how it stacks up.
The main thing to put some thought into is how we calculate priors. We could perhaps calculate it (mean + std deviation) based on the past X months' worth of conversion data?
Another question is how do we deal with sizing and presenting the data in our reports.
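One way the mean + std deviation idea could work is to fit a Beta prior by the method of moments, then update it with the experiment data. The sketch below assumes we have a vector of historical daily conversion rates; the numbers and variable names are hypothetical.

```r
# Hypothetical daily conversion rates; in practice these would come from
# the past X months of conversion data.
daily_cvr <- c(0.108, 0.112, 0.101, 0.115, 0.109, 0.104, 0.111, 0.107)

m <- mean(daily_cvr)
v <- var(daily_cvr)

# Method of moments: choose Beta(alpha, beta) to match the mean and variance.
# This is only valid when the variance is below m * (1 - m).
stopifnot(v < m * (1 - m))
strength    <- m * (1 - m) / v - 1   # alpha + beta, the prior's effective sample size
prior_alpha <- m * strength
prior_beta  <- (1 - m) * strength

# Update with hypothetical experiment data for one recipe.
conversions <- 150
subjects    <- 1200
post_alpha  <- prior_alpha + conversions
post_beta   <- prior_beta + (subjects - conversions)

posterior_mean <- post_alpha / (post_alpha + post_beta)
```

One caveat with this approach: the variance of daily rates mixes real day-to-day variation with sampling noise, so the resulting prior strength is a rough starting point rather than something to trust blindly.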