mint-metrics / mojito-r-analytics

Reporting & analytics tools for the Mojito split testing framework

https://mojito.mx/docs/r-analytics-intro

BSD 3-Clause "New" or "Revised" License

Bayesian analytics #3

Open dapperdrop opened 4 years ago

dapperdrop commented 4 years ago

@kingo55 should we consider fleshing out Bayesian analytics again? It would be interesting to develop some functionality to run side-by-side with the Frequentist reports we run, to see how it stacks up.

The main thing to think through is how we calculate priors. We could perhaps calculate them (mean + standard deviation) from the past X months' worth of conversion data?
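
One way to turn a historical mean and standard deviation into a prior is to moment-match a Beta distribution. A minimal sketch in R, assuming a Beta prior on the conversion rate; the helper name `beta_prior_from_moments` and the daily rates are made up for illustration and are not part of mojito-r-analytics:

```r
# Hypothetical helper: moment-match a Beta(alpha, beta) prior to a
# historical mean and standard deviation of the conversion rate.
beta_prior_from_moments <- function(mean_cvr, sd_cvr) {
  variance <- sd_cvr^2
  # A Beta distribution's variance must be below mean * (1 - mean)
  stopifnot(mean_cvr > 0, mean_cvr < 1, variance < mean_cvr * (1 - mean_cvr))
  strength <- mean_cvr * (1 - mean_cvr) / variance - 1  # equals alpha + beta
  list(alpha = mean_cvr * strength, beta = (1 - mean_cvr) * strength)
}

# Made-up daily conversion rates standing in for "the past X months"
daily_cvr <- c(0.101, 0.112, 0.108, 0.095, 0.117, 0.104, 0.110)
prior <- beta_prior_from_moments(mean(daily_cvr), sd(daily_cvr))
print(prior)
```

The standard deviation of daily rates mixes sampling noise with real day-to-day variation, so how strong the resulting prior should be remains a judgment call.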

Another question is how do we deal with sizing and presenting the data in our reports.

kingo55 commented 4 years ago

Oh yes! That would be awesome.

As you allude to, priors could be generated through our sizing process. Anything is better than no prior. Also, I'm not sure how relevant older data is, considering seasonality... perhaps the 30 days we use in our sizing calculator is sufficient here too?

I don't think we need to size experiments in advance with Bayesian inference, but I think we'd still need the sizing data to establish the prior.

lukasvermeer commented 4 years ago

> Anything is better than no prior.

There is no such thing as Bayesian with “no prior”.

At a bare minimum there is an “uninformed prior” (which is a bit of a misnomer imho), but you can’t take the prior out of the equation (or out of the philosophy, for that matter).
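
To illustrate that an "uninformed" prior is still a choice, here is a small sketch assuming a Beta-Binomial model (the thread does not fix a model). Two common "uninformative" priors give different posteriors on the same made-up data:

```r
# Made-up small experiment: 3 conversions out of 20 subjects
conversions <- 3
subjects <- 20

# Posterior mean of a Beta(alpha, beta) prior after Binomial data
posterior_mean <- function(alpha, beta) {
  (alpha + conversions) / (alpha + beta + subjects)
}

flat     <- posterior_mean(1, 1)      # uniform prior, Beta(1, 1)
jeffreys <- posterior_mean(0.5, 0.5)  # Jeffreys prior, Beta(0.5, 0.5)
raw_rate <- conversions / subjects    # observed rate, no prior involved

print(c(flat = flat, jeffreys = jeffreys, raw_rate = raw_rate))
```

With larger samples the gap between the two posteriors shrinks, but it never disappears entirely.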

lukasvermeer commented 4 years ago

> Another question is how do we deal with sizing

Could you explain what you mean by "sizing"? I am not familiar with this term.

kingo55 commented 4 years ago

We haven't documented this, but before running experiments at Mint Metrics, we calculate the traffic we need for a minimum detectable effect using some helper functions. e.g.:

> estimateDurationQuery(
+   app_id = "site_name",
+   trigger_clause = "page_urlpath like '/products/%'",
+   conversion_clause = "page_urlpath = '/order/thank-you/'",
+   delta = -0.07,
+   recipes = 2,
+   stat_power = 0.8
+ )
[1] "Days to run: 31.786299299664"
  subjects conversions  base_cvr target_cvr
1    47862        5221 0.1090845  0.1014485

Trigger clause: This selects users who would have been exposed
Conversion clause: This selects users who would have converted after being exposed

It gives us a baseline conversion rate for users who would typically have been exposed over the last 30 days. Perhaps this baseline conversion rate will be useful as a prior?
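
As a rough sketch of how that baseline could become a prior, assuming a Beta-Binomial model: centre a Beta prior on the 30-day baseline rate, but give it less weight than the full 47,862 subjects so that it does not swamp the experiment. The weight of 1,000 pseudo-subjects and the experiment counts below are made-up assumptions:

```r
# Baseline from the sizing output above
base_conversions <- 5221
base_subjects <- 47862
base_cvr <- base_conversions / base_subjects

# Assumption: the prior is worth 1,000 pseudo-subjects (a tuning choice)
prior_weight <- 1000
prior_alpha <- base_cvr * prior_weight
prior_beta <- (1 - base_cvr) * prior_weight

# Made-up results for one recipe of an experiment
exp_conversions <- 270
exp_subjects <- 2500

# Conjugate update: add conversions to alpha, non-conversions to beta
post_alpha <- prior_alpha + exp_conversions
post_beta <- prior_beta + (exp_subjects - exp_conversions)

post_mean <- post_alpha / (post_alpha + post_beta)
cred_int <- qbeta(c(0.025, 0.975), post_alpha, post_beta)  # 95% credible interval
print(c(mean = post_mean, lower = cred_int[1], upper = cred_int[2]))
```

The posterior mean lands between the observed experiment rate and the baseline, pulled toward the baseline in proportion to `prior_weight`.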

kingo55 commented 4 years ago

Kind of similar to this calculator: https://www.evanmiller.org/ab-testing/sample-size.html

The actual calculation is performed here in our code: https://github.com/mint-metrics/mojito-r-analytics/blob/master/mojito-functions/experiment_sizing.R#L40
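
For readers without access to the helper, a comparable calculation can be sketched with base R's `power.prop.test`. This is not the code in `experiment_sizing.R`; it assumes a two-sided test at a 5% significance level (neither is stated above), equal traffic per recipe, and that the 47,862 subjects arrived evenly over 30 days:

```r
# Inputs taken from the estimateDurationQuery() output above
subjects_30d <- 47862
base_cvr <- 0.1090845
delta <- -0.07                         # relative change to detect
target_cvr <- base_cvr * (1 + delta)   # about 0.1014
recipes <- 2

# Assumed: two-sided test at alpha = 0.05
sizing <- power.prop.test(
  p1 = base_cvr,
  p2 = target_cvr,
  power = 0.8,
  sig.level = 0.05
)

n_per_recipe <- ceiling(sizing$n)
daily_subjects <- subjects_30d / 30
days_to_run <- n_per_recipe * recipes / daily_subjects
print(c(n_per_recipe = n_per_recipe, days_to_run = days_to_run))
```

Under these assumptions the result should come out close to the roughly 32 days reported above.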

lukasvermeer commented 4 years ago

> stat_power = 0.8

I assume this refers to the "desired statistical power"? In a Frequentist paradigm, power is needed to control for the type-II error (false negative) rate. Conversely, in a Bayesian paradigm, there are (afaik) no type-II error rate guarantees.

> I don't think we need to size experiments in advance with Bayesian inference

Indeed there is no need. Sizing (or power) is needed to make guarantees about error rates that Bayesian inference does not consider.

(ftr: imho this is a limitation of the Bayesian approach, not a strength.)

dapperdrop commented 4 years ago

@lukasvermeer @kingo55

Not sure if my thinking is correct, but could we take an approach that leverages the advantages of both paradigms?

I.e. Frequentist to determine a target sample size / test duration to reduce type-II errors, and Bayesian (with strong priors) to reduce type-I errors and make results easier to communicate?

I've seen some other CRO agencies use 'hybrid' approaches, albeit not as simplistic as this, so this train of thought may be completely off.

lukasvermeer commented 4 years ago

> Bayesian (with strong priors) to reduce type-I errors

While a Bayesian approach might empirically reduce type-I errors (when evaluated against some simulated data using a Frequentist lens), there are no guarantees about error rates (type-I or type-II).
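
The kind of simulation meant here could look like the sketch below: run many A/A tests (no true difference), apply a Bayesian decision rule, and count how often it declares a winner. Everything in it is an assumption made for illustration: the Beta-Binomial model, the rule "ship B if P(B > A) > 0.95", the sample sizes, and a strong prior that happens to be centred on the true rate.

```r
set.seed(42)

n_experiments <- 500   # simulated A/A tests
n_per_arm <- 1000
true_cvr <- 0.11       # same true rate in both arms
n_draws <- 2000        # posterior draws per arm

# Simulate the data once so both priors are judged on the same experiments
conv_a <- rbinom(n_experiments, n_per_arm, true_cvr)
conv_b <- rbinom(n_experiments, n_per_arm, true_cvr)

# Monte Carlo estimate of P(B > A) under a shared Beta prior
prob_b_beats_a <- function(xa, xb, prior_alpha, prior_beta) {
  draws_a <- rbeta(n_draws, prior_alpha + xa, prior_beta + n_per_arm - xa)
  draws_b <- rbeta(n_draws, prior_alpha + xb, prior_beta + n_per_arm - xb)
  mean(draws_b > draws_a)
}

# Share of A/A tests in which the rule wrongly declares B the winner
false_positive_rate <- function(prior_alpha, prior_beta, threshold = 0.95) {
  probs <- mapply(prob_b_beats_a, conv_a, conv_b,
                  MoreArgs = list(prior_alpha = prior_alpha, prior_beta = prior_beta))
  mean(probs > threshold)
}

fpr_flat <- false_positive_rate(1, 1)        # flat Beta(1, 1) prior
fpr_strong <- false_positive_rate(110, 890)  # strong prior centred on 11%
print(c(flat = fpr_flat, strong = fpr_strong))
```

A prior centred elsewhere, a different threshold, or peeking at results mid-test would give different numbers, which is the point: these rates are observed in one simulated scenario, not guaranteed by the method.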

I really have no idea how one would get the best of both worlds.