Bayesian vs Frequentist

I spent three weeks reading about this topic. It's funny that the resulting note is so short and obvious. 🤷♀️
Philosophy
Let's say we want to estimate some model parameter H (H for Hypothesis), given some observed data D.
Frequentist:
probability = measure of frequency of events from repeatable experiments
H is fixed, although unknown; data is random
hence P(H) and P(H|D) do not make sense; the best estimate for H is the one maximizing the likelihood P(D|H), aka the MLE
Interpreting P(D|H): if we drew data multiple times from the distribution parameterized by H, it is the probability that the drawn data matches our observed data
Bayesian:
probability is extended to measure degree of confidence/certainty about values
data is fixed, H is random
aims to give the whole posterior distribution P(H|D); can use MAP (maximizing posterior) as a point estimate to be comparable with MLE
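A toy illustration of the MLE vs MAP point estimates (the coin-flip setup and all numbers here are my own assumptions, not from the note):

```python
# Toy example (assumed setup): estimate a coin's heads probability H
# from D = 7 heads in 10 flips, with a Beta(2, 2) prior on H.
heads, flips = 7, 10
a, b = 2.0, 2.0  # assumed prior hyperparameters

# Frequentist MLE: argmax_H P(D|H) for a binomial likelihood is heads/flips.
mle = heads / flips

# Bayesian MAP: the Beta prior is conjugate to the binomial likelihood, so
# the posterior is Beta(a + heads, b + flips - heads); the MAP estimate is
# its mode, (a + heads - 1) / (a + b + flips - 2).
map_estimate = (a + heads - 1) / (a + b + flips - 2)

print(mle)           # 0.7
print(map_estimate)  # 8/12 ≈ 0.667
```

With enough data the prior's pull fades and the two estimates converge, which matches the "same estimate for simple problems" point below.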
In practice
both methods will likely yield the same estimate for simple problems. But results can differ for higher-dimensional problems, especially those involving nuisance parameters.
For higher-dimensional problems, both resort to numerical methods instead of analytical answers:
frequentists: optimization techniques like gradient descent to maximize the likelihood
Bayesians: sampling techniques like MCMC to approximate the posterior distribution
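A minimal sketch of the MCMC route: a random-walk Metropolis sampler for a coin's heads probability. The model and every number here (7 heads in 10 flips, flat prior, proposal width, chain length) are my own toy assumptions:

```python
import math
import random

# Toy setup (assumed): D = 7 heads in 10 flips, flat prior on H in (0, 1),
# so the log-posterior is the binomial log-likelihood up to a constant.
heads, flips = 7, 10

def log_posterior(h):
    if not 0.0 < h < 1.0:
        return float("-inf")  # zero prior mass outside (0, 1)
    return heads * math.log(h) + (flips - heads) * math.log(1.0 - h)

random.seed(0)
h, samples = 0.5, []
for _ in range(20000):
    proposal = h + random.gauss(0.0, 0.1)  # symmetric random-walk proposal
    # Metropolis rule: accept with probability min(1, posterior ratio)
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(h):
        h = proposal
    samples.append(h)

kept = samples[5000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)  # should be near 8/12, the exact Beta(8, 4) posterior mean
```

The chain's samples approximate the whole posterior, so the same run can answer many questions (mean, intervals, tail probabilities) at once.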
Confidence Interval vs Credible Region
Both provide bounds for the parameter estimate, but they mean different things.
First some terminology:
Credible region = shortest interval under posterior distribution that contains 95% of probability
Interpretation:
Bayesian: 95% of possible Hs (possible = Hs drawn from the prior that generate a data set matching the observed data) will fall within the CR.
to simulate:
sample H from prior
for each H, generate a data set from the likelihood distribution
select the data sets that match the observed data set
for the Hs that generated matching data sets, find the proportion that fall within the computed CR
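The four simulation steps above can be sketched in code. Everything concrete here (10 flips, 7 observed heads, flat prior, and an equal-tailed interval used for simplicity instead of the shortest/HPD interval) is my own toy assumption:

```python
import random

random.seed(0)
flips, observed_heads = 10, 7

# With a flat prior, the posterior after 7 heads in 10 flips is Beta(8, 4);
# take an equal-tailed 95% credible interval from posterior samples
# (the shortest/HPD interval would differ slightly).
post = sorted(random.betavariate(8, 4) for _ in range(100000))
cr_low, cr_high = post[2500], post[97500]

inside = matched = 0
for _ in range(100000):
    h = random.random()  # 1. sample H from the (flat) prior
    heads = sum(random.random() < h for _ in range(flips))  # 2. generate a data set
    if heads == observed_heads:  # 3. keep only data sets matching the observed one
        matched += 1
        inside += cr_low <= h <= cr_high  # 4. does this H fall in the CR?

coverage = inside / matched
print(coverage)  # should be close to 0.95
```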
frequentist: 95% probability that when I compute CI of data drawn from this distribution, CI will contain true value (true value is fixed)
to simulate:
draw sets of data from likelihood distribution defined by single true H
compute CI for each set
find the proportion of CIs that contain the true H
Given one particular set of observed data, it only says the true value may or may not be within the CI.
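The frequentist simulation can be sketched the same way, again with toy numbers of my own choosing (true H fixed at 0.6, 100 flips per data set, and the standard normal-approximation "Wald" interval):

```python
import math
import random

random.seed(0)
true_h, n = 0.6, 100  # fixed true parameter; 100 flips per data set (assumed)

covered, trials = 0, 10000
for _ in range(trials):
    heads = sum(random.random() < true_h for _ in range(n))  # 1. draw one data set
    p_hat = heads / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se  # 2. this data set's 95% CI
    covered += lo <= true_h <= hi  # 3. does this CI contain the true H?

proportion = covered / trials
print(proportion)  # close to 0.95 (the Wald interval is known to undercover slightly)
```

Note what varies in each simulation: here the CI moves from data set to data set while H stays fixed; in the Bayesian simulation the CR stays fixed while H varies.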
Main arguments for Bayesian
One simple axiom (Bayes' theorem) rules all, freedom to substitute arbitrary complicated models.
Used to be computationally intensive, but no longer a problem.
Bayesian inference gives the whole posterior distribution, which can answer multiple questions at once.
A CI is simply useless for a given data set.
Note
Frequentists are NOT wrong. They still make sense in situations where multiple data realizations are possible (e.g. gambling). In most situations where we are concerned about one set of observed data, CIs and p-values often answer the wrong question.