Bayesian vs Frequentist

I spent three weeks reading about this topic. It's funny that the resulting note is so short and obvious. 🤷♀️
Philosophy
Let's say we want to estimate some model parameter H (H for Hypothesis), given some observed data D.
Frequentist:
probability = measure of frequency of events from repeatable experiments
H is fixed, although unknown; data is random
hence P(H) and P(H|D) do not make sense; the best estimate for H is the one maximizing the likelihood P(D|H), aka the MLE
Interpreting P(D|H): if we drew data multiple times from the distribution parameterized by H, it is the probability that the drawn data matches our observed data
Bayesian:
probability is extended to measure degree of confidence/certainty about values
data is fixed, H is random
aims to give the whole posterior distribution P(H|D); can use MAP (maximizing posterior) as a point estimate to be comparable with MLE
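A toy illustration of the MLE vs MAP point estimates (the coin-flip setup and all numbers here are my own assumptions, not from the note):

```python
# Toy example (assumed setup): estimate a coin's heads probability H
# from D = 7 heads in 10 flips, with a Beta(2, 2) prior on H.
heads, flips = 7, 10
a, b = 2.0, 2.0  # assumed prior hyperparameters

# Frequentist MLE: argmax_H P(D|H) for a binomial likelihood is heads/flips.
mle = heads / flips

# Bayesian MAP: the Beta prior is conjugate to the binomial likelihood, so
# the posterior is Beta(a + heads, b + flips - heads); the MAP estimate is
# its mode, (a + heads - 1) / (a + b + flips - 2).
map_estimate = (a + heads - 1) / (a + b + flips - 2)

print(mle)           # 0.7
print(map_estimate)  # 8/12 ≈ 0.667
```

With enough data the prior's pull fades and the two estimates converge, which matches the "same estimate for simple problems" point below.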
In practice
both methods will likely yield the same estimate for simple problems. But results can differ for higher-dimensional problems, especially those involving nuisance parameters.
For higher-dimensional problems, both resort to numerical methods instead of analytical answers:
frequentists: optimization techniques like gradient descent to maximize the likelihood
Bayesians: sampling techniques like MCMC to approximate the posterior distribution
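A minimal sketch of the MCMC route: a random-walk Metropolis sampler for a coin's heads probability. The model and every number here (7 heads in 10 flips, flat prior, proposal width, chain length) are my own toy assumptions:

```python
import math
import random

# Toy setup (assumed): D = 7 heads in 10 flips, flat prior on H in (0, 1),
# so the log-posterior is the binomial log-likelihood up to a constant.
heads, flips = 7, 10

def log_posterior(h):
    if not 0.0 < h < 1.0:
        return float("-inf")  # zero prior mass outside (0, 1)
    return heads * math.log(h) + (flips - heads) * math.log(1.0 - h)

random.seed(0)
h, samples = 0.5, []
for _ in range(20000):
    proposal = h + random.gauss(0.0, 0.1)  # symmetric random-walk proposal
    # Metropolis rule: accept with probability min(1, posterior ratio)
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(h):
        h = proposal
    samples.append(h)

kept = samples[5000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)  # should be near 8/12, the exact Beta(8, 4) posterior mean
```

The chain's samples approximate the whole posterior, so the same run can answer many questions (mean, intervals, tail probabilities) at once.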
Confidence Interval vs Credible Region
Both provide bounds for the parameter estimate, but they mean different things.
First some terminology:
Credible region = shortest interval under posterior distribution that contains 95% of probability
Interpretation:
Bayesian: 95% of possible Hs (possible = Hs drawn from the prior that generate a data set matching the observed data) will fall within the CR.
to simulate:
sample H from prior
for each H, generate a data set from the likelihood distribution
select the data sets that match the observed data set
for the Hs that generated matching data sets, find the proportion that fall within the computed CR
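The four simulation steps above can be sketched in code. Everything concrete here (10 flips, 7 observed heads, flat prior, and an equal-tailed interval used for simplicity instead of the shortest/HPD interval) is my own toy assumption:

```python
import random

random.seed(0)
flips, observed_heads = 10, 7

# With a flat prior, the posterior after 7 heads in 10 flips is Beta(8, 4);
# take an equal-tailed 95% credible interval from posterior samples
# (the shortest/HPD interval would differ slightly).
post = sorted(random.betavariate(8, 4) for _ in range(100000))
cr_low, cr_high = post[2500], post[97500]

inside = matched = 0
for _ in range(100000):
    h = random.random()  # 1. sample H from the (flat) prior
    heads = sum(random.random() < h for _ in range(flips))  # 2. generate a data set
    if heads == observed_heads:  # 3. keep only data sets matching the observed one
        matched += 1
        inside += cr_low <= h <= cr_high  # 4. does this H fall in the CR?

coverage = inside / matched
print(coverage)  # should be close to 0.95
```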
frequentist: 95% probability that when I compute CI of data drawn from this distribution, CI will contain true value (true value is fixed)
to simulate:
draw sets of data from likelihood distribution defined by single true H
compute CI for each set
find the proportion of CIs that contain the true H
Given one particular set of observed data, it only says the true value may or may not be within the CI.
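The frequentist simulation can be sketched the same way, again with toy numbers of my own choosing (true H fixed at 0.6, 100 flips per data set, and the standard normal-approximation "Wald" interval):

```python
import math
import random

random.seed(0)
true_h, n = 0.6, 100  # fixed true parameter; 100 flips per data set (assumed)

covered, trials = 0, 10000
for _ in range(trials):
    heads = sum(random.random() < true_h for _ in range(n))  # 1. draw one data set
    p_hat = heads / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se  # 2. this data set's 95% CI
    covered += lo <= true_h <= hi  # 3. does this CI contain the true H?

proportion = covered / trials
print(proportion)  # close to 0.95 (the Wald interval is known to undercover slightly)
```

Note what varies in each simulation: here the CI moves from data set to data set while H stays fixed; in the Bayesian simulation the CR stays fixed while H varies.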
Main arguments for Bayesian
One simple axiom (Bayes' theorem) rules all, freedom to substitute arbitrary complicated models.
Used to be computationally intensive, but no longer a problem.
Bayesian inference gives the whole posterior distribution, which can answer multiple questions at once.
A CI is simply useless for a given data set.
Note
Frequentists are NOT wrong. They still make sense in situations where multiple data realizations are possible (e.g. gambling). In most situations where we are concerned about one set of observed data, CIs and p-values often answer the wrong question.