Open clerian opened 8 years ago
@clerian
Short answer: the code is correct.
>>> hdi_of_mcmc(st.norm(0,1).rvs(1_000_000))
(-1.9924611632940947, 1.9300090335912563)
>>> import scipy.stats as st
>>> st.norm(0,1).ppf([0.025, 0.975])
array([-1.95996398, 1.95996398])
>>> st.norm(0,1).ppf([0.05, 0.95])
array([-1.64485363, 1.64485363])
An annotated version of the function (comments mine):
def hdi_of_mcmc( sample_vec, cred_mass = 0.95 ):
# sort the samples
sorted_pts = np.sort( sample_vec )
# the credible interval (CrI) contains ci_idx_inc samples
ci_idx_inc = int(np.floor( cred_mass*len(sorted_pts) ))
# the CrI excludes n_cis samples
n_cis = len(sorted_pts) - ci_idx_inc
# ci_width contains the widths of all candidate CrIs,
# i.e. the distance between the i-th and (95%+i)-th sample
ci_width = sorted_pts[ci_idx_inc:] - sorted_pts[:n_cis]
# The shortest interval in ci_width is the one with highest density,
# and its indexed by min_idx...
min_idx = np.argmin(ci_width)
# ...which means the HDI starts at hdi_min...
hdi_min = sorted_pts[min_idx]
# ...and lasts until ci_idx_inc samples later.
hdi_max = sorted_pts[min_idx+ci_idx_inc]
return hdi_min, hdi_max
Hello,
in the function hdi_of_mcmc, could it be that the standard configuration actually calculates a slightly off HDI?
let's assume we want to calculate the 0.95 interval, then the two lines would subtract the remaining 5% two times and thus calculating the 0.9 interval?
or am I mistaken somewhere?
Best regards, Clemens Sauerzopf