telmo-correa / all-of-statistics

Self-study on Larry Wasserman's "All of Statistics"
974 stars 278 forks source link

Issues with Exercise 5.6.4 b) (Chapter 5: Inequalities) #18

Open vashcast opened 1 year ago

vashcast commented 1 year ago

As pointed out by the statement of the question $\mathbb{P}(p \in C_n) \geq 1- \alpha = 0.95$, for $\alpha=0.05$. According to this, the Coverage should always be above 0.95, which is not what we see in the simulation provided here.

Sincerely, I do not understand the simulation provided since I am not familiar with "tqdm" and maybe other things. If it helps, what I did was to generate 100 Bernoulli(p) RV of length N=10.000, calculate the 100 cumulative RVs (of length N=10000) and average the coverage over the 100 repetitions. My caveats: I am not sure I really need to average over a given number of random vectors (which I arbitrarily decided to be 100)

Here mycode:

import math import random as rd import matplotlib.pyplot as plt import numpy as np from scipy.stats import bernoulli np.random.seed(1)

N = 10000 p = 0.4 alpha=0.05 nn=np.arange(1,N+1) XX_Cum=[np.cumsum(bernoulli.rvs(p,size=N))/nn for i in range(100)]

def epsilon(N,alpha): return np.sqrt(1/(2N)np.log(2/alpha))

Counter=np.array([[np.abs(XX_Cum[j][i-1]-p)<=epsilon(i,alpha) for i in range(1,N+1)] for j in range(100)])

Coverage=[np.count_nonzero(Counter.transpose()[i])/100 for i in range(len(Counter.transpose()))]

plt.plot(nn, Coverage,label='Coverage') plt.plot(nn, [1-alpha for i in range(len(Counter.transpose()))],label='Hoeffding s lower bound')

naming the x axis

plt.xlabel('N')

naming the y axis

plt.ylabel('$P(p \in C_n)$')

giving a title to my graph

plt.legend()

function to show the plot

plt.show()

image