As pointed out by the statement of the question $\mathbb{P}(p \in C_n) \geq 1- \alpha = 0.95$, for $\alpha=0.05$. According to this, the Coverage should always be above 0.95, which is not what we see in the simulation provided here.
Sincerely, I do not understand the simulation provided since I am not familiar with "tqdm" and maybe other things. If it helps, what I did was to generate 100 Bernoulli(p) RV of length N=10.000, calculate the 100 cumulative RVs (of length N=10000) and average the coverage over the 100 repetitions. My caveats: I am not sure I really need to average over a given number of random vectors (which I arbitrarily decided to be 100)
Here mycode:
import math
import random as rd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import bernoulli
np.random.seed(1)
N = 10000
p = 0.4
alpha=0.05
nn=np.arange(1,N+1)
XX_Cum=[np.cumsum(bernoulli.rvs(p,size=N))/nn for i in range(100)]
As pointed out by the statement of the question $\mathbb{P}(p \in C_n) \geq 1- \alpha = 0.95$, for $\alpha=0.05$. According to this, the Coverage should always be above 0.95, which is not what we see in the simulation provided here.
Sincerely, I do not understand the simulation provided since I am not familiar with "tqdm" and maybe other things. If it helps, what I did was to generate 100 Bernoulli(p) RV of length N=10.000, calculate the 100 cumulative RVs (of length N=10000) and average the coverage over the 100 repetitions. My caveats: I am not sure I really need to average over a given number of random vectors (which I arbitrarily decided to be 100)
Here mycode:
import math import random as rd import matplotlib.pyplot as plt import numpy as np from scipy.stats import bernoulli np.random.seed(1)
N = 10000 p = 0.4 alpha=0.05 nn=np.arange(1,N+1) XX_Cum=[np.cumsum(bernoulli.rvs(p,size=N))/nn for i in range(100)]
def epsilon(N,alpha): return np.sqrt(1/(2N)np.log(2/alpha))
Counter=np.array([[np.abs(XX_Cum[j][i-1]-p)<=epsilon(i,alpha) for i in range(1,N+1)] for j in range(100)])
Coverage=[np.count_nonzero(Counter.transpose()[i])/100 for i in range(len(Counter.transpose()))]
plt.plot(nn, Coverage,label='Coverage') plt.plot(nn, [1-alpha for i in range(len(Counter.transpose()))],label='Hoeffding s lower bound')
naming the x axis
plt.xlabel('N')
naming the y axis
plt.ylabel('$P(p \in C_n)$')
giving a title to my graph
plt.legend()
function to show the plot
plt.show()