nhejazi closed this issue 6 years ago.
Based on manual inspection, there's nothing obviously different about the code used for computing the parameter estimates and related inference between the two commits given above. The only seemingly noteworthy difference is a change in the argument `eif_tol` from an arbitrarily small fixed value (previously `1e-7`) to a value relative to the sample size (`1/length(Y)`). Unfortunately, re-running the simulation with a forced EIF tolerance of `1e-7` does not resolve the issue:
n | est | var_mc | var_avg | bias | coverage |
---|---|---|---|---|---|
100 | 1.841118 | 0.6168012 | 0.0676804 | -0.1588817 | 0.282 |
10000 | 1.998961 | 0.0005663 | 0.0004627 | -0.0010387 | 0.917 |
n | est | var_mc | var_avg | bias | coverage |
---|---|---|---|---|---|
100 | 1.787103 | 1.936063 | 0.1261214 | -0.2128974 | 0.094 |
10000 | 1.994722 | 0.0006802 | 0.0008079 | -0.0052783 | 0.967 |
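For clarity, the practical difference between the two tolerance settings can be sketched as follows. This is only an illustration of a typical EIF-based stopping rule; the function and variable names here are assumptions, not the package's internal API.

```python
import numpy as np

def eif_converged(eif_values, eif_tol):
    # Typical EIF-based stopping rule: declare the targeting step
    # converged once the empirical mean of the estimated efficient
    # influence function falls below the tolerance. (Sketch only;
    # not necessarily how the package implements it internally.)
    return abs(np.mean(eif_values)) < eif_tol

n = 100
eif = np.full(n, 1e-3)  # toy EIF values with empirical mean 1e-3

eif_converged(eif, 1e-7)   # fixed, arbitrarily small tolerance -> False
eif_converged(eif, 1 / n)  # sample-size-relative tolerance (0.01) -> True
```

With a fixed `1e-7` tolerance the criterion is far stricter than the `1/n` version at moderate sample sizes, which is why forcing it back was a plausible (though, per the tables above, unsuccessful) fix.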
From this it seems to me that all simulation statistics at the larger sample size indicate consistent, reasonably good performance. Perhaps the estimator is unstable at smaller sample sizes?
Also, @jeremyrcoyle notes that "...might be because you’re not doing the 'realistic regime' thing where you don’t shift values already on the edge -- i.e., you’re shifting the already high values out of the range where you have positivity, which would lead to variance issues." This could very well be causing the aberrant small-sample behavior.
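To make that suggestion concrete, a "realistic" shift only moves observations whose shifted treatment value stays inside a region where positivity plausibly holds, leaving values already near the edge unshifted. The sketch below is purely illustrative; the bound `upper` and the function name are assumptions, and the package's actual realistic-regime logic may differ.

```python
import numpy as np

def realistic_shift(a, delta, upper):
    # Shift each treatment value by delta, but leave a value unshifted
    # when the shift would push it past `upper`, a bound within which
    # positivity is assumed to hold. (Illustrative sketch only.)
    a = np.asarray(a, dtype=float)
    shifted = a + delta
    return np.where(shifted <= upper, shifted, a)

a = np.array([0.5, 1.0, 2.8, 3.0])
realistic_shift(a, delta=0.5, upper=3.0)
# -> array([1. , 1.5, 2.8, 3. ]): the two high values are left alone
```

A naive shift (`a + delta` everywhere) would instead push the high values to 3.3 and 3.5, outside the support where the treatment mechanism has density, which is exactly the variance problem described above.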
After running the same simulation over a selection of sample sizes, I think the results constitute convincing evidence that the coverage problems are due to positivity violations. These results were produced from 1000 simulations at each of the sample sizes below:
n | est | var_mc | var_avg | bias | coverage |
---|---|---|---|---|---|
100 | 1.841118 | 0.6168012 | 0.0676804 | -0.1588817 | 0.282 |
500 | 1.953483 | 0.0377445 | 0.0105528 | -0.0465169 | 0.648 |
1000 | 1.970875 | 0.0121869 | 0.0049859 | -0.0291253 | 0.778 |
5000 | 1.995515 | 0.0012752 | 0.0009348 | -0.0044854 | 0.908 |
10000 | 1.998961 | 0.0005663 | 0.0004627 | -0.0010387 | 0.917 |
n | est | var_mc | var_avg | bias | coverage |
---|---|---|---|---|---|
100 | 1.787103 | 1.936063 | 0.1261214 | -0.2128974 | 0.094 |
500 | 1.919533 | 0.0938503 | 0.0202032 | -0.0804669 | 0.518 |
1000 | 1.942053 | 0.0266887 | 0.0092032 | -0.057947 | 0.675 |
5000 | 1.989203 | 0.0018198 | 0.001657 | -0.010797 | 0.938 |
10000 | 1.994722 | 0.0006802 | 0.0008079 | -0.0052783 | 0.967 |
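For reference, the columns in the tables above can be reproduced from per-simulation output roughly as follows. This is a sketch of the usual definitions; the exact truth value and variance estimator used in the simulation are assumptions.

```python
import numpy as np

def summarize(estimates, var_estimates, truth):
    # Summarize a batch of simulation replicates in the style of the
    # tables above (column definitions are assumptions):
    #   est      - mean point estimate across simulations
    #   var_mc   - Monte Carlo variance of the point estimates
    #   var_avg  - average of the estimator's own variance estimates
    #   bias     - mean point estimate minus the true parameter value
    #   coverage - proportion of 95% Wald CIs containing the truth
    z = 1.96  # normal quantile for nominal 95% intervals
    estimates = np.asarray(estimates, dtype=float)
    se = np.sqrt(np.asarray(var_estimates, dtype=float))
    covered = (estimates - z * se <= truth) & (truth <= estimates + z * se)
    return {
        "est": estimates.mean(),
        "var_mc": estimates.var(ddof=1),
        "var_avg": np.mean(var_estimates),
        "bias": estimates.mean() - truth,
        "coverage": covered.mean(),
    }

# Toy usage with three replicates and truth = 2:
summarize([1.9, 2.0, 2.1], [0.01, 0.01, 0.01], truth=2.0)
# -> est 2.0, var_mc 0.01, var_avg 0.01, bias 0.0, coverage 1.0
```

Under this reading, the small-sample pathology shows up as `var_mc` greatly exceeding `var_avg` (e.g., 0.62 vs 0.068 at n = 100 in the first table), so the Wald intervals are far too narrow and coverage collapses.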
Closing this as of 0f80eac. It's unclear whether this is still a problem; we should re-open later if we notice similar or related issues.
There's something horribly wrong with the parameter estimate: the Monte Carlo variance is quite high in small samples and remains problematic even in large samples, significantly degrading confidence interval coverage.
The bug reported here was introduced at some point between ee46907 and 81175b1.