paulgp closed 3 years ago
This looks like RDOptBW is trying a bandwidth that's too small, but I am not sure how it's happening since the smallest bandwidth that the function tries is chosen so the regression can always be run: https://github.com/kolesarm/RDHonest/blob/711d70fc00c350253a00f5f769989356841057f0/R/NPR_lp.R#L168-L175
Two questions:
If you run traceback() immediately after you get the error, what does it return?
── Variable type: character ────────────────────────────────────────────────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate min max empty n_unique whitespace
1 state_name 0 1 6 6 0 1 0
2 state_abbrev 0 1 2 2 0 1 0
3 variable 0 1 10 10 0 1 0
── Variable type: numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
1 region 0 1 4 0 4 4 4 4 4 ▁▁▇▁▁
2 year 0 1 2014. 2.75 2008 2012 2014 2016 2018 ▃▅▇▇▆
3 age 0 1 60.4 7.11 51 55 60 64 79 ▇▇▃▂▂
4 new_race 0 1 2 0 2 2 2 2 2 ▁▁▇▁▁
5 outcome 0 1 0.845 0.362 0 1 1 1 1 ▂▁▁▁▇
6 pop 0 1 373. 376. 13.7 131. 265. 475. 2553. ▇▂▁▁▁
2. Here's an example:
RDHonest(outcome ~ age,
data = test,
kern = "uniform",
weight=pop,
opt.criterion = "MSE",
M = B_reg,
cutoff = 65)
Error in if (h["m"] <= 0) 0 * d$Xm else K(d$Xm/h["m"]) :
missing value where TRUE/FALSE needed
traceback()
4: NPRreg.fit(d, h1, se.method = se.initial)
3: NPRPrelimVar.fit(d, se.initial = se.initial)
2: NPRHonest.fit(d, M, kern, opt.criterion = opt.criterion, bw.equal = bw.equal, alpha = alpha, beta = beta, se.method = se.method, J = J, sclass = sclass, order = order, se.initial = se.initial)
1: RDHonest(outcome ~ age, data = test, kern = "uniform", weight = pop, opt.criterion = "MSE", M = B_reg, cutoff = 65)
Sorry for the late reply. It looks like the issue is that the package uses the IK bandwidth when computing a preliminary variance estimate. In this case, the IK bandwidth is (I am guessing) NaN, perhaps because there is no variation in the outcome in one of the preliminary estimation windows, or because there is no data in there.
In commit 6de8ae1, I have changed the code so that the bandwidth is reset to Inf in such cases. In those cases, one can use "nn" for se.initial rather than "EHW"; I think that should be more stable in these situations. Does it resolve the issue?
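For anyone hitting the same message, the mechanism can be reproduced in base R without the package. This is only a sketch under assumptions: uniform_kernel, kernel_weights and safe_bandwidth are hypothetical names, and the weight expression merely mimics the shape of the line quoted in the error, not the package's actual code. It shows that a NaN bandwidth makes the if() condition NA, and that falling back to Inf for a missing bandwidth avoids the failure.

```r
# Hypothetical stand-in for a uniform kernel (not taken from the package).
uniform_kernel <- function(u) 0.5 * (abs(u) <= 1)

# Mimics the shape of the expression in the error message.
kernel_weights <- function(x, h) {
  if (h <= 0) 0 * x else uniform_kernel(x / h)
}

x <- c(-3, -1, 0.5, 2)

# NaN <= 0 evaluates to NA, so if() stops with
# "missing value where TRUE/FALSE needed".
h_bad <- NaN
res <- tryCatch(kernel_weights(x, h_bad),
                error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: use an infinite bandwidth (all observations)
# when the preliminary bandwidth is NA or NaN.
safe_bandwidth <- function(h) if (is.na(h)) Inf else h

# With h = Inf every observation falls inside the window.
w <- kernel_weights(x, safe_bandwidth(h_bad))
print(w)
```

With the guard in place, every observation gets the same uniform-kernel weight, which corresponds to running the preliminary regression on all the data on that side of the cutoff.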
Running into the following error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
This is showing up in subsets of RDHonest estimation where the data is relatively sparse, and the bandwidth is being selected using RDOptBW. If I specify the command with a bandwidth, there are no errors.
rdh.all <- RDHonest(outcome ~ age, data = reg_data, kern = "uniform", weight=pop, opt.criterion = "MSE", M = B_reg, cutoff = 65)
(I assume this is an issue with cross-validation, but I'm not sure.)
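As an illustration of how this lm.fit message can arise with sparse data, here is a base R sketch. It does not use RDHonest and does not claim to show what RDOptBW does internally: the data values and the fit_window helper are made up, and only the variable names and the cutoff of 65 follow the call above. A uniform-kernel window that contains no observations leaves lm with zero rows.

```r
# Made-up sparse data; names follow the call above, values are hypothetical.
d <- data.frame(age = c(51, 55, 60, 70, 79),
                outcome = c(1, 1, 0, 1, 1))
cutoff <- 65

# Hypothetical helper: local linear fit on the window |age - cutoff| <= h.
fit_window <- function(h) {
  lm(outcome ~ I(age - cutoff), data = d,
     subset = abs(age - cutoff) <= h)
}

# h = 2 keeps no observations (the nearest ages are 60 and 70),
# so lm.fit stops because there are no cases to fit.
msg <- tryCatch({ fit_window(2); "ok" },
                error = function(e) conditionMessage(e))
print(msg)

# h = 10 keeps ages 55, 60 and 70, so the regression runs.
n_used <- nobs(fit_window(10))
print(n_used)
```

Counting the observations inside a candidate window on each side of the cutoff before fitting is one quick way to check whether a particular subset is too sparse for a given bandwidth.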