Control NAMESPCE by roxygen tags, instead of manually expose functions.

Apply minor edits, creating a PR to add comments on 1_1crossfit.R.

You don't need to merge this branch -- this is mostly for providing feedback; probably you don't want to merge because functions are not properly exported

The followings are my comments:

Control visibility of functions via roxygen tags, instead of manually editing NAMESPACE. Please add #' @export if you want to export a function (you might need this for debugging).
Computing quantities for the control and treatment group share most of the code: you probably want to make a function so that same logic is not repeatedly show up in the code; it also makes it harder to maintain as you need to edit both when edits are needed.
In general creating intermediate variables such as X_train <- data_train$X can help improve the readability of the code. There are many call of data_train$variable in the code, which can be simplified by calling an intermediate object.

I'm not sure what TryCatch blocks are trying to achieve. I also see TryCatch is called multiple times. When an input check can prevent errors, you should probably prefer to use the input check over TryCatch, we it makes it more clear what kind of errors you're expecting. If I understand correctly, in the following case, lprobust can fail when eval.dat0.all is empty, for example:

eval.dat0.all <- c(eval.dat0.m, eval.dat0.aug, eval.dat0.pseudo)

# local linear regression
tryCatch({
 mu.fit0 <- lprobust(data_train$Y[data_train$D == 0 & data_train$G == g],
                      data_train$X[data_train$D == 0 & data_train$G == g],
                      eval = eval.dat0.all,
                      bwselect = "imse-dpi")$Estimate[, 5]
}, error = function(e) return(0) )

Then, instead, you could do something like

if (length(eval.dat0.all) == 0) {
   mu.fit0 <- 0
} else {
  # actually fit the local linear regression
}

I recommend creating more auxiliary functions to improve the readability of the logic. For example, whenever you need to slice observations in the bound

data %>% 
  filter(X >= c.vec[g], X < c.vec[min(g + 1, q)])

Probably, it's better to write a function is_in_range and call it instead.

is_in_range <- function(X, cutoff_lb, cutoff_ub) {
  return(X >= cutoff_lb & X <= cutoff_ub)
}

data[is_in_range(X, c.vec[g], c.vec[min(g+1, q)], ]

I would think saving outputs to data_test for each k and merging them at the end is much simpler than saving outputs to data_all with fold_id == k slicing. Currently inputs to lprobust are taken from data_train and data_test and outputs are directly saved to data_all; I think this makes modularizing some part somewhat difficult -- ideally you want to call a function like estimate_mu_k(data_train, data_test, fold_id = k) in the loop and merge the output.
It's a minor point but I'm not sure if some calls to paste0 is necessary. For example, paste0("mu", ".aug") is equivalent to "mu.aug"?

kkawato / rdlearn

Control NAMESPCE by roxygen tags, instead of manually expose functions. #2