Apply minor edits, creating a PR to add comments on 1_1crossfit.R.
You don't need to merge this branch -- this is mostly for providing feedback; probably you don't want to merge because functions are not properly exported
The followings are my comments:
Control visibility of functions via roxygen tags, instead of manually editing NAMESPACE. Please add #' @export if you want to export a function (you might need this for debugging).
Computing quantities for the control and treatment group share most of the code: you probably want to make a function so that same logic is not repeatedly show up in the code; it also makes it harder to maintain as you need to edit both when edits are needed.
In general creating intermediate variables such as X_train <- data_train$X can help improve the readability of the code. There are many call of data_train$variable in the code, which can be simplified by calling an intermediate object.
I'm not sure what TryCatch blocks are trying to achieve. I also see TryCatch is called multiple times. When an input check can prevent errors, you should probably prefer to use the input check over TryCatch, we it makes it more clear what kind of errors you're expecting. If I understand correctly, in the following case, lprobust can fail when eval.dat0.all is empty, for example:
if (length(eval.dat0.all) == 0) {
mu.fit0 <- 0
} else {
# actually fit the local linear regression
}
I recommend creating more auxiliary functions to improve the readability of the logic. For example, whenever you need to slice observations in the bound
data %>%
filter(X >= c.vec[g], X < c.vec[min(g + 1, q)])
Probably, it's better to write a function is_in_range and call it instead.
I would think saving outputs to data_test for each k and merging them at the end is much simpler than saving outputs to data_all with fold_id == k slicing. Currently inputs to lprobust are taken from data_train and data_test and outputs are directly saved to data_all; I think this makes modularizing some part somewhat difficult -- ideally you want to call a function like estimate_mu_k(data_train, data_test, fold_id = k) in the loop and merge the output.
It's a minor point but I'm not sure if some calls to paste0 is necessary. For example, paste0("mu", ".aug") is equivalent to "mu.aug"?
Apply minor edits, creating a PR to add comments on
1_1crossfit.R
.The followings are my comments:
#' @export
if you want to export a function (you might need this for debugging).X_train <- data_train$X
can help improve the readability of the code. There are many call ofdata_train$variable
in the code, which can be simplified by calling an intermediate object.I'm not sure what TryCatch blocks are trying to achieve. I also see TryCatch is called multiple times. When an input check can prevent errors, you should probably prefer to use the input check over TryCatch, we it makes it more clear what kind of errors you're expecting. If I understand correctly, in the following case,
lprobust
can fail wheneval.dat0.all
is empty, for example:Then, instead, you could do something like
Probably, it's better to write a function
is_in_range
and call it instead.I would think saving outputs to
data_test
for each k and merging them at the end is much simpler than saving outputs todata_all
withfold_id == k
slicing. Currently inputs tolprobust
are taken fromdata_train
anddata_test
and outputs are directly saved todata_all
; I think this makes modularizing some part somewhat difficult -- ideally you want to call a function likeestimate_mu_k(data_train, data_test, fold_id = k)
in the loop and merge the output.It's a minor point but I'm not sure if some calls to
paste0
is necessary. For example,paste0("mu", ".aug")
is equivalent to"mu.aug"
?