From @pcarbo A few small suggestions for improving the R package code as we move forward:
Makefiles are typically not included in R packages, so it might be confusing to have one. Instead, I put the instructions in the README---feel free to elaborate on them as needed.
For testing different implementations (e.g., R vs. Rcpp), I typically try to have one interface (i.e., function), then have an if-then statement for algorithm.version. The idea is that these same implementations should reproduce the exact same behaviour. On the other hand, if you have multiple functions, there is no guarantee that the different functions will have the same behaviour.
When writing foo_rcpp, I typically implement an internal "wrapper" R function foo_call_rcpp (not exported to namespace) that does all the argument checking and forcing argument conversions before calling the C++ function. The idea is that you want the error handling to be in the R as much as possible to make it more user-friendly and easier to debug (and to make the code more concise and easier to read). For example, for scalar arguments x I will usually do as.double(x) to guarantee that x is always numeric double-precision. For input argument X that could be a large matrix, I might do something like this:
if (!is.double(X) || !is.matrix(X))
stop("Input X must be a double-precision matrix")
and if X is expected to be small, I could do the check and convert to double, e.g.
if (!is.double(X))
storage.mode(X) <- "double"
These are the types of errors that users can easily make if they are not careful (e.g., read.table will sometimes create integer columns instead of floating-point columns).
The general idea is that the C++ code should be as short as possible, and only do the numerically intensive stuff, and leave everything else to R.
On a related note, for treating special cases (e.g., R = 1), you might need to force-convert vectors to matrices, or matrices to 3-d arrays, etc., using as.matrix and as.array so that the input to the C++ function is exactly the right structure. This is a major source of errors. Are there other special cases we need to deal with (e.g., 1 mixture component, 1 sample?).
On the output side of things, it is often sometimes for the user to convert matrices back to vectors, e.g., using drop(X) or simply c(X) to convert a matrix X to a vector.
Maybe you don't need to "hide armadillo warning / error messages" now that we are using RcppArmadillo? You might want to check this.
Glad to see that you added comments to the top of most of the functions in mash.h! Very helpful. Please make sure to briefly describe what each function does and what all the inputs are for, and their formats.
One cool thing about Armadillo is that the code can be very concise. If you can do use namespace Rcpp; use namespace arma; then you could get rid of all the Rcpp:: and arma:: and make your code easier to follow. Maybe give it a try?
When you have a chance, expand on the verbose argument and give more info about the analysis; e.g. number of samples, number of dimensions, summary of fitted mixture weights, and other info about convergence of optimization, etc." More info = more power to user.
From @pcarbo A few small suggestions for improving the R package code as we move forward:
Makefiles are typically not included in R packages, so it might be confusing to have one. Instead, I put the instructions in the README---feel free to elaborate on them as needed.
For testing different implementations (e.g., R vs. Rcpp), I typically try to have one interface (i.e., function), then have an if-then statement for
algorithm.version
. The idea is that these same implementations should reproduce the exact same behaviour. On the other hand, if you have multiple functions, there is no guarantee that the different functions will have the same behaviour.When writing
foo_rcpp
, I typically implement an internal "wrapper" R functionfoo_call_rcpp
(not exported to namespace) that does all the argument checking and forcing argument conversions before calling the C++ function. The idea is that you want the error handling to be in the R as much as possible to make it more user-friendly and easier to debug (and to make the code more concise and easier to read). For example, for scalar argumentsx
I will usually doas.double(x)
to guarantee thatx
is always numeric double-precision. For input argumentX
that could be a large matrix, I might do something like this:and if
X
is expected to be small, I could do the check and convert to double, e.g.These are the types of errors that users can easily make if they are not careful (e.g.,
read.table
will sometimes create integer columns instead of floating-point columns).The general idea is that the C++ code should be as short as possible, and only do the numerically intensive stuff, and leave everything else to R.
See here for an example implementation of some of these ideas: https://github.com/pcarbo/varbvs/blob/master/varbvs-R/R/varbvsmixupdate.R (edited)
On a related note, for treating special cases (e.g.,
R = 1
), you might need to force-convert vectors to matrices, or matrices to 3-d arrays, etc., usingas.matrix
andas.array
so that the input to the C++ function is exactly the right structure. This is a major source of errors. Are there other special cases we need to deal with (e.g., 1 mixture component, 1 sample?).On the output side of things, it is often sometimes for the user to convert matrices back to vectors, e.g., using
drop(X)
or simplyc(X)
to convert a matrixX
to a vector.Maybe you don't need to "hide armadillo warning / error messages" now that we are using
RcppArmadillo
? You might want to check this.Glad to see that you added comments to the top of most of the functions in
mash.h
! Very helpful. Please make sure to briefly describe what each function does and what all the inputs are for, and their formats.One cool thing about Armadillo is that the code can be very concise. If you can do
use namespace Rcpp; use namespace arma;
then you could get rid of all theRcpp::
andarma::
and make your code easier to follow. Maybe give it a try?When you have a chance, expand on the
verbose
argument and give more info about the analysis; e.g. number of samples, number of dimensions, summary of fitted mixture weights, and other info about convergence of optimization, etc." More info = more power to user.