A perennial problem when reporting nlmixr2 issues is anonymizing the model and/or data.
I've written the function below, which covers almost all of the anonymization. The things I see that are not (or may not be) covered, and that may be issues for inclusion, are:

- If a fixed value is set in the model (e.g., `model({mw <- 150000})`), neither the name `mw` nor the fixed value is modified. How can I find a list of all left-hand-side names that are assigned this way, i.e., names that are not a model parameter, covariate, residual error value, output, or compartment name, and that are not special lines in the model such as derivatives?
- How can I remove meta-information?
- It uses the data simplification function from `nlmixr2targets` (`nlmixr_data_simplify()`) to minimize the dataset width. (We may want to move that function to the same package where this lives.)
- I need to test that it anonymizes eta values.
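
For the first point, one possible approach is to walk the quoted `model({...})` block with base R and collect every plain-symbol left-hand side of `<-` or `=`. Derivatives such as `d/dt(depot)` have a call, not a symbol, on the left, so they drop out naturally, and the names the function already renames (`allRename` in the code) can then be removed with `setdiff()`. This is a sketch only: `findLhsNames()`, `modelBlock`, and `alreadyRenamed` are hypothetical, and rxode2 may already expose an equivalent list that I haven't checked.

```r
# Hypothetical helper (not an rxode2 function): walk a quoted model block and
# return every plain-symbol left-hand side of `<-` or `=`, flagging those whose
# right-hand side is a bare numeric literal (e.g. `mw <- 150000`).
findLhsNames <- function(expr) {
  lhsNames <- character(0)
  isLiteral <- logical(0)
  walk <- function(e) {
    if (!is.call(e)) return(invisible(NULL))
    fn <- if (is.name(e[[1]])) as.character(e[[1]]) else ""
    if (fn %in% c("<-", "=") && length(e) == 3 && is.name(e[[2]])) {
      # d/dt(depot) and similar special lines have a call on the left: skipped
      lhsNames <<- c(lhsNames, as.character(e[[2]]))
      # Only unsigned literals are detected; `-5` parses as a call to `-`
      isLiteral <<- c(isLiteral, is.numeric(e[[3]]))
    }
    # Recurse into `{` blocks, if/else bodies, and nested calls
    for (i in seq_along(e)[-1]) walk(e[[i]])
  }
  walk(expr)
  keep <- !duplicated(lhsNames)
  stats::setNames(isLiteral[keep], lhsNames[keep])
}

# Hypothetical model block in nlmixr2 style
modelBlock <- quote({
  mw <- 150000
  ka <- exp(tka + eta.ka)
  cl <- exp(tcl + eta.cl)
  v <- exp(tv + eta.v)
  d/dt(depot) <- -ka * depot
  d/dt(center) <- ka * depot - cl / v * center
  cp <- center / v
  cp ~ add(add.sd)
})

lhs <- findLhsNames(modelBlock)
# Stand-in for the names the function already renames (allRename)
alreadyRenamed <- c("tka", "tcl", "tv", "add.sd", "depot", "center", "cp")
notRenamed <- lhs[setdiff(names(lhs), alreadyRenamed)]
# notRenamed covers mw, ka, cl, and v; only mw is flagged as a literal
fixedValues <- names(notRenamed)[notRenamed]
```

Note that derived names such as `ka`, `cl`, and `v` are not renamed by the function either; the literal flag is one way to separate fixed values like `mw` from them.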

If this function is generally acceptable, where do you think it should live? I think `rxode2`, but I can imagine
```r
#' Anonymize an rxode2/nlmixr2 model and dataset to assist with issue reporting
#'
#' It is your responsibility to inspect any information that you are sharing.
#' Do not share confidential data. This function is designed to help you avoid
#' sharing anything that may be confidential, but final responsibility remains
#' with the person sharing the data.
#'
#' When parameters and DV values are anonymized, they keep their sign and
#' approximate order of magnitude. This is done by multiplying each value by a
#' uniform random number between 0.5 and 1.5.
#'
#' The changes made by this function may affect whether the issue still occurs.
#' Please retest the issue with the anonymized model and dataset to confirm
#' that it recurs.
#'
#' @param model The model to anonymize
#' @param data The dataset to minimize and anonymize
#' @param anonParam Anonymize the initial estimates of the parameters (except
#'   values that are exactly 0 or 1)?
#' @param anonParam1 Also anonymize the initial estimates of the parameters
#'   that are exactly 1? This only has an effect if `anonParam = TRUE`.
#' @param anonDv Anonymize the DV values in the dataset?
#' @param nId Number of subject identifiers to include in the dataset. Only the
#'   first `nId` unique values of the `"id"` column are kept, to minimize
#'   dataset size (or all of them if `nId` is greater than the number of
#'   subjects). Set to `Inf` to include all data.
#'
#' @returns A list with two components, `"model"` and `"data"`, containing the
#'   anonymized model and dataset, respectively
#' @export
nlmixr2Anonymize <- function(model, data, anonParam = TRUE, anonParam1 = FALSE,
                             anonDv = TRUE, nId = 5) {
  modelUi <- rxode2::as.rxUi(model)
  # Drop all meta-data (FIXME: how do I do this? rxode2 says it should not be
  # overwritten)
  # modelUi$meta <- NULL
  # Drop all labels
  modelUi$iniDf$label <- NA_character_
  dataSimple <- nlmixr2targets::nlmixr_data_simplify(data = data, object = modelUi)
  # Find everything that needs to be renamed
  covariateRename <- modelUi$allCovs
  paramRename <- unique(unlist(modelUi$params[c("pop", "resid", "group", "cmt", "output")]))
  allRename <- c(covariateRename, paramRename)
  allRename <- stats::setNames(allRename, paste0("anon", seq_along(allRename)))
  # rxRename() takes `new = old` pairs, with the old names given as symbols
  argsRename <- c(list(.data = modelUi), lapply(allRename, as.name))
  # And rename them all in the model
  modelUiRenamed <- do.call(rxode2::rxRename, argsRename)
  # Anonymize initial estimates while maintaining order of magnitude and sign
  if (anonParam) {
    newIniDf <- modelUiRenamed$iniDf
    currentFactor <- stats::runif(n = nrow(newIniDf), min = 0.5, max = 1.5)
    if (!anonParam1) {
      # Keep values of exactly 1, as they are unlikely to be informative
      currentFactor[newIniDf$est == 1] <- 1
    }
    # The same positive factor keeps lower <= est <= upper; 0 and +/-Inf are unchanged
    newIniDf$lower <- newIniDf$lower * currentFactor
    newIniDf$est <- newIniDf$est * currentFactor
    newIniDf$upper <- newIniDf$upper * currentFactor
    rxode2::ini(modelUiRenamed) <- newIniDf
  }
  # Then, rename all covariates in the dataset
  oldNames <- names(dataSimple)
  newNames <- oldNames
  for (currentCovariate in covariateRename) {
    newNames[oldNames %in% currentCovariate] <-
      names(allRename[allRename %in% currentCovariate])
  }
  dataRenamed <- stats::setNames(dataSimple, nm = newNames)
  # Rename compartments in the dataset
  if (is.factor(dataRenamed$cmt)) {
    # Assigning a value that is not an existing level would give NA
    dataRenamed$cmt <- as.character(dataRenamed$cmt)
  }
  for (oldCmt in modelUi$params$cmt) {
    newCmt <- names(allRename[allRename %in% oldCmt])
    mask <- dataRenamed$cmt %in% oldCmt
    if (any(mask)) {
      dataRenamed$cmt[mask] <- newCmt
    }
  }
  # Anonymize the ids in the dataset
  dataRenamed$id <- as.integer(factor(dataRenamed$id))
  # Anonymize DV in the dataset
  if (anonDv && "dv" %in% names(dataRenamed)) {
    dataRenamed$dv <- dataRenamed$dv * stats::runif(n = nrow(dataRenamed), min = 0.5, max = 1.5)
  }
  # Reduce the dataset size, if desired
  if (is.finite(nId)) {
    allId <- unique(dataRenamed$id)
    if (nId < length(allId)) {
      dataRenamed <- dataRenamed[dataRenamed$id %in% allId[seq_len(nId)], ]
    }
  }
  list(
    model = as.function(modelUiRenamed),
    data = dataRenamed
  )
}
```
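The argument list passed to `do.call(rxode2::rxRename, ...)` has the shape `list(.data = modelUi, anon1 = <symbol>, anon2 = <symbol>, ...)`, which, as I understand it, follows the `dplyr::rename()` convention of `new = old` with the old names captured unevaluated (hence `as.name()`). The sketch below shows that shape without needing rxode2; `showRenameArgs()` is a hypothetical stand-in that only returns the `...` arguments it receives, unevaluated.

```r
# Hypothetical stand-in for rxode2::rxRename(): return the `...` arguments
# exactly as received, without evaluating them.
showRenameArgs <- function(.data, ...) {
  as.list(match.call(expand.dots = FALSE)$...)
}

# Named character vector: names are the new names, values the old names
allRename <- stats::setNames(c("wt", "tka", "depot"), paste0("anon", 1:3))
# lapply() keeps the names, giving a named list of symbols (new = old)
renameArgs <- c(list(.data = list()), lapply(allRename, as.name))
captured <- do.call(showRenameArgs, renameArgs)
# captured is list(anon1 = quote(wt), anon2 = quote(tka), anon3 = quote(depot))
```

Passing the old names as strings instead of symbols would hand the renaming function character values rather than names, which is why each old name is converted with `as.name()`.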
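The sign and order-of-magnitude claim in the documentation follows from the factor being positive and lying in [0.5, 1.5]: each anonymized value is between half and 1.5 times the original, so its base-10 logarithm moves by at most log10(2), about 0.3. The hypothetical `anonScale()` below isolates that logic (the same idea as the `anonParam` and `anonDv` steps) so it can be checked on its own.

```r
# Hypothetical helper mirroring the anonParam/anonDv logic (not part of any
# package): multiply each value by its own Uniform(0.5, 1.5) factor.
anonScale <- function(x, keepOne = TRUE) {
  scaleFactor <- stats::runif(n = length(x), min = 0.5, max = 1.5)
  if (keepOne) {
    # Leave values of exactly 1 unchanged (the anonParam1 = FALSE behaviour)
    scaleFactor[!is.na(x) & x == 1] <- 1
  }
  x * scaleFactor
}

set.seed(2024)
est <- c(-2.3, 0, 1, 0.45, 1e4)
anonEst <- anonScale(est)
# The factor is positive, so signs are unchanged and 0 stays 0
stopifnot(all(sign(anonEst) == sign(est)))
# Every nonzero value is scaled by a factor within [0.5, 1.5]
ratio <- anonEst[est != 0] / est[est != 0]
stopifnot(all(ratio >= 0.5 & ratio <= 1.5))
```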
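One thing worth testing for the eta values: the code multiplies every `iniDf` row, including omega variances and covariances, by its own factor. With correlated etas, scaling the variances and the covariance independently can produce a matrix that is no longer positive definite (I haven't checked whether `ini<-` would reject it). A sketch of an alternative, under that assumption, is to draw one factor per eta and scale the whole block as D Ω D with D = diag(sqrt(f)); this keeps the matrix positive definite and leaves the correlations unchanged. Mapping the result back onto `iniDf` rows (which I believe are identified by their `neta1`/`neta2` columns) is left out; `isPosDef()` and `scaleOmega()` are hypothetical.

```r
# Positive definite if all eigenvalues are strictly positive
isPosDef <- function(m) {
  all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

# Two highly correlated etas (correlation 0.9)
omega <- matrix(c(0.10, 0.09,
                  0.09, 0.10), nrow = 2)

# Independent per-element factors, as with per-row iniDf scaling:
# variances shrink by 0.5 while the covariance grows by 1.5
naive <- omega * matrix(c(0.5, 1.5,
                          1.5, 0.5), nrow = 2)

# One variance factor f_i per eta; covariance (i, j) is scaled by sqrt(f_i * f_j)
scaleOmega <- function(omega) {
  f <- stats::runif(nrow(omega), min = 0.5, max = 1.5)
  d <- diag(sqrt(f), nrow = nrow(omega))
  d %*% omega %*% d
}

set.seed(42)
scaled <- scaleOmega(omega)
isPosDef(omega)   # TRUE
isPosDef(naive)   # FALSE: eigenvalues are 0.185 and -0.085
isPosDef(scaled)  # TRUE
```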
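On the dataset side, the covariate renaming and subject selection can also be written with `match()`. The sketch below uses a hypothetical toy dataset and assumes, as the function does, that the simplified data has lower-case `id` and `dv` columns. Numbering subjects with `match(id, unique(id))` (order of first appearance) instead of `as.integer(factor(id))` (sorted order) is a design alternative, not what the function does: it makes "the first `nId` subjects" simply `id <= nId`, and the kept ids are always `1:nId`.

```r
# Hypothetical toy dataset
dat <- data.frame(
  id = c(101, 101, 205, 205, 307, 307),
  time = c(0, 1, 0, 1, 0, 1),
  dv = c(0, 12.1, 0, 9.8, 0, 15.3),
  WT = c(70, 70, 82, 82, 64, 64)
)
# Names are the new (anonymous) names, values the old names
allRename <- c(anon1 = "WT", anon2 = "tka")

# Rename columns in one step; columns not in allRename keep their names
idx <- match(names(dat), allRename)
names(dat)[!is.na(idx)] <- names(allRename)[idx[!is.na(idx)]]

# Number subjects by order of first appearance, then keep the first nId
dat$id <- match(dat$id, unique(dat$id))
nId <- 2
dat <- dat[dat$id <= nId, ]
```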