nlmixr2 / rxode2

rxode2
https://nlmixr2.github.io/rxode2/
GNU General Public License v3.0

Feature request: Model/data anonymization function #755

Open billdenney opened 1 month ago

billdenney commented 1 month ago

A perennial difficulty when reporting nlmixr2 issues is anonymizing the model and/or data.

I've written the function below, which covers almost all of the needed anonymization. The items that are not (or may not be) covered, and which may need to be resolved before inclusion, are:

  1. If a fixed value is set in the model (e.g., model({mw <- 150000})), neither the name mw nor the fixed value is modified. How can I get a list of all left-hand-side names assigned this way, i.e., names that are not a model parameter, covariate, residual error value, output, or compartment, and that are not special lines in the model such as derivatives?
  2. How can I remove the model's meta-information (see the FIXME in the code below)?
  3. It uses the data simplification function from nlmixr2targets (nlmixr_data_simplify()) to minimize the dataset width. We may want to move that function into the same package as this one.
  4. I still need to test that it anonymizes eta values.
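For item 1, one possible starting point is to walk the model block's expressions, collect every name assigned with `<-` or `=`, skip lines whose left-hand side is a call (derivatives such as `d/dt(depot)` or initial conditions such as `central(0)`), and then drop the names already covered by the parameter, covariate, output, and compartment lists. The sketch below is plain base R operating on a quoted block; `assignedNames()` is a hypothetical helper, not an rxode2 function, and how best to pull the model block out of an rxUi object is left open.

```r
# Hypothetical helper (not part of rxode2): collect the plain names assigned
# with `<-` or `=` in a quoted model block. Lines whose left-hand side is a
# call, such as d/dt(depot) <- ... or central(0) <- ..., are skipped.
assignedNames <- function(expr) {
  out <- character(0)
  walk <- function(e) {
    if (!is.call(e)) return(invisible(NULL))
    fn <- if (is.name(e[[1]])) as.character(e[[1]]) else ""
    if (fn %in% c("<-", "=")) {
      if (is.name(e[[2]])) out <<- c(out, as.character(e[[2]]))
    } else if (fn %in% c("{", "if")) {
      # Recurse into braces and both branches of if/else
      for (i in seq_along(e)[-1]) walk(e[[i]])
    }
    invisible(NULL)
  }
  walk(expr)
  unique(out)
}

# Made-up model block for illustration
modelBlock <- quote({
  ka <- exp(tka + eta.ka)
  mw <- 150000
  d/dt(depot) <- -ka * depot
  if (WT > 70) {
    scale <- 1.1
  }
  cp = central / v
})

lhs <- assignedNames(modelBlock)   # "ka" "mw" "scale" "cp"
known <- c("ka", "cp")             # hypothetical: names already renamed elsewhere
setdiff(lhs, known)                # "mw" "scale"
```

The remaining names (here `mw` and `scale`) would then be candidates for renaming, and their right-hand-side constants for scaling.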

If this is generally acceptable, where do you think it should live? I think rxode2, but I can imagine

#' Anonymize an rxode2/nlmixr2 model and dataset to assist with issue reporting
#'
#' It is your responsibility to inspect any information that you are sharing.
#' Do not share confidential data. This function is designed to help you avoid
#' sharing anything that may be confidential, but final responsibility remains
#' with the person sharing the data.
#'
#' When parameters and DV values are anonymized, they maintain the approximate
#' order of magnitude and the sign.  This is done by multiplying by a uniform
#' random number between 0.5 and 1.5.
#'
#' The changes made by this function may affect whether or not an issue
#' occurs. Please retest the issue with the anonymized model and dataset to
#' confirm that it still occurs.
#'
#' @param model The model to anonymize
#' @param data The dataset to minimize and anonymize
#' @param anonParam Anonymize the initial conditions of the parameters (except
#'   values that are exactly 0 or 1)?
#' @param anonParam1 Also anonymize the initial conditions of the parameters
#'   that are exactly 1?  This only has an effect if `anonParam = TRUE`.
#' @param anonDv Anonymize the DV values in the dataset?
#' @param nId Number of subject identifiers to include in the dataset.  Only
#'   the first `nId` unique `"id"` values are kept, to minimize dataset size
#'   (all `"id"` values are kept if `nId` is greater than the number in the
#'   dataset).  Set to `Inf` to include all data.
#'
#' @returns A list with two components, `"model"` and `"data"`, containing the
#'   anonymized model and dataset, respectively
#' @export
nlmixr2Anonymize <- function(model, data, anonParam = TRUE, anonParam1 = FALSE, anonDv = TRUE, nId = 5) {
  modelUi <- rxode2::as.rxUi(model)
  # Drop all meta-data (FIXME: how do I do this?  It says it should not be overwritten)
  #modelUi$meta <- NULL
  # Drop all labels
  modelUi$iniDf$label <- NA_character_

  dataSimple <- nlmixr2targets::nlmixr_data_simplify(data = data, object = modelUi)
  # Find everything that needs to be renamed
  covariateRename <- modelUi$allCovs
  paramRename <- unique(unlist(modelUi$params[c("pop", "resid", "group", "cmt", "output")]))
  allRename <- c(covariateRename, paramRename)
  allRename <- stats::setNames(allRename, paste0("anon", seq_along(allRename)))
  argsRename <- list(.data = modelUi)
  for (idx in seq_along(allRename)) {
    currentRename <- stats::setNames(list(as.name(allRename[idx])), names(allRename[idx]))
    argsRename <- append(argsRename, currentRename)
  }
  # And rename them all in the model
  modelUiRenamed <- do.call(rxode2::rxRename, argsRename)

  # Anonymize initial conditions while maintaining order of magnitude and sign
  if (anonParam) {
    newIniDf <- modelUiRenamed$iniDf
    currentFactor <- stats::runif(n = nrow(newIniDf), min = 0.5, max = 1.5)
    if (!anonParam1) {
      # Keep values of exactly 1 unchanged; they are unlikely to be informative
      currentFactor[newIniDf$est == 1] <- 1
    }
    newIniDf$lower <- newIniDf$lower * currentFactor
    newIniDf$est <- newIniDf$est * currentFactor
    newIniDf$upper <- newIniDf$upper * currentFactor
    rxode2::ini(modelUiRenamed) <- newIniDf
  }

  # Then, rename all covariates in the dataset
  oldNames <- names(dataSimple)
  newNames <- oldNames
  for (currentCovariate in covariateRename) {
    newNames[oldNames %in% currentCovariate] <-
      names(allRename[allRename %in% currentCovariate])
  }
  dataRenamed <- stats::setNames(dataSimple, nm = newNames)

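  # Rename compartments in the dataset (convert a factor column to character
  # first; otherwise assigning a new label would produce NA)
  if (is.factor(dataRenamed$cmt)) dataRenamed$cmt <- as.character(dataRenamed$cmt)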
  for (oldCmt in modelUi$params$cmt) {
    newCmt <- names(allRename[allRename %in% oldCmt])
    mask <- dataRenamed$cmt %in% oldCmt
    if (any(mask)) {
      dataRenamed$cmt[mask] <- newCmt
    }
  }

  # Anonymize the ids in the dataset
  dataRenamed$id <- as.integer(factor(dataRenamed$id))

  # Anonymize DV in the dataset
  if (anonDv && "dv" %in% names(dataRenamed)) {
    dataRenamed$dv <- dataRenamed$dv * stats::runif(n = nrow(dataRenamed), min = 0.5, max = 1.5)
  }

  # Reduce the dataset size, if desired
  if (is.finite(nId)) {
    allId <- unique(dataRenamed$id)
    if (nId < length(allId)) {
      dataRenamed <- dataRenamed[dataRenamed$id %in% allId[seq_len(nId)], ]
    }
  }

  list(
    model = as.function(modelUiRenamed),
    data = dataRenamed
  )
}
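To make the dataset steps easier to review, here is a self-contained base-R sketch of the same sequence (covariate column renaming, compartment renaming, id renumbering, and keeping the first `nId` subjects) applied to a small made-up data frame. It assumes lowercase `id`, `cmt`, and `dv` columns, as the function above does, and reuses its convention that `allRename` maps new names (vector names) to old names (values); the data and mapping are illustrative only.

```r
# Made-up stand-in for a simplified nlmixr2 dataset
dat <- data.frame(
  id   = c(101, 101, 205, 205, 307),
  time = c(0, 1, 0, 1, 0),
  cmt  = c("depot", "central", "depot", "central", "depot"),
  dv   = c(0, 4.2, 0, 3.9, 0),
  WT   = c(70, 70, 82, 82, 65),
  stringsAsFactors = FALSE
)
# Same convention as allRename above: names are new labels, values are old ones
allRename <- c(anon1 = "WT", anon2 = "depot", anon3 = "central")

# 1. Rename covariate columns
newNames <- names(dat)
for (cov in intersect(newNames, allRename)) {
  newNames[newNames == cov] <- names(allRename)[allRename == cov]
}
names(dat) <- newNames

# 2. Rename compartments (factor -> character first so new labels are not NA)
if (is.factor(dat$cmt)) dat$cmt <- as.character(dat$cmt)
for (oldCmt in intersect(unique(dat$cmt), allRename)) {
  dat$cmt[dat$cmt == oldCmt] <- names(allRename)[allRename == oldCmt]
}

# 3. Replace the original ids with 1, 2, 3, ...
dat$id <- as.integer(factor(dat$id))

# 4. Keep only the first nId subjects
nId <- 2
allId <- unique(dat$id)
dat <- dat[dat$id %in% allId[seq_len(min(nId, length(allId)))], ]
dat
```

Because `as.integer(factor(id))` numbers subjects by their sorted original ids, the new ids hide the original identifiers but preserve their relative order.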
mattfidler commented 1 month ago

Probably in rxode2 with appropriate tests.
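As one illustration of what "appropriate tests" could check, here is a minimal base-R sketch for the value-scaling step. `anonymizeValues()` is a hypothetical stand-alone helper that mirrors the `runif(min = 0.5, max = 1.5)` multiplication in the function above; it is not an existing rxode2 function, and a package test suite would more likely express the same checks with testthat.

```r
# Hypothetical helper mirroring the scaling in nlmixr2Anonymize(): multiply
# each value by an independent U(0.5, 1.5) factor, optionally keeping exact 1s.
anonymizeValues <- function(x, keepOne = TRUE) {
  multiplier <- stats::runif(length(x), min = 0.5, max = 1.5)
  if (keepOne) multiplier[!is.na(x) & x == 1] <- 1
  x * multiplier
}

set.seed(1)
x <- c(-2000, -0.3, 0, 1, 0.004, 150000, NA, Inf)
y <- anonymizeValues(x)

# Finite values other than 0 and 1 should be scaled by a factor in [0.5, 1.5]
scaled <- is.finite(x) & x != 0 & x != 1
ratio <- y[scaled] / x[scaled]

stopifnot(
  identical(sign(y), sign(x)),      # sign preserved; NA stays NA
  y[!is.na(x) & x == 0] == 0,       # zeros stay zero
  y[!is.na(x) & x == 1] == 1,       # exact 1s kept when keepOne = TRUE
  all(ratio >= 0.5 & ratio <= 1.5)  # approximate order of magnitude kept
)
```

The same pattern (sign, zeros, ones, and ratio bounds) would apply to the `lower`/`est`/`upper` columns of `iniDf` and to `dv`.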