vubiostat / redcapAPI

R interface to REDCap (http://www.project-redcap.org/)
19 stars 26 forks source link

Vectorize functions that are not vectorized #389

Open pbchase opened 1 month ago

pbchase commented 1 month ago

Please forgive me writing an issue like this, because it is light on detail, but my issue with redcapAPI is the uneven vectorization. I've had this problem multiple times in redcapAPI. Yet, I can't recall where I have seen it save for the last issue I had in February with redcapAPI::renameRecord. I documented that at https://github.com/OuhscBbmc/REDCapR/issues/513.

For any API call, I would like it to accept a vector in a parameter wherever a vector makes sense. https://github.com/OuhscBbmc/REDCapR/issues/513 shows what I think is a reasonable way to do that and a reasonable response from the function. I hope that helps. I'm happy to discuss this in more detail

spgarbet commented 1 month ago

This is a wonderful idea. Our goals the last year was to stabilize the crashing, make it compatible with automated servers and address a lot of security practices. We've hit that goal.

spgarbet commented 1 month ago

renameRecord(rcon, record_name, new_record_name, arm = NULL, ...) is the interface. It checks at present to have 1 record_name and 1 new_record_name. Should we allow n and n and vectorize? Can rcon be vectorized? Does that make sense? Looks like arm as well should be. The ... is going to the makeApiCall.

nutterb commented 1 month ago

I'm going to add some background information to the discussion. In doing so, I want to make sure I don't come across as advocating a hard "no" to this request. I do recognize that there be monsters lurking in these here waters, so figure it is good to lay out why redcapAPI operates the way it does before making decisions about how/whether to change it.

The functions in recapAPI adhere very closely to the structure of the actual API parameters. In the recent updates, we've tried to make a one-to-one mapping of REDCap API argument to redcapAPI function argument. If the REDCap API accepts an array as an argument, redcapAPI accepts a vector. (The only exceptions I can think of to this rule exist in some of the file repository functions).

Using renameRecord as an example, the API only accepts a single value into the record_name, and new_record_name argument. To reframe it in a way, the metric redcapAPI currently uses for "accept a vector in a parameter wherever a vector makes sense" is whether the API itself will accept that vector. For this function, even if we "vectorize" the argument, the internal code still has to manage creating multiple calls to the API.

Again, this isn't to say that vectorizing is a bad idea. Making things just a little bit easier for the end user is a good goal. It just needs some consideration in the design choices.

Some initial questions I would pose are

  1. What should happen if one of the calls to the API fails. Should it terminate at the failure, or catch the error and attempt to proceed with any further elements in the vector?
  2. How should recycling be handled? Should it be permitted?
  3. How should the results received from the API be captured for the user?

Managing vectorization in the meantime

Ultimately, as @pbchase demonstrated in the REDCapR thread, loops and/or apply statements are the way the vectorization needs to be handled. There's a neat little function, Vectorize, that can assist with that and turn single use functions into vectorized functions (it uses mapply internally). I think it would be extremely useful in this case:

library(redcapAPI)
renameRecord_v <- Vectorize(renameRecord, 
                            vectorize.args = c("record_name", 
                                               "new_record_name"))

renameRecord_v(rcon, 
               c("1", "2", "3"), 
               c("001", "002", "003"))

#### OR if using a data frame ####

RecordRenames <- 
  data.frame(old = c("1", "2", "3"), 
             new = c("001", "002", "003"))

renameRecord(rcon, 
             record_name = RecordRenames$old, 
             new_record_name = RecordRenames$new)

and if you like keep your workspace free of local objects and have a vendetta against readable code

library(redcapAPI)
Vectorize(
  renameRecord, 
  vectorize.args = c("record_name", 
                     "new_record_name"))(record_name = c("1, 2, 3"), 
                                         new_record_name = c("001", "002", "003"))
nutterb commented 1 month ago

renameRecord(rcon, record_name, new_record_name, arm = NULL, ...) is the interface. It checks at present to have 1 record_name and 1 new_record_name. Should we allow n and n and vectorize? Can rcon be vectorized? Does that make sense? Looks like arm as well should be. The ... is going to the makeApiCall.

My previous comments aside, I think renameRecord is probably one of the best candidates for vectorization. It has no argument validation outside of the length of the arguments, which means it doesn't have to deal with messy issues about validating against anything in the project. Consequently, we aren't going to inflate execution time by making a bunch of validation checks against the API (like we do with exporting records, for example)

My proposal for output would be a three column data frame with record_name, new_record_name, and success, with success being a logical indicating if the rename completed successfully. and perhaps a fourth column with any error message returned from the API?

nutterb commented 1 month ago

Other functions that might support vectorization:

pbchase commented 1 month ago

The functions in recapAPI adhere very closely to the structure of the actual API parameters. In the recent updates, we've tried to make a one-to-one mapping of REDCap API argument to redcapAPI function argument. If the REDCap API accepts an array as an argument, redcapAPI accepts a vector.

In my conversations with Will about REDCapR, I have made similar arguments. It makes sense for an R package to reflect the behavior of the REDCap API.

Yet these are R packages. They exist to add value to what is otherwise a curl dialog. They exist in the expectations of R developers as well. R developers expect functions to be vectorized. When I realized I could only rename one record at a time, I was incredulous. My response was a visceral "NO". I have work to do. Thus my purrr-based snippet.

As to how it should behave, I vote it should process every rename it can and report back the successes and failures. That's what the purrr snippet does.

If you want to vectorize but are in the do-nothing-if-anything-might-fail camp, there's some work to do to ensure every rename will work. You might even need to make another API call. You could:

  1. verify each source record_id is unique.
  2. verify each target record_id is unique.
  3. verify the combined set of target and source record_ids is distinct.
  4. verify each source record_id exists in REDCap.

That would give you good odds of success in the rename, but not 100%. I think you'd still need to catch the failure on the Nth rename and report which ones worked and which did not.

If you want to reflect both REDCap-like behavior and a more R-centric behavior, you could add a parameter to allow only unit-length vectors. If you want REDCap-behavior, set that option to TRUE. If you want more vectorized behavior, set it to FALSE.

You could also add a parameter to fail the batch on the first failure.

spgarbet commented 1 month ago

Really good insight @nutterb. In a recent refactor there is one really gnarlly piece of code dealing with multiple API calls in a loop in a function. It return a list of results of the calls, some could be errors. It was kind of clunky, but given the parameters you've described probably the best.

Maybe a vectorized helper function, 'renameRecords' that returns a list of success / failures. If we go down that road, then we should look for a consistent interface that works with continuations. I don't see the output of renameRecords being that useful in a continuation.

nutterb commented 1 month ago

I find myself very uncomfortable with having a renameRecord and also a renameRecords.

My inclination is to continue with renameRecord, adjust the arguments to accept vectors of the same length (for record_name and new_record_name. If arms is not NULL, it must also be of the same length). Then move the API call code into an unexported subroutine. Something like

renameRecord <- function(rcon, record_name, new_record_name, arm, ...){
  ## ARGUMENT VALIDATION ##

  ## PREP OUTPUT OBJECT CONTAINER ##
  output <- data.frame(...) # or whatever
  response <- vector("list", length(record_name))

  for (i in seq_along(record_name)){
    response[[i]] <- .renameRecordApiCall(rcon, record_name[i], new_record_name[i], arm[i], ...)
  }

  ## FINALIZE OUTPUT OBJECT CONTAINER ##
  return(output)
}

.renameRecordApiCall <- function(rcon, record_name, new_record_name, arm, ...) {
  body <- list(content = "record", 
               action = "rename", 
               record = record_name, 
               new_record_name = new_record_name, 
               arm = arm)

  ###################################################################
  # Call the API                                                 ####
  invisible('1' == as.character(makeApiCall(rcon, body, ...))) 
}