pharmaverse / admiral

ADaM in R Asset Library
https://pharmaverse.github.io/admiral
Apache License 2.0
222 stars 62 forks source link

Closes #2244 fix_slice_derivation: add environment to params() and update sl… #2440

Closed bundfussr closed 4 months ago

bundfussr commented 4 months ago

…ice_derivation() and call_derivation()

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the main branch until you have checked off each task.

github-actions[bot] commented 4 months ago

Code Coverage

Package Line Rate Health
admiral 97%
Summary 97% (4743 / 4878)
manciniedoardo commented 4 months ago

@bundfussr thanks for working on this! image

manciniedoardo commented 4 months ago

@bundfussr I tried rerunning this example from #2244 and I still get an error, do you know what's up?

library(admiral)
library(pharmaversesdtm)
library(dplyr, warn.conflicts = FALSE)

data("admiral_adsl")
data("ae")
data("ds")
data("ex")

adsl <- admiral_adsl

ds_ext <- derive_vars_dt(
  ds,
  dtc = DSSTDTC,
  new_vars_prefix = "DSST"
)

ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "last"
  )

derive_eop01stt_2 <- function(adsl_in, ex_in, week){

  adsl_in %>% 
    slice_derivation(
      derivation = derive_vars_merged,
      args = params(
        new_vars = exprs(EOP01STT = "Completed"),
        missing_values = exprs(EOP01STT = "Ongoing"),
        by_vars = exprs(STUDYID, USUBJID)
      ),
      derivation_slice(
        filter = ACTARMCD %in% c("PBO"),
        args = params(
          dataset_add = ex_in,
          filter_add = EXTRT %in% c("PLACEBO") & VISIT == "WEEK 2"
        )
      ),
      derivation_slice(
        filter = TRUE,
        args = params(
          dataset_add = ex_ext,
          filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week
        )
      )
    ) 
}

derive_eop01stt_2(adsl, ex_ext, "WEEK 24")

Returns:

Error in `filter()`:
ℹ In argument: `!(EXTRT %in% c("PLACEBO")) & VISIT == week`.
Caused by error:
! object 'week' not found
bundfussr commented 4 months ago

It should be

filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == !!week

instead of

filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week

because week is not a column of the dataset.

By the way, it should also be

filter = ACTARMCD %in% c("Pbo"),

instead of

filter = ACTARMCD %in% c("PBO"),

Otherwise, no records match the condition. This explains the "strange" behavior that the order matters.

ddsjoberg commented 4 months ago

If we make the below a quosure, it will have the env attached and should be able to evaluate without the !!week. We should probably aim to have our expression syntax match that of dplyr::filter() (and many many other functions too). Sorry I don't have the bandwidth to look at these details right now.

filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week

manciniedoardo commented 4 months ago

Thanks @bundfussr!

because week is not a column of the dataset.

Lol, I fell for the same pitfall I wrote about a few weeks ago!

bundfussr commented 4 months ago

If we make the below a quosure, it will have the env attached and should be able to evaluate without the !!week. We should probably aim to have our expression syntax match that of dplyr::filter() (and many many other functions too). Sorry I don't have the bandwidth to look at these details right now.

filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week

Using quosures for filter arguments is a major change (for us and the users). I wouldn't do it now because we want to release soon. I think the dplyr and other functions behave differently because they are more focused on interactive calls. There the ambiguity introduced by quosures is acceptable. In our context I would prefer to be more strict and avoid ambiguities. If we use quosures, filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week would work in most cases but if the dataset contains the week variable, it would fail or produce wrong results. So even if we use quosures, we should use filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == .env$week or filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == !!week (as recommended in Preventing Collisions). The advantage of our approach is that the code fails immediately if you forget the !!. If quosures are used and you forget the .env$ or !!, it fails later or produces wrong results. In this case, it is much more difficult to determine the cause of the problem. Therefore I would not change our approach.

ddsjoberg commented 4 months ago

We dont need to request users pass quosures, so there should be no change for them. Rather we capture what they pass as a quosure, so we have the appropriate env to evaluate it in.

it will be confusing to users why code like this works: dplyr::filter(filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week) But the same code does not work for filtering in admiral.

bundfussr commented 4 months ago

The users don't need to pass quosures but they should use filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == !!week instead of filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week (unless they are just playing around in the console).

ddsjoberg commented 4 months ago

The users don't need to pass quosures but they should use filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == !!week instead of filter_add = !(EXTRT %in% c("PLACEBO")) & VISIT == week (unless they are just playing around in the console).

That great we're on the same page about not requesting users pass proper quosures. 👯

But requiring that users include !!week, when it's not required in dplyr::filter() will certainly be a pain point.

bms63 commented 4 months ago

@manciniedoardo if this satisfies your requirements then can your approve and merge in?

@bundfussr apologies for my ignorance and not being a help here. my only concern is that if this is a bug that a user can create - should we be testing that the bug no longer appears or are we confident that with the updates this situation should no longer happen...Like is this an edge case that we need to cover and give feedback to the user through error messaging or is this the case of our function just not working as intended.

bundfussr commented 4 months ago

@bundfussr apologies for my ignorance and not being a help here. my only concern is that if this is a bug that a user can create - should we be testing that the bug no longer appears or are we confident that with the updates this situation should no longer happen...Like is this an edge case that we need to cover and give feedback to the user through error messaging or is this the case of our function just not working as intended.

@bms63 , I have added a unit for the case where it didn't work before (when slice_derivation() is called in a function and an object defined in the function is passed to one of the params() objects).