pharmaverse / admiral

ADaM in R Asset Library
https://pharmaverse.github.io/admiral
Apache License 2.0
220 stars 61 forks source link

Bug: `derive_vars_joined()` improper environment handling #2391

Closed manciniedoardo closed 5 months ago

manciniedoardo commented 5 months ago

What happened?

If derive_vars_joined() is called inside a custom function, and the filter_add or filter_join arguments are dynamically generated within the custom function and then passed onto derive_vars_joined(), then derive_vars_joined() does not find them.

This issue is similar to #2244 - I think something is going wrong at a deeper level within admiral. This would benefit from an attentive review - as our users become more and more familiar with the package, they may want to write their own functions using our toolset, increasing the likelihood of this kind of error.

Session Information

R version 4.2.2 Patched (2022-11-10 r83330) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.6 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods
[7] base

other attached packages: [1] tidyr_1.3.0 lubridate_1.9.3 dplyr_1.1.4 tibble_3.2.1
[5] admiral_1.0.0

loaded via a namespace (and not attached): [1] rstudioapi_0.14 magrittr_2.0.3 hms_1.1.3
[4] tidyselect_1.2.0 timechange_0.2.0 R6_2.5.1
[7] rlang_1.1.2 fansi_1.0.6 stringr_1.5.1
[10] tools_4.2.2 utf8_1.2.4 cli_3.6.2
[13] withr_2.5.2 lifecycle_1.0.4 purrr_1.0.2
[16] vctrs_0.6.5 glue_1.6.2 admiraldev_1.0.0.9004 [19] stringi_1.8.3 compiler_4.2.2 pillar_1.9.0
[22] generics_0.1.3 pkgconfig_2.0.3

Reproducible Example

library(tibble)
library(lubridate)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

adae <- tribble(
  ~USUBJID, ~ASTDY,
  "1",           3,
  "1",          22,
  "2",           2
)

adlb <- tribble(
  ~USUBJID, ~PARAMCD, ~ADY, ~AVAL,
  "1",      "HGB",       1,   8.5,
  "1",      "HGB",       3,   7.9,
  "1",      "HGB",       5,   8.9,
  "1",      "HGB",       8,   8.0,
  "1",      "HGB",       9,   8.0,
  "1",      "HGB",      16,   7.4,
  "1",      "HGB",      24,   8.1,
  "1",      "ALB",       1,    42,
  "1",      "ALB",       3,    14,
  "1",      "ALB",       5,     8,
  "1",      "ALB",       8,    12,
  "1",      "ALB",       9,     3,
  "1",      "ALB",      16,    16,
  "1",      "ALB",      24,    11
)

# add highest lab test value within two weeks before AE,  take earliest if more than one ----

## Open code ----
my_condition <- "PARAMCD == 'HGB'"

# This works
derive_vars_joined(
  adae,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL, desc(ADY)),
  new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
  join_type = "all",
  filter_add = eval(parse(text = my_condition)),
  filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
  mode = "last"
)

## Inside a function (which controls the condition passed to filter_add) ----
dummy_function <- function(adae_in = adae, adlb_in = adlb, hgb_or_alb){

  my_fn_condition <- ifelse(hgb_or_alb == "hgb", "PARAMCD == 'HGB'", "PARAMCD == 'ALB'")

  derive_vars_joined(
    adae_in,
    dataset_add = adlb_in,
    by_vars = exprs(USUBJID),
    order = exprs(AVAL, desc(ADY)),
    new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
    join_type = "all",
    filter_add = eval(parse(text = my_fn_condition)),
    filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
    mode = "last"
  )

}

# This errors
dummy_function(hgb_or_alb = "HGB")

image

A similar error canbe manufactured for the filter_join condition.

manciniedoardo commented 5 months ago

@ddsjoberg helpfully pointed out a workaround using rlang::inject() and rlang::parse_expr() while we fix this issue:


## Inside a function (which controls the condition passed to filter_add) ----
dummy_function <- function(adae_in = adae, adlb_in = adlb, hgb_or_alb){

  my_fn_condition <- ifelse(hgb_or_alb == "hgb", "PARAMCD == 'HGB'", "PARAMCD == 'ALB'")

  rlang::inject(
    derive_vars_joined(
      adae_in,
      dataset_add = adlb_in,
      by_vars = exprs(USUBJID),
      order = exprs(AVAL, desc(ADY)),
      new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
      join_type = "all",
      filter_add = !!rlang::parse_expr(my_fn_condition),
      filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
      mode = "last"
    )
  )
}

# This works!
dummy_function(hgb_or_alb = "HGB")
bundfussr commented 5 months ago

The following works as well.

  ## Inside a function (which controls the condition passed to filter_add) ----
  dummy_function <- function(adae_in = adae, adlb_in = adlb, hgb_or_alb){

    if(hgb_or_alb == "hgb"){
      my_fn_condition <- expr(PARAMCD == "HGB")
    }
    else{
      my_fn_condition <- expr(PARAMCD == "ALB")
    }

    derive_vars_joined(
      adae_in,
      dataset_add = adlb_in,
      by_vars = exprs(USUBJID),
      order = exprs(AVAL, desc(ADY)),
      new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
      join_type = "all",
      filter_add = !!my_fn_condition,
      filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
      mode = "last"
    )

  }

  # This works
  dummy_function(hgb_or_alb = "hgb")

I am not sure if we need to fix anything. I think it works as expected. However, we could add a vignette which provides guidance for such use cases.

manciniedoardo commented 5 months ago

Thanks @bundfussr that's a neater fix 😄

I am not sure if we need to fix anything. I think it works as expected. However, we could add a vignette which provides guidance for such use cases.

I still think it's mysterious why this is failing. It should at least be investigated, because there's no apparent reason why it should fail. Also the similar issue #2244 doesn#t lend itself to this sort of fix anyway (at least I think?)

bundfussr commented 5 months ago

The filter argument expects an expression which is evaluable in the dataset. It must not contain any objects of the calling environment. The first example fails because the expression is

> expr(eval(parse(text = my_fn_condition)))
eval(parse(text = my_fn_condition))

and contains my_fn_condition, which is not in the dataset. In the working examples the expression is

> expr(!!expr(PARAMCD == "HGB"))
PARAMCD == "HGB"
manciniedoardo commented 5 months ago

The filter argument expects an expression which is evaluable in the dataset. It must not contain any objects of the calling environment. The first example fails because the expression is

> expr(eval(parse(text = my_fn_condition)))
eval(parse(text = my_fn_condition))

and contains my_fn_condition, which is not in the dataset. In the working examples the expression is

> expr(!!expr(PARAMCD == "HGB"))
PARAMCD == "HGB"

Thanks @bundfussr that makes sense to me now. I wonder if, as you suggest, we could clarify this behaviour in a user-facing about exprs()/quoting covering:

Note that we already discussed this ~7 months ago here and deemed that a full vignette was not needed at the time. I am suggesting to revisit this and provide a short bit of documentation, given I myself have fallen in this trap.

For reference, we could re-use/enhance some of the text here.

@pharmaverse/admiral what do you think?

bundfussr commented 5 months ago

This is another example where the error message could be improved. The current error message does not mention the arguments which caused the error (filter_add and dataset_add) and it mentions filter(). But this function was not called by the user. I think we should catch the error and provide a more user friendly error message like

Error in `derive_vars_joined():
Applying `filter_add = eval(parse(text = my_fn_condition))` to `dataset_add` caused the following error:
<error message from filter() function>

We could even handle the "object not found" error separately and provide additional hints like "Please ensure that all objects used in the condition are available in dataset_add. For more details see 'link to vignette'.".

@ddsjoberg , what do you think? Should this be the next task for the error messaging group?

bms63 commented 5 months ago

@bundfussr I have put something slightly similar https://github.com/pharmaverse/admiraldev/issues/413.

I think this is round 2 for the group? Seems like we could finish this out before June release?

bundfussr commented 5 months ago

For reference, we could re-use/enhance some of the text here.

This is already on the Get Started. page. However, it seems we need more.

manciniedoardo commented 5 months ago

For reference, we could re-use/enhance some of the text here.

This is already on the Get Started. page. However, it seems we need more.

Hmm. I wonder if we could move to somewhere more visible? Eg in its own page under Get Started? Incidentally I also think this page needs a rename in our toolbar at the top - i wouldn't think to go look for this information in a page titled "Simple ADSL Walkthrough".

image

bundfussr commented 5 months ago

I agree, "Simple ADSL Walkthrough" is misleading.

bms63 commented 5 months ago

@manciniedoardo vignette makes sense, but I guess I am wondering if we should start looking at tidyselect option again? Daniel seemed to convey that the updates would be minimal to users with exprs still an option...perhaps the vignette can help discuss the two options going forward.

@bundfussr @manciniedoardo I like the "Simple ADSL Walkthrough" as that is 80% of the vignette.

Should we have a "Simple Design Walkthrough?" or "Programming Considerations"?? as the new vignette to go under the Get Started?

bundfussr commented 5 months ago

I like the "Simple ADSL Walkthrough" as that is 80% of the vignette.

The ADSL example is a major part. But I wonder if we should shorten it. The second part after the ADSL example seems more important to me.

It is also not clear to me what the ADSL example should show. Should it show how to create ADSL? Then "Simple ADSL Walkthrough" would be a good title. Or should it introduce the main admiral concepts? Then "Simple ADSL Walkthrough" would be misleading.

bms63 commented 5 months ago

I see the "Simple ADSL Walkthrough" as way for a user to see admiral in action quickly. A nice way to "Get Started" :)

but

I think there is some great value to gain from a "admiral Ways of Working" or "admiral programming design and concepts" where we could talk more about the details of common design principles in admiral and common issues a ADam programmer will face while using R (sorting, blanks, etc). Still keeping it high-level and accessible.

ddsjoberg commented 5 months ago

@ddsjoberg , what do you think? Should this be the next task for the error messaging group?

@bundfussr sounds great! I like the suggested messaging. We can also capture the error from filter() and print our custom message followed by the message that filter() throws as well.

(Re-reading and seeing that is exactly what you suggested haha)

manciniedoardo commented 5 months ago

I see the "Simple ADSL Walkthrough" as way for a user to see admiral in action quickly. A nice way to "Get Started" :)

but

I think there is some great value to gain from a "admiral Ways of Working" or "admiral programming design and concepts" where we could talk more about the details of common design principles in admiral and common issues a ADaM programmer will face while using R (sorting, blanks, etc). Still keeping it high-level and accessible.

I agree @bms63. We could add this "Programming Design and Concepts" page under the "Get Started" tab. This would be a more user-facing version of our Programming Strategy page, and could also collect some of the items in the Simple ADSL walkthrough page (such as the exprs() text). Also, it would be an opportunity to consolidate what we have in our FAQ page as well.

Incidentally in this way we'd also make the "Simple ADSL Walkthrough" page less misleading in terms of the title as it would just contain ADSL code and nothing more.

Regarding tidyselect - I'd be interested to see how we could implement this without breaking backwards compatibility?

bms63 commented 5 months ago

Hi all,

I'm going to close this issue as it has morphed into a documentation update. I made a separate related documentation issue and linked this issue in it. https://github.com/pharmaverse/admiral/issues/2395