if_then_else support via "conditioned" data frames

ramiromagno commented 4 months ago

Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about which records should be used for derivations
Adds support for basic pretty printing of cnd_df objects
Adds a user-facing function for creating such cnd_df objects: condition_by()
Adds an experimental "mutate"-like function for these conditioned data frames: derive_by_condition()
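
To make the idea concrete, here is a minimal base-R sketch of what a conditioned data frame could look like. It is an illustration only, not the code in this PR: new_cnd_sketch(), condition_sketch(), derive_sketch() and the "cnd" attribute are hypothetical stand-ins for cnd_df, condition_by() and derive_by_condition().

```r
# Hypothetical sketch only; these helpers are NOT sdtm.oak's implementation.

# Attach a logical vector (one flag per record) and tag the data frame
# with an extra S3 class.
new_cnd_sketch <- function(dat, cnd) {
  stopifnot(is.data.frame(dat), is.logical(cnd), length(cnd) == nrow(dat))
  attr(dat, "cnd") <- cnd
  class(dat) <- c("cnd_sketch", class(dat))
  dat
}

# Evaluate a condition using the columns of `dat`; NA counts as "not met".
condition_sketch <- function(dat, expr) {
  cnd <- eval(substitute(expr), dat, parent.frame())
  new_cnd_sketch(dat, cnd & !is.na(cnd))
}

# Add column `var`, keeping `values` only for records where the condition holds.
derive_sketch <- function(dat, var, values) {
  cnd <- attr(dat, "cnd")
  stopifnot(length(values) == nrow(dat))
  values[!cnd] <- NA
  out <- dat
  attr(out, "cnd") <- NULL
  class(out) <- setdiff(class(out), "cnd_sketch")
  out[[var]] <- values
  out
}

cm <- data.frame(
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN"),
  stringsAsFactors = FALSE
)
cnd_cm <- condition_sketch(cm, CMTRT == "BABY ASPIRIN")
res <- derive_sketch(cnd_cm, "CMGRPID", c(1L, 2L, 3L))
```

In this sketch res$CMGRPID is filled only for the BABY ASPIRIN record and is NA elsewhere; the condition travels with the data instead of filtering rows out.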

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the oak codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

[x] Place Closes # into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
[x] Code is formatted according to the tidyverse style guide. Run styler::style_file() to style R and Rmd files
[x] Updated relevant unit tests or have written new unit tests, which should consider realistic data scenarios and edge cases, e.g. empty datasets, errors, boundary cases etc. - See Unit Test Guide
[x] If you removed/replaced any function and/or function parameters, did you fully follow the deprecation guidance?
[x] Update to all relevant roxygen headers and examples, including keywords and families. Refer to the categorization of functions to tag appropriate keyword/family.
[x] Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
[x] Address any updates needed for vignettes and/or templates
[x] Update NEWS.md if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers)
[x] Build oak site pkgdown::build_site() and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
[x] Address or fix all lintr warnings and errors - lintr::lint_package()
[x] Run R CMD check locally and address all errors and warnings - devtools::check()
[x] Link the issue in the Development Section on the right hand side.
[x] Address all merge conflicts and resolve appropriately
[x] Pat yourself on the back for a job well done! Much love to your accomplishment!

github-actions[bot] commented 4 months ago

Package	Line Rate	Health
sdtm.oak	88%	✔
Summary	88% (736 / 836)	✔

ramiromagno commented 4 months ago

Still a work in progress.

rammprasad commented 4 months ago

@ramiromagno -

A couple of items

condition_by() is not working when applied to the target (tgt_dat)
Also, can you show me how to apply condition_by() when the condition involves both source and target?

Here is the sample code.


study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv",
                                 package = "sdtm.oak"))

# Read in raw data

cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv",
                                package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(oak_id = structure(seq_len(nrow(.))),
                patient_number = PATNUM,
                raw_source = "ConMed") %>%  
  dplyr::select(oak_id_vars(), dplyr::everything())

# Create CM domain. The first step in creating CM domain is to create the topic variable

cm <-
  # Derive topic variable
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDRAW",
    tgt_var = "CMTRT"
  )  |>
  # Derive CMGRPID when CMTRT == "BABY ASPIRIN"
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    # use condition_by() to filter when CMTRT is "BABY ASPIRIN"
    ### This condition is not working. It results in an error
    tgt_dat = condition_by(.,CMTRT == "BABY ASPIRIN"),
    id_vars = oak_id_vars()
  )

**Error message**
Error in assign_no_ct(assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW",  : 
  unused argument (assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW", tgt_var = "CMTRT"))

Can you give an example of how to program this mapping?
  # Derive qualifier CMMODIFY -  If collected value in CMMODIFY
  # in cm_raw is different to CM domain CMTRT target variable then
  # assign the collected value to CMMODIFY in CM domain (CM.CMMODIFY)

ramiromagno commented 4 months ago

Yeah, there are a couple of things at play here:

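The placeholder . is for use with magrittr's pipe %>%, whereas _ is the placeholder for R's native pipe |>. That is why the native-pipe version above fails with "unused argument": . is not a placeholder there, so the piped value is just passed as an extra, unnamed argument.
When magrittr's placeholder is used only inside a nested call, as in tgt_dat = condition_by(., ...), braces are required around the right-hand side; otherwise the left-hand side is also inserted as the first argument. See https://magrittr.tidyverse.org/reference/pipe.html#using-the-dot-for-secondary-purposes. Not sure we want to surface this to the user.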
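
A small sketch of that magrittr rule, using a hypothetical function f() in place of assign_no_ct() (f() and its parameters a, b, c are made up for illustration). It shows the piped value landing in the next free parameter, which is presumably how a data frame ends up in id_vars in the failing example:

```r
library(magrittr)

# Hypothetical function standing in for assign_no_ct(); not part of sdtm.oak.
f <- function(a, b = NULL, c = NULL) list(a = a, b = b, c = c)

# The dot appears only inside a nested call, so magrittr ALSO inserts the
# left-hand side as the first unnamed argument. Because `a` and `b` are
# already named, the piped value falls into the next free parameter, `c`.
nested <- 1:3 %>% f(a = "named", b = length(.))

# Wrapping the right-hand side in braces switches off the implicit insertion,
# so the dot is used only where it is written.
braced <- 1:3 %>% {
  f(a = "named", b = length(.))
}
```

Here nested$c holds the piped vector, while braced$c stays NULL; both have b equal to 3.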

So the best approach might be this: if we want to condition on the target data set, i.e. the one being passed along the pipe, we move the condition_by() call one level up in the chain.

Here is a set of examples that hopefully illustrate the different variations:

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(sdtm.oak:::oak_id_vars(), dplyr::everything())

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  # Derive CMGRPID when CMTRT == "BABY ASPIRIN"
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
  )
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:189:3:
#> ! `id_vars` must be a character vector but is a data frame

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_by(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = _
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_by(CMTRT == "BABY ASPIRIN") %>%
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = .
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN")
    )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
    )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

Question 2

Regarding question 2, see the examples towards the end of tests/testthat/test-assign.R.

rammprasad commented 4 months ago

Thank you, @ramiromagno. I got it to work. Having it inline will make sense for the users. We will just put both options out there.


cm <-
  # Derive topic variable
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDRAW",
    tgt_var = "CMTRT"
  )  %>%
  # Derive CMGRPID
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      # use condition_by() to filter when CMTRT is "BABY ASPIRIN"
      tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN"),
      id_vars = oak_id_vars()
    )
  } %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = condition_by(., CMMODIFY != CMTRT, .env = cm_raw),
      id_vars = oak_id_vars()
    )
  }

A couple of follow-up questions

We need to explain in the documentation why {} is needed when we use the condition_by function.
Can we rename the .env parameter to something more meaningful? We are comparing two datasets in this case, so can we rename it to .dat2?


# option 1
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = condition_by(dat = ., CMMODIFY != CMTRT, dat2 = cm_raw),
      id_vars = oak_id_vars()
    )
  }
# option 2
  {
    assign_no_ct(
      raw_dat = condition_by(dat = cm_raw, CMMODIFY != CMTRT, dat2 = .),
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = .,
      id_vars = oak_id_vars()
    )
  }

ramiromagno commented 3 months ago

Hi @rammprasad:

The reason why we need braces {...} is explained here: https://magrittr.tidyverse.org/reference/pipe.html#using-the-dot-for-secondary-purposes. We could get rid of this requirement if tgt_dat were the first parameter, and I think we should make that change, as it would simplify the overall piping syntax and make things easier for the user.
Renaming .env to .dat2 (I'm adding the dot) is fine, but .env is more in line with the tidyverse convention for naming the scope in which variables are looked up, be it an actual environment (env), a data frame, a tibble or simply a list.
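
To illustrate what "a scope in which variables are looked up" means here, this is a rough base-R sketch, not the package code: eval_condition() is a hypothetical helper, and it assumes both data frames have their rows in the same order. Variables are resolved in dat first and, failing that, in .env, which may be an environment, a data frame or a list.

```r
# Hypothetical helper for illustration; NOT sdtm.oak's condition_by().
eval_condition <- function(dat, expr, .env = parent.frame()) {
  expr <- substitute(expr)
  # Accept an environment as-is; turn a data frame or list into one.
  enclos <- if (is.environment(.env)) {
    .env
  } else {
    list2env(as.list(.env), parent = parent.frame())
  }
  # Columns of `dat` are searched first, then the names in `.env`.
  eval(expr, dat, enclos)
}

# Toy stand-ins for the target (cm) and the raw data (cm_raw).
cm <- data.frame(CMTRT = c("ASPIRIN", "BENADRYL"), stringsAsFactors = FALSE)
cm_raw <- data.frame(
  CMMODIFY = c("ASPIRIN", "DIPHENHYDRAMINE"),
  stringsAsFactors = FALSE
)

# CMTRT comes from `cm`, CMMODIFY from `cm_raw`.
differs <- eval_condition(cm, CMMODIFY != CMTRT, .env = cm_raw)
```

In this toy data only the second record has a CMMODIFY value that differs from CMTRT, so only that record would be flagged for the derivation.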

ramiromagno commented 3 months ago

Great job, @ramiromagno. I have added my comments. We can add additional test cases and examples. Also, it would be good to change the way we compare two datasets, by adding an extra argument as suggested.

Thanks @rammprasad. I am not sure what your take was on relocating tgt_dat and making it the first argument. If you agree, then moving tgt_var and making it the second argument would also make sense, I reckon.

ramiromagno commented 3 months ago

I thought we could circumvent the need for braces when using magrittr's pipe placeholder in nested calls if we moved tgt_dat to being the first argument but seemingly we cannot.

So I think we are left only with three options:

Move the condition_add() one level up and have it in the chain, i.e. not inline within the assign_no_ct()
Add braces around the assign_no_ct() call and use the placeholder where needed
Study the possibility of having another pipe operator whose behavior is equivalent to %>% {...}, but with the advantage of not needing the braces. I asked a question about this here: https://github.com/tidyverse/magrittr/issues/272.

In my opinion, the simplest and easiest for the user is option 1, i.e. moving condition_add() to an earlier position in the chain of commands.

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(oak_id_vars(), dplyr::everything())

# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
#              raw_var = "MDRAW",
#              tgt_var = "CMTRT")  |>
#   assign_no_ct(
#     tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
#     tgt_var = "CMGRPID",
#     raw_dat = cm_raw,
#     raw_var = "MDNUM"
#   )

# DOES NOT WORK EITHER (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    tgt_var = "CMGRPID",
    raw_dat = cm_raw,
    raw_var = "MDNUM"
  )
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:191:3:
#> ! `id_vars` must be a character vector but is a data frame

# WORKS (native pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_add(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# WORKS (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  condition_add(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
#              raw_var = "MDRAW",
#              tgt_var = "CMTRT")  |>
#   {
#     assign_no_ct(
#       tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
#       raw_dat = cm_raw,
#       raw_var = "MDNUM",
#       tgt_var = "CMGRPID"
#     )
#   }

# WORKS (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA
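
For the record, the native-pipe failures above follow from how R itself treats the _ placeholder (assuming R >= 4.2, where it was introduced): it has to be passed as a named argument at the top level of the right-hand-side call, so it cannot sit inside a nested call such as condition_add(_, ...), and { is not accepted as the right-hand-side function. A sketch with a hypothetical function f():

```r
# Requires R >= 4.2 for the `_` placeholder.
# f() is a hypothetical function used only for illustration.
f <- function(x, y = NULL) list(x = x, y = y)

# Allowed: the placeholder is a named, top-level argument of the call.
ok <- 1:3 |> f(x = "other", y = _)

# Helper: does a piece of code even parse?
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Not allowed: placeholder inside a nested call.
nested_ok <- parses("1:3 |> f(y = length(_))")

# Not allowed: braces as the right-hand side of the native pipe.
braces_ok <- parses("1:3 |> { f(y = length(.)) }")
```

Both nested_ok and braces_ok come out FALSE because R rejects these forms at parse time, which is why those examples are commented out above.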

rammprasad commented 3 months ago

Thank you, @ramiromagno. Regarding the condition_add options, let's stick with options 1 and 2. We will add examples for both scenarios.

Move the condition_add() one level up and have it in the chain, i.e. not inline within the assign_no_ct()
Add braces around the assign_no_ct() call and use the placeholder where needed

rammprasad commented 3 months ago

Let me know once it is ready for the final review.

ramiromagno commented 3 months ago

I know we decided to go with options 1 and 2. But given that Lionel kindly answered so promptly, I am leaving his suggestion here for future reference:

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

`%>>%` <- function(data, expr) {
  eval(substitute(expr), list(. = data), parent.frame())
}

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(oak_id_vars(), dplyr::everything())

assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>>%
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    tgt_var = "CMGRPID",
    raw_dat = cm_raw,
    raw_var = "MDNUM"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

Created on 2024-06-12 with reprex v2.1.0
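
A stripped-down illustration of why %>>% needs no braces, using a hypothetical function g() instead of assign_no_ct(): the right-hand side is captured unevaluated and then evaluated with . bound to the left-hand side, so nothing is ever inserted implicitly as a first argument.

```r
# Same operator as in the reprex above.
`%>>%` <- function(data, expr) {
  # Capture the right-hand side as written and evaluate it with `.` bound
  # to the left-hand side; other names resolve in the caller's frame.
  eval(substitute(expr), list(. = data), parent.frame())
}

# g() is a hypothetical function used only for illustration.
g <- function(x, y = NULL) list(x = x, y = y)

# The dot is used only inside a nested call, yet nothing is added as a
# first argument: x stays "other".
res <- 1:3 %>>% g(x = "other", y = length(.))

# Chaining works too, as long as the dot is written explicitly at each step.
total <- 1:3 %>>% rev(.) %>>% sum(.)
```

The trade-off is that the dot is mandatory: a step written without it, e.g. 1:3 %>>% sum(), would not receive the left-hand side at all.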

rammprasad commented 3 months ago

Shall we add this as the third option? This looks sleeker than using {}

ramiromagno commented 3 months ago

@rammprasad and @edgar-manukyan:

I think we are pretty close to having the code for conditioned data frames near completion.

A new pipe operator, %.>%, has been introduced; check the docs for the details. It should allow chains of commands with less clutter, namely without the braces (%>% {...} can now be replaced simply with %.>% ...).
I've added more documentation across functions and examples.
I also added a significant number of unit tests to most functions, but not all.
A new vignette has been created introducing conditioned data frames. It is currently incomplete because it has only one usage example with condition_add(). We should add the cases Ram mentioned, where conditioning involves the raw data set, the target, both independently, or both interdependently. For the most complicated case, both interdependently, I reckon the user will need to call the sdtm_join() function explicitly.

Take a look and give me your feedback!

edgar-manukyan commented 3 months ago

Thanks so much @ramiromagno 🙏 I will start the review shortly since I believe @rammprasad is happy with the MR and will approve it shortly. Let's refrain from adding any new features in this MR and open a new issue/MR instead.

edgar-manukyan commented 3 months ago

Simply brilliant @ramiromagno, thank you so much 🙏 🙏 🙏 for all your time and effort. I am sure the SDTM community is going to appreciate this. I also feel that admiral might grab your idea of conditioned data frames as well 😉

Huge thanks for the tests 💯 💯 💯

rammprasad commented 3 months ago

It looks good to me. Let's merge this to main, and I can take care of the documentation updates.

ramiromagno commented 3 months ago

@rammprasad and @edgar-manukyan : please do not merge yet as I am now doing styling and linting fixes.

pharmaverse / sdtm.oak

if_then_else support via "conditioned" data frames #55
