pharmaverse / admiral

ADaM in R Asset Library
https://pharmaverse.github.io/admiral
Apache License 2.0
220 stars 61 forks source link

Feature Request: Function for creating variable pairs like `MCRITyML`/`MCRITyMN` #2480

Open bundfussr opened 1 month ago

bundfussr commented 1 month ago

Feature Idea

Many CDISC variable are pairs of character and numeric variables like MCRITyML/MCRITyMN, AVALCATy/AVALCAyN, AGEGRy/AGEGRyN. Usually the values depend on a condition.

At the moment the two variables are derived in two steps, which separates the condition, the character value, and the numeric value. For example

avalcat_lookup <- tibble::tribble(
  ~PARAMCD, ~AVALCA1N, ~AVALCAT1,
  "HEIGHT",         1, ">140 cm",
  "HEIGHT",         2, "<= 140 cm"
)

format_avalcat1n <- function(param, aval) {
  case_when(
    param == "HEIGHT" & aval > 140 ~ 1,
    param == "HEIGHT" & aval <= 140 ~ 2
  )
}

advs <- advs %>%
  mutate(AVALCA1N = format_avalcat1n(param = PARAMCD, aval = AVAL)) %>%
  derive_vars_merged(
    avalcat_lookup,
    by = exprs(PARAMCD, AVALCA1N)
  )

I would propose to implement a function where the condition, the character value, and the numeric value are specified in a tribble()-like way:

advs <- advs %>%
 derive_vars_cat(definition = exprs(
    ~condition,  ~AVALCAT1,  ~AVALCA1N,
    AVAL > 140,  ">140 cm",          1,
    AVAL <= 140, "<=140 cm",         2
 )  
)

I think it is easier to implement and review such variable this way.

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

bms63 commented 1 month ago

oooohhh i like this!! @pharmaverse/admiral @pharmaverse/admiral_comm what do you think!?!

I think this would be a fun function to spin up, but would take some work to finish it out!! who wants it? :)

kaz462 commented 1 month ago

The definition data @bundfussr mentioned above is similar to codelist in metadata, create_cat_var and create_var_from_codelist from metatools can create paired variables dependent on good quality of metadata/codelist.

Should we consider moving this issue to metatools? e.g., enhance metacore argument in create_cat_var(data, metacore, ...) so that it allows metadata that are not restricted by the metacore object. However, I'm not sure if this is necessary. what do you all think?

manciniedoardo commented 1 month ago

The definition data @bundfussr mentioned above is similar to codelist in metadata, create_cat_var and create_var_from_codelist from metatools can create paired variables dependent on good quality of metadata/codelist.

Should we consider moving this issue to metatools? e.g., enhance metacore argument in create_cat_var(data, metacore, ...) so that it allows metadata that are not restricted by the metacore object. However, I'm not sure if this is necessary. what do you all think?

Worth noting that none of the admiral packages depend on metatools or metacore, and generally we have strayed away from adding that dependency. So I wouldn't add to metatools.

bms63 commented 1 month ago

The definition data @bundfussr mentioned above is similar to codelist in metadata, create_cat_var and create_var_from_codelist from metatools can create paired variables dependent on good quality of metadata/codelist. Should we consider moving this issue to metatools? e.g., enhance metacore argument in create_cat_var(data, metacore, ...) so that it allows metadata that are not restricted by the metacore object. However, I'm not sure if this is necessary. what do you all think?

Worth noting that none of the admiral packages depend on metatools or metacore, and generally we have strayed away from adding that dependency. So I wouldn't add to metatools.

Should we do both? - enhance metatools/metacore as well as build a function for it in admiral. My metadata is never amazing in the beginning of ADaM creation

bundfussr commented 1 month ago

Should we consider moving this issue to metatools? e.g., enhance metacore argument in create_cat_var(data, metacore, ...) so that it allows metadata that are not restricted by the metacore object. However, I'm not sure if this is necessary. what do you all think?

I'm not sure. In create_cat_var() the condition for the categories is not explicitly specified but guessed from the character value of the category. I.e., it doesn't work for categories like "high", "low", "normal". If we would add the condition to the metacore object, the specs wouldn't be language agnostic.

kaz462 commented 1 month ago

Thanks @manciniedoardo @bms63 @bundfussr for your inputs.

Worth noting that none of the admiral packages depend on metatools or metacore, and generally we have strayed away from adding that dependency. So I wouldn't add to metatools.

@manciniedoardo We could guide users to metatools for deriving paired variables, instead of adding dependencies.

In create_cat_var() the condition for the categories is not explicitly specified but guessed from the character value of the category. I.e., it doesn't work for categories like "high", "low", "normal". If we would add the condition to the metacore object, the specs wouldn't be language agnostic.

@bundfussr Would it be okay to use only the character category without adding the condition? e.g. the current create_cat_var/create_subgrps can derive AGEGR1 based on age using the codelist data below. Could you help explain why it does not work for categories like 'high', 'low', and 'normal'?

codelist <- tribble(
   ~AGEGR1, ~AGEGR1N,
   "<65",   1,
   "65-80", 2,
   ">80",   3
)

If we enhance the metacore argument from create_cat_var to allow non-metacore object, we can use the same function for creating variable pairs -

create_cat_var(data, codelist/metacore, ref_var, grp_var, num_grp_var = NULL)

I personally prefer the current way, as it relies on metadata rather than hardcoding in the program, which can improve consistency between the program and metadata. But if the new feature is useful for metadata-independent cases, I don’t have a strong preference between building a new function in admiral, enhancing create_cat_var, or even both :)

bundfussr commented 1 month ago

@bundfussr Would it be okay to use only the character category without adding the condition? e.g. the current create_cat_var/create_subgrps can derive AGEGR1 based on age using the codelist data below. Could you help explain why it does not work for categories like 'high', 'low', and 'normal'?

If we have a codelist like the following, create_cat_var() can't derive AVALCAT1 from AVAL without the conditions which define the categories.

codelist <- tribble(
   ~AVALCAT1, ~AVALCA1N,
   "low",     1,
   "normal",  2,
   "high",    3
)

I think it depends on the available metadata which way is the best to derive such variables. I would try to avoid duplication of information. If code/decode codelists are available in the metadata, I would specify the conditions and categories in the ADaM script (using case_when() or case_match()) and then use the metadata to map the categories to the coded values. If no code/decode codelists are available (like in the Roche standard specs), I would use the suggested function.

We could extend create_cat_var(). However, it can be used in special cases only. It breaks already if "<65 years" is used instead of "<65". Therefore I would use the more general function derive_vars_cat() for all categorization variables.

StefanThoma commented 2 weeks ago

I'd be happy to take a shot at this, once we have decided what exactly we want to do.

bms63 commented 2 weeks ago

HI Stefan - reviewing the discussion it looks like @bundfussr original proposal was agreed upon. We can discuss at next week's meeting if you would like?

StefanThoma commented 1 week ago

Yeah that sounds good!

On Thu, Aug 29, 2024 at 4:12 PM Ben Straub @.***> wrote:

HI Stefan - reviewing the discussion it looks like @bundfussr https://github.com/bundfussr original proposal was agreed upon. We can discuss at next week's meeting if you would like?

— Reply to this email directly, view it on GitHub https://github.com/pharmaverse/admiral/issues/2480#issuecomment-2317794497, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUWWEVGTLBEYPRBOUGA3GLZT4T45AVCNFSM6AAAAABLA4KDC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJXG44TINBZG4 . You are receiving this because you are on a team that was mentioned.Message ID: @.***>