pharmaverse / admiral

ADaM in R Asset Library
https://pharmaverse.github.io/admiral
Apache License 2.0
220 stars 61 forks source link

Feature Request: Implement a map class similar to SAS formats #1021

Closed bundfussr closed 1 year ago

bundfussr commented 2 years ago

Feature Idea

A get_map() function should be implemented which returns a function. This could be used to define mapping of character values to numeric values of grouping.

Relevant Input

What should the input look like? REMINDER: no patient level data or company sensitive information should be shared via this open public issue

Relevant Output

What should the output look like? REMINDER: no patient level data or company sensitive information should be shared via this open public issue

Reproducible Example/Pseudo Code


> yn_order <- get_map(
+  "Y" := 1,
+  "N" := 0)
> yn_order(c("N", "Y", "N"))
[1] 0 1 0
> yn_order
<map> object mapping character to numeric
Definition:
  "Y" := 1,
  "N" := 0
> age_group <- get_map(
+  0 - 65 := "<=65",
+  65 - inf := ">65")
> age_group2 <- get_map(
+  0 - ~65 := "<65",
+  65 - inf := ">=65")
> region_def <- get_map{
+  "USA", "CAN" := "North America",
+  "CHE", "FRA", "NOR" := "Europe")
koegerr commented 2 years ago

Can we have a look why this is needed? Please see my comment in https://github.com/pharmaverse/admiral/issues/856#issuecomment-1070863229

would be good if Thomas is there as well.

bundfussr commented 2 years ago

It is already on the agenda of the Monday meeting.

Please also see my comment in #856 regarding the intention of get_map().

bundfussr commented 2 years ago

Defining the examples above using function() and case_when():

yn_order <- function(arg) {
  assert_character_vector(arg)

  case_when(
    arg == "Y" ~ 1,
    arg == "N" ~ 0,
    TRUE ~ NA_numeric_
  )
}

age_group <- function(arg) {
  assert_numeric_vector(arg)

  case_when(
    arg >= 0 & arg <= 65 ~ "<=65",
    arg > 65 ~ ">65",
    TRUE ~ NA_numeric_
  )
}

age_group2 <- function(
  assert_numeric_vector(arg)

  case_when(
    arg >= 0 & arg < 65 ~ "<65",
    arg >= 65 ~ ">=65",
    TRUE ~ NA_numeric_
  )
}

region_def <- function(arg) {
  assert_character_vector(arg)

  case_when(
    arg %in% c("USA", "CAN") ~ "North America",
    arg %in% c("CHE", "FRA", "NOR") ~ "Europe",
    TRUE ~ NA_character_
  )
}
millerg23 commented 2 years ago

I am wondering if this could be used in the creation of the ADLB grades? ie for simple criteria where grade is based on a value. For Anemia, we could have:

atoxgr <- get_map(
   0 - ~80 := "Grade 3",
   80 - ~100 := "Grade 2",
 100 - ~ANRLO := "Grade 1"
)

And also, possibly for criteria like lab test ALT:

atoxgr <- get_map(
    (ANRHI)~ - ~ (3 * ANRHI)  := "Grade 1"
    (3 * ANRHI) - ~ (5 * ANRHI)  := "Grade 2",
     (5 * ANRHI) - ~ (20 * ANRHI)  := "Grade 3",
    (20 * ANRHI) - Inf := "Grade 4"
)

Is this possible, with the functionality of get_map function?

Also, from the examples you posted above, its not clear to me the use of '~'?

> age_group <- get_map(
+  0 - 65 := "<=65",
+  65 - inf := ">65")
> age_group2 <- get_map(
+  0 - ~65 := "<65",
+  65 - inf := ">=65")

In the 2 cases above, we have 65 - inf but one is ">65" and the other is ">=65". Is this just a typo, or is there some kind of rule when using the '~'.

bundfussr commented 2 years ago

@millerg23 , I would not use variables in the definition of the map as it would make the definition and usage less intuitive. For example the result of atoxgr(AVAL) would depend on the context where it is called. It may result in an error if ANRLO is not available. These type of functions I would define in the usual way with arguments for all required parameter, e.g., atoxgr(AVAL, ANRLO).

I used ~ to exclude the value, e.g., 0 - ~65 means greater or equal than zero and less than 65. Unfortunately, we can not use 0 - <65.

bundfussr commented 2 years ago

We discussed it in the Questions and Comments meeting and decided to implement a first version in a separate package as it could be useful outside of ADaM creation.

bms63 commented 2 years ago

Could this be transferred to the xportr package? https://github.com/atorus-research/xportr/blob/master/R/format.R

Currently needs to be specified in a spec file for the variable, but maybe we could make it single-use for a variable

bundfussr commented 2 years ago

@bms63 , I think it does not fit into the xportr package because the map class is also useful if you are not producing xpt files. The scope is different. The get_map() functions creates an object similar to a SAS format but it does not assign it to a column of a dataset.

bms63 commented 2 years ago

Sorry I did not read the issue thoroughly and misunderstood what this was for. Thought we were trying to create an actual SAS format .

Feels like a metatools issue to me.

bundfussr commented 2 years ago

I think it does not fit into metatools either because metatools provides functions using metacore objects as input. But get_map() does not use a metacore object.

bms63 commented 2 years ago

Ummm...we deprecated our suppqual function for metatools::comine_supp function which does not require a metacore object to use. Exception to the rule I know, but...

Also this just looks like metadata to me...so I'm thinking metatools.

I'm weary of so many packages for the users! :)

rossfarrugia commented 2 years ago

i agree with @bms63 - i'm not keen on an extra package for this. we could just include in specs and then use https://pharmaverse.github.io/metatools/reference/create_cat_var.html. or outside of that just use an extra case_when step. i don't believe this justifies adding another piece of complexity for users.

thomas-neitmann commented 2 years ago

So my take on this is that this kind of functionality could be useful for programming ADaM datasets (especially for folks familiar with SAS) but also for more general "data wrangling" tasks. Thus, I feel this should neither live in {admiral} or {metatools} which both solve problem specifally related to ADaM datasets. I certainly think this would be worth its own little package, something line {formatsr} or so.

rossfarrugia commented 2 years ago

do we need to create that ourselves though under pharmaverse? e.g. see from a very quick google: https://www.pharmasug.org/proceedings/2021/AD/PharmaSUG-2021-AD-175.pdf & https://fmtr.r-sassy.org/index.html. i'm sure this kind of feature must exist in many packages and it's not a pharma-specific need.

bundfussr commented 2 years ago

I agree with Thomas that implementing it in its own package is more suitable.

The fmtr package provide something similar but the definition of a format in fmtr is not simpler than using case_when(). My intention is

I would propose implementing a test version and see if it is useful (for example in admiralonco or admiralroche).

rossfarrugia commented 2 years ago

a test version makes sense and then show it to admiral core dev team to see what they think. could just use an admiral branch off devel to work up a test version, then it could move to its own package later if everyone agrees. i can't visualise the value at the moment that justifies this and surprised this wouldn't already exist, so i'd probably need to see more to help understand.

bundfussr commented 2 years ago

@tamarasenior , @reesnj , @MayaD99 , I had a short look at 1021-feature-request-implement-a-map-class-similar-to-sas-formats and wonder if it is clear what arguments the get_map() function should provide. The function should not provide val and val_fmt but ... such that calls like

age_group2 <- get_map(
  0 - ~65 := "<65",
  65 - inf := ">=65")
region_def <- get_map{
  "USA", "CAN" := "North America",
  "CHE", "FRA", "NOR" := "Europe")

work.

reesnj commented 2 years ago

Hi @bundfussr,

Myself, @tamarasenior, @MayaD99 took a look into this issue as part of HackR this week. We reviewed the issue from a user perspective and attempted to test the concept you outlined above. We didn’t have time to fully replicate the function you requested but have created an initial function to outline how get_map() could work using character vectors, val and val_fmt as inputs - see here. Obviously, the scope of this function is limited since it would not be able to handle things like age ranges. Also present in this R file are some examples of how the fmtr package can be used as an alternative to this get_map() function.

If you think the effort would be justified, we are happy to continue exploring how a get_map() function could be created, which can handle the arguments as you specified - although we may need some more information on your example inputs above. For instance, is this := functionality coming from the data.table package? Regardless, we have discussed the advantageous and limitations of the get_map() function described above and our user feedback is as follows:

  1. We can’t see how using a function like get_map() would be more intuitive than a case_when() statement. Further, when inevitably encountering issues/errors, as users, we would find it much easier to debug a simple case_when() statement, rather than a lesser known function such as get_map(), which could rely on some functionality from other packages that we are less familiar with (:= from data.table for example).
  2. We also tested the fmtr package and found this to be quite intuitive (example here). As you can see, this package can provide a solution for all the examples you provided above and is easy to pick up, even for a beginner.

We hope you find this user feedback useful!

rossfarrugia commented 2 years ago

thanks HackR team for thoroughly exploring this one! with @thomas-neitmann / @bms63 permission I'd personally vote we close this one now and leave users to either use case_when or fmtr themselves (or over time someone could develop an improved package). i don't believe we need to solve this in admiral and we already have a growing list of pieces we need to focus on, especially when you consider the package extension efforts.

bundfussr commented 2 years ago

@reesnj , the idea of get_map() is that you can define functions like age_group2 or region_def just by specifying the left hand and right hand values. You would not need to think about how to translate the left hand values into conditions and if you should use case_when(), if_else() or something else. This is handled by get_map(). Furthermore, you would not need to write unit tests or documentation for age_group2 or region_def because the testing is covered by the unit tests of get_map() and the documentation would be generated automatically from the objects age_group2 and region_def. get_map() would also check the provided values, e.g., ensure that all the right hand values are of the same type and no value on the left hand side is repeated.

bundfussr commented 1 year ago

dplyr::case_match() offers similar functionality (https://dplyr.tidyverse.org/reference/case_match.html).