Closed bundfussr closed 1 year ago
Can we have a look why this is needed? Please see my comment in https://github.com/pharmaverse/admiral/issues/856#issuecomment-1070863229
would be good if Thomas is there as well.
It is already on the agenda of the Monday meeting.
Please also see my comment in #856 regarding the intention of get_map()
.
Defining the examples above using function()
and case_when()
:
yn_order <- function(arg) {
assert_character_vector(arg)
case_when(
arg == "Y" ~ 1,
arg == "N" ~ 0,
TRUE ~ NA_numeric_
)
}
age_group <- function(arg) {
assert_numeric_vector(arg)
case_when(
arg >= 0 & arg <= 65 ~ "<=65",
arg > 65 ~ ">65",
TRUE ~ NA_numeric_
)
}
age_group2 <- function(
assert_numeric_vector(arg)
case_when(
arg >= 0 & arg < 65 ~ "<65",
arg >= 65 ~ ">=65",
TRUE ~ NA_numeric_
)
}
region_def <- function(arg) {
assert_character_vector(arg)
case_when(
arg %in% c("USA", "CAN") ~ "North America",
arg %in% c("CHE", "FRA", "NOR") ~ "Europe",
TRUE ~ NA_character_
)
}
I am wondering if this could be used in the creation of the ADLB grades? ie for simple criteria where grade is based on a value. For Anemia, we could have:
atoxgr <- get_map(
0 - ~80 := "Grade 3",
80 - ~100 := "Grade 2",
100 - ~ANRLO := "Grade 1"
)
And also, possibly for criteria like lab test ALT:
atoxgr <- get_map(
(ANRHI)~ - ~ (3 * ANRHI) := "Grade 1"
(3 * ANRHI) - ~ (5 * ANRHI) := "Grade 2",
(5 * ANRHI) - ~ (20 * ANRHI) := "Grade 3",
(20 * ANRHI) - Inf := "Grade 4"
)
Is this possible, with the functionality of get_map function?
Also, from the examples you posted above, its not clear to me the use of '~'?
> age_group <- get_map(
+ 0 - 65 := "<=65",
+ 65 - inf := ">65")
> age_group2 <- get_map(
+ 0 - ~65 := "<65",
+ 65 - inf := ">=65")
In the 2 cases above, we have 65 - inf
but one is ">65" and the other is ">=65". Is this just a typo, or is there some kind of rule when using the '~'.
@millerg23 , I would not use variables in the definition of the map as it would make the definition and usage less intuitive. For example the result of atoxgr(AVAL)
would depend on the context where it is called. It may result in an error if ANRLO
is not available. These type of functions I would define in the usual way with arguments for all required parameter, e.g., atoxgr(AVAL, ANRLO)
.
I used ~
to exclude the value, e.g., 0 - ~65
means greater or equal than zero and less than 65. Unfortunately, we can not use 0 - <65
.
We discussed it in the Questions and Comments meeting and decided to implement a first version in a separate package as it could be useful outside of ADaM creation.
Could this be transferred to the xportr package? https://github.com/atorus-research/xportr/blob/master/R/format.R
Currently needs to be specified in a spec file for the variable, but maybe we could make it single-use for a variable
@bms63 , I think it does not fit into the xportr package because the map class is also useful if you are not producing xpt files. The scope is different. The get_map()
functions creates an object similar to a SAS format but it does not assign it to a column of a dataset.
Sorry I did not read the issue thoroughly and misunderstood what this was for. Thought we were trying to create an actual SAS format .
Feels like a metatools issue to me.
I think it does not fit into metatools either because metatools provides functions using metacore objects as input. But get_map()
does not use a metacore object.
Ummm...we deprecated our suppqual function for metatools::comine_supp function which does not require a metacore object to use. Exception to the rule I know, but...
Also this just looks like metadata to me...so I'm thinking metatools.
I'm weary of so many packages for the users! :)
i agree with @bms63 - i'm not keen on an extra package for this. we could just include in specs and then use https://pharmaverse.github.io/metatools/reference/create_cat_var.html. or outside of that just use an extra case_when step. i don't believe this justifies adding another piece of complexity for users.
So my take on this is that this kind of functionality could be useful for programming ADaM datasets (especially for folks familiar with SAS) but also for more general "data wrangling" tasks. Thus, I feel this should neither live in {admiral} or {metatools} which both solve problem specifally related to ADaM datasets. I certainly think this would be worth its own little package, something line {formatsr} or so.
do we need to create that ourselves though under pharmaverse? e.g. see from a very quick google: https://www.pharmasug.org/proceedings/2021/AD/PharmaSUG-2021-AD-175.pdf & https://fmtr.r-sassy.org/index.html. i'm sure this kind of feature must exist in many packages and it's not a pharma-specific need.
I agree with Thomas that implementing it in its own package is more suitable.
The fmtr
package provide something similar but the definition of a format in fmtr
is not simpler than using case_when()
. My intention is
I would propose implementing a test version and see if it is useful (for example in admiralonco or admiralroche).
a test version makes sense and then show it to admiral core dev team to see what they think. could just use an admiral branch off devel to work up a test version, then it could move to its own package later if everyone agrees. i can't visualise the value at the moment that justifies this and surprised this wouldn't already exist, so i'd probably need to see more to help understand.
@tamarasenior , @reesnj , @MayaD99 , I had a short look at 1021-feature-request-implement-a-map-class-similar-to-sas-formats and wonder if it is clear what arguments the get_map()
function should provide. The function should not provide val
and val_fmt
but ...
such that calls like
age_group2 <- get_map(
0 - ~65 := "<65",
65 - inf := ">=65")
region_def <- get_map{
"USA", "CAN" := "North America",
"CHE", "FRA", "NOR" := "Europe")
work.
Hi @bundfussr,
Myself, @tamarasenior, @MayaD99 took a look into this issue as part of HackR this week. We reviewed the issue from a user perspective and attempted to test the concept you outlined above. We didn’t have time to fully replicate the function you requested but have created an initial function to outline how get_map() could work using character vectors, val and val_fmt as inputs - see here. Obviously, the scope of this function is limited since it would not be able to handle things like age ranges. Also present in this R file are some examples of how the fmtr package can be used as an alternative to this get_map() function.
If you think the effort would be justified, we are happy to continue exploring how a get_map() function could be created, which can handle the arguments as you specified - although we may need some more information on your example inputs above. For instance, is this := functionality coming from the data.table package? Regardless, we have discussed the advantageous and limitations of the get_map() function described above and our user feedback is as follows:
We hope you find this user feedback useful!
thanks HackR team for thoroughly exploring this one! with @thomas-neitmann / @bms63 permission I'd personally vote we close this one now and leave users to either use case_when or fmtr themselves (or over time someone could develop an improved package). i don't believe we need to solve this in admiral and we already have a growing list of pieces we need to focus on, especially when you consider the package extension efforts.
@reesnj , the idea of get_map()
is that you can define functions like age_group2
or region_def
just by specifying the left hand and right hand values. You would not need to think about how to translate the left hand values into conditions and if you should use case_when()
, if_else()
or something else. This is handled by get_map()
. Furthermore, you would not need to write unit tests or documentation for age_group2
or region_def
because the testing is covered by the unit tests of get_map()
and the documentation would be generated automatically from the objects age_group2
and region_def
.
get_map()
would also check the provided values, e.g., ensure that all the right hand values are of the same type and no value on the left hand side is repeated.
dplyr::case_match()
offers similar functionality (https://dplyr.tidyverse.org/reference/case_match.html).
Feature Idea
A
get_map()
function should be implemented which returns a function. This could be used to define mapping of character values to numeric values of grouping.Relevant Input
What should the input look like? REMINDER: no patient level data or company sensitive information should be shared via this open public issue
Relevant Output
What should the output look like? REMINDER: no patient level data or company sensitive information should be shared via this open public issue
Reproducible Example/Pseudo Code