Closed rossfarrugia closed 2 years ago
Reminder to use filter_pd from https://github.com/pharmaverse/admiralonco/issues/18 once available for the censor at first PD option
@sgorm123 , I replaced source_param
with filter_source
. I.e., instead of specifying the parameter code of the source parameter an arbitrary condition for selecting the source records can be specified. This allows to use other variables than ANL01FL
.
@sgorm123 , I replaced
source_param
withfilter_source
. I.e., instead of specifying the parameter code of the source parameter an arbitrary condition for selecting the source records can be specified. This allows to use other variables thanANL01FL
.
Thanks, makes sense. Though quick question @bundfussr @rossfarrugia: Would one also use this filter to remove any responses that are not standard, e.g. c("NA", "NE", "ND")? @ljin22
@bundfussr @rossfarrugia
Quick question: Why is dataset_adsl a suggested argument? I am not sure why it would be used for the creation of LSTA and LSTAC, given that source_pd can be used for determining first PD.
@sgorm123 , it is suggested because the new parameter should have an observation for each subject in ADSL even for those without any tumor assessment.
@sgorm123 , it is suggested because the new parameter should have an observation for each subject in ADSL even for those without any tumor assessment.
@bundfussr Thank you, I see. Is it also used to remove records in dataset that occur after a certain date in ADSL (e.g. TRTSDT)?
So far this was not intended. The idea was that such restriction is considered in the derivation of ANL01FL
or that the required date variables are in dataset
and you would use filter_source = PARAMCD == "OVR" & ADT>= TRTSDT
.
I wonder if we need a keep_adsl_vars
parameter which could be used to keep variables from dataset_adsl
for the new observations. Otherwise, it could happen that TRTSDT
is populated for PARAMCD == "OVR"
but not for the new parameter.
So far this was not intended. The idea was that such restriction is considered in the derivation of
ANL01FL
or that the required date variables are indataset
and you would usefilter_source = PARAMCD == "OVR" & ADT>= TRTSDT
.I wonder if we need a
keep_adsl_vars
parameter which could be used to keep variables fromdataset_adsl
for the new observations. Otherwise, it could happen thatTRTSDT
is populated forPARAMCD == "OVR"
but not for the new parameter.
Ah yes, ok, makes sense.
So keep_adsl_vars would be those columns that are used in the creation of the new PARAM? Correct? If so, then I agree, this makes sense.
Thank you.
@rossfarrugia , what do you think? Should we add the keep_adsl_vars
parameter to the admiralonco derive_param_*()
functions?
So far this was not intended. The idea was that such restriction is considered in the derivation of
ANL01FL
or that the required date variables are indataset
and you would usefilter_source = PARAMCD == "OVR" & ADT>= TRTSDT
. I wonder if we need akeep_adsl_vars
parameter which could be used to keep variables fromdataset_adsl
for the new observations. Otherwise, it could happen thatTRTSDT
is populated forPARAMCD == "OVR"
but not for the new parameter.Ah yes, ok, makes sense.
So keep_adsl_vars would be those columns that are used in the creation of the new PARAM? Correct? If so, then I agree, this makes sense.
Thank you.
@bundfussr Actually Stefan, when I look at the _derive_paramclinbenefit code there is an argument reference_date, which is required to be in ADSL and used to filter in the code ADT >= !!reference_date. Should we be consistent across the two functions? And I presume others. This is with reference to the ADRS workflow doc I have: reference date from ADSL so only checking assessments after this date
Maybe we can discuss on Wednesday, apologies for any misunderstanding.
@sgorm123 , good point. Yes, let's discuss on Wednesday.
@bundfussr @rossfarrugia Sorry, one more, what is the requirement for the argument source_datasets? I can not quite work out why it is needed here?
@sgorm123 , good point. Yes, let's discuss on Wednesday.
Thanks Stefan @ljin22
source_datasets
expects a named list of datasets. The date_source
object does not contain any data, dataset_name
is just a string. The source_datasets
parameter defines which data should be used in the derivation.
This allows to create the date_source()
object on a higher level without the data. For example
pd <- date_source(
dataset_name = “adrs”,
date = ADT,
filter = PARAMCD == “PD” & AVALC == “Y”
)
could be defined at company level. Then you would not need to repeat the definition in each study and datacut.
The same approach is used for other source object in admiral, e.g., event_source
or censor_source
as input of derive_param_tte()
.
@sgorm123 @bundfussr - will try and add my replies to the questions in the above thread which i don't think were answered... happy to discuss more tomorrow as needed.
"Would one also use this filter to remove any responses that are not standard, e.g. c("NA", "NE", "ND")?" - agree that could be a use case for filter_source
_"Should we add the keep_adsl_vars
parameter to the admiralonco derive_param_*()
functions?"_ - why would the ADSL vars from the source row not automatically be kept? We're definitely going to need things like the treatment group variables kept ready to set up for the TLGs. I wouldn't like the idea that a user had to list out every ADSL variable needed here as Step 2 of our ADRS workflow already has "Add ADSL required variables for subsequent derivations", so I would have expected them to stay populated for all newly created records automatically. I can't think of any scenario where I would want to see these suddenly made blank for the new parameter records.
_"when I look at the derive_param_clinbenefit code there is an argument reference_date
, which is required to be in ADSL and used to filter in the code ADT >= !!reference_date
. Should we be consistent across the two functions?"_ - The reason for that parameter there is the code goes beyond the simple ADT>=TRTSDT that can be covered in ANL01FL (as is relevant for all parameters) and it needs to "check for sustained period of time before PD, subject would need to have a valid non-PD assessment at least xxx days after ADSL reference date", so there is an end window applied in the derivation which isn't the case for taking the last assessment. But if you did need to add one then filter_source
could be used. That gives the user complete flexibility.
Thanks Ross, so filter_source is how the user is to filter records from the dataframe passed into the dataset argument (this can be up to the user). Wrt keep_adsl_vars, I agree, and defer to @bundfussr for further comment. Thank you.
@ljin22 FYI.
@rossfarrugia , @sgorm123 , regarding keep_adsl_vars
: We tried to keep ADSL variables automatically before but it did not work well. The idea was to keep variables which are constant within each subject. The issues were:
derive_param_*()
function it is not obvious which variables are kept.Therefore we decided to specify explicitly which variables should be kept for the new parameter. See the BDS Exposure vignette for an example. At the beginning the ADSL variable required for the derivations are stored in the adsl_vars
variable and later this variable is used in the calls for derived parameters.
I think for the ADRS functions we have two options:
keep_adsl_vars
parameter.dataset_adsl
are kept. I.e., if the users do not want to keep all ADSL variables, they need to create a temporary ADSL dataset with a subset of the variables. OK @bundfussr think i get it. So our Step 2 of ADRS template/vignette would be similar to the following from ADEX:
adsl_vars<-vars(TRTSDT, TRTSDTM, TRTEDT, TRTEDTM )
adex <- left_join(
ex,
select(adsl, STUDYID, USUBJID, !!!adsl_vars),
by = c("STUDYID", "USUBJID")
)
Then in all downstream parameter functions they need to be aware to only keep those vars set in adsl_vars
. But I assume then we don't need a new keep_adsl_vars
argument for this in each and every parameter derivation function right? As in your ADEX example, I don't see a keep_adsl_vars
argument in derive_param_exposure()
for example. It seems to be set once at the start of the code and then gets fed into everywhere downstream.
For ADEX adsl_vars
is specified for the by_vars
parameter, e.g., by_vars = vars(STUDYID, USUBJID, !!!adsl_vars)
. It is not used automatically. The programming strategy forbids this.
In the ADRS functions we do not have a by_vars
parameter. Therefore I proposed keep_adsl_vars
.
OK, thanks @bundfussr, let's go with your suggestion then, Can you update all the issues impacted and let the developers know again please?
I'll make sure the vignette and template shows how you can just set this list of variables once as done in ADEX, so users don't repeat these in many places.
Or on reflection @bundfussr could even use by_vars
here if that's more consistent with the way other admiral core parameter functions were designed? happy to go with your call though.
@rossfarrugia @bundfussr
Ive been pondering this a little overnight and wonder if this function (indeed others too) should just return the new SET_VALUES and AVAL/C, plus the original data (which would need to include all columns needed for the filter)? Joining with ADSL seems like it should be handled by the function caller, also as the admiral/admiralonco functions are to be piped (I presume) it could be a little confusing for the users as to when ADSL is joined. Further, I presume with allowing ADSL to be joined we also need to check for duplicate columns being returned from within the function.
I'll obviously go with whatever you decide, it is is just a thought, as Im just trying to get my head round the admiral design patterns.
@rossfarrugia @bundfussr @ljin22
What should the function do if there are two AVALs on the same last ADT? Take worse, error, concatenate the values, anything else...
@rossfarrugia @bundfussr
Is there any ADMIRAL convention to allow the dataframes from within the function to be returned (somehow) to the user for debugging? At Amgen we use an R6 Logger to print to the console at varying levels (which I see admiral does not) and also my convention (not Amgen) is to return a debug_list of all dataframes if the log level is set low enough (example below, of course in ADMIRAL without a Logger it would need to be an argument in the function call, e.g. debug_return = TRUE).
debug_list <- list()
debug_list$pd_data <- eval(rlang::parse_expr(source_pd$dataset_name)) %>% admiral::filter_if(source_pd$filter) %>% dplyr::select(!!!subject_keys, !!!by_vars, !!source_pd$date) %>% dplyr::rename(temp_pd_date = !!source_pd$date)
: :
if(Logger$LOG_LEVEL == 2){ db_debug_list <<- debug_list }
@sgorm123 some replies from me:
@bundfussr could probably comment more and on the last point too about debugging
@sgorm123 , I would issue an error in case the last observation is not unique. Then the user can update the filter_source
parameter to achieve uniqueness. Another option is that the functions provides an order
parameter, which is defaulted to ADT
. If necessary, the user could add other variables, e.g., vars(ADT, ASEQ)
or vars(ADT, AVALN)
.
If there are duplicate observation, admiral provides a dataset with the duplicate observations to the user. If you call filter_extreme()
with checktype
equals "warning"
or "error"
for selecting the last observation this is done automatically.
Thanks, I like the addition of an order param (I shall add) and I shall also look at filter_extreme().
wrt..
> * ADSL join happens prior to these param functions in the ADRS workflow so you're just choosing which of these variables to keep from the source records you feed in, e.g. PARAMCD == "OVR"
This makes sense, though can I then just confirm a comment above from @bundfussr:
@sgorm123 , it is suggested [The ADSL Argument] because the new parameter should have an observation for each subject in ADSL even for those without any tumor assessment.
Is this the case then, as this leads me to believe ADSL should be joined in the function? Perhaps we can just discuss and agree later.
Thanks for all the help, Stephen
Ah yes that makes sense @sgorm123 - you would need to add a record for those patients that might not have an input source record from RS. Even if all the analysis vars are all going to be null.
Ah yes that makes sense @sgorm123 - you would need to add a record for those patients that might not have an input source record from RS. Even if all the analysis vars are all going to be null.
For these subjects, would ADT, AVAL and AVALC be NULL?
Ah yes that makes sense @sgorm123 - you would need to add a record for those patients that might not have an input source record from RS. Even if all the analysis vars are all going to be null.
For these subjects, would ADT, AVAL and AVALC be NULL?
@bundfussr @rossfarrugia In the case that subjects in ADSL but with no post-baseline data or no evaluable assessment in ADRS, such records could cause conformance check issues of missing both AVAL and AVLAC in BDS structure data, and NULL ADT does not contribute in next ADTTE workflow. As ADaMs are "analysis-ready", should we just keep records with valid data?
Good point @ljin22 - i'm actually thinking we shouldn't create these blank records for this parameter as when I look at the ADTTE workflow we even have this covered already by:
censor_conditions=list(lasta_censor, **rand_censor**)
Also in our ADRS workflow we do say: All parameters are only created if patient has a record in ADSL.
I'll feed this back to the Roche standards team.
@sgorm123 as per chat yesterday i'll update this issue to remove dataset_adsl
argument which is no longer needed
@sgorm123 is this function ready? if so please do make a PR so we can get people starting on review
Feature Idea
derive_param_lasta: checks last disease assessment from OVR (where ANL01FL=”Y”) as default
Flexibility needed:
Details: checks last disease assessment from PARAMCD=”OVR” (where ANL01FL=”Y”) as default and sets new parameter record with AVAL/AVALC from the source record. Set ADT as the last date where the condition is fulfilled. Option to censor at first PD. Note: it is important that ANL01FL=”Y” provides only the eligible records to be considered for this derivation, as explained in the ADRS vignette.
Relevant Input
_Note: for source_pd a date_source() object is expected. See https://pharmaverse.github.io/admiral/reference/derive_param_tte.html as an example of how this works._
_Note: if sourcepd left NULL then not censored at first PD.
Relevant Output
New parameter: Covers LSTA and LSTAC from ADRS workflow doc.
Reproducible Example/Pseudo Code