pharmaverse / admiralonco

Oncology extension package for ADaM in R Asset Library (admiral)
https://pharmaverse.github.io/admiralonco/index.html
Apache License 2.0
30 stars 8 forks source link

New admiralonco function: derive_param_lasta #25

Closed rossfarrugia closed 2 years ago

rossfarrugia commented 2 years ago

Feature Idea

derive_param_lasta: checks last disease assessment from OVR (where ANL01FL=”Y”) as default

Flexibility needed:

Details: checks last disease assessment from PARAMCD=”OVR” (where ANL01FL=”Y”) as default and sets new parameter record with AVAL/AVALC from the source record. Set ADT as the last date where the condition is fulfilled. Option to censor at first PD. Note: it is important that ANL01FL=”Y” provides only the eligible records to be considered for this derivation, as explained in the ADRS vignette.

Relevant Input

derive_param_lasta(
  dataset,
  filter_source,
  source_pd,
  source_datasets,
  set_values_to,
  subject_keys = vars(STUDYID, USUBJID)
)

_Note: for source_pd a date_source() object is expected. See https://pharmaverse.github.io/admiral/reference/derive_param_tte.html as an example of how this works._

_Note: if sourcepd left NULL then not censored at first PD.

Relevant Output

New parameter: Covers LSTA and LSTAC from ADRS workflow doc.

Reproducible Example/Pseudo Code

pd <- date_source(
  dataset_name = “adrs”,
  date = ADT,
  filter = PARAMCD == “PD” & AVALC == “Y”
)

derive_param_lasta(
  dataset,
  filter_source = PARAMCD == "OVR" & ANL01FL == "Y",
  source_pd = list(pd),
  source_datasets = list(adrs = adrs),
  set_values_to = vars(
    PARAMCD = "LSTAC",
    PARAM = " Last Disease Assessment Censored at First PD by Investigator",
    PARCAT1 = "Tumor Response",
    PARCAT2 = "Investigator",
    PARCAT3 = "Recist 1.1",
    ANL01FL = "Y")
)
rossfarrugia commented 2 years ago

Reminder to use filter_pd from https://github.com/pharmaverse/admiralonco/issues/18 once available for the censor at first PD option

bundfussr commented 2 years ago

@sgorm123 , I replaced source_param with filter_source. I.e., instead of specifying the parameter code of the source parameter an arbitrary condition for selecting the source records can be specified. This allows to use other variables than ANL01FL.

sgorm123 commented 2 years ago

@sgorm123 , I replaced source_param with filter_source. I.e., instead of specifying the parameter code of the source parameter an arbitrary condition for selecting the source records can be specified. This allows to use other variables than ANL01FL.

Thanks, makes sense. Though quick question @bundfussr @rossfarrugia: Would one also use this filter to remove any responses that are not standard, e.g. c("NA", "NE", "ND")? @ljin22

sgorm123 commented 2 years ago

@bundfussr @rossfarrugia

Quick question: Why is dataset_adsl a suggested argument? I am not sure why it would be used for the creation of LSTA and LSTAC, given that source_pd can be used for determining first PD.

bundfussr commented 2 years ago

@sgorm123 , it is suggested because the new parameter should have an observation for each subject in ADSL even for those without any tumor assessment.

sgorm123 commented 2 years ago

@sgorm123 , it is suggested because the new parameter should have an observation for each subject in ADSL even for those without any tumor assessment.

@bundfussr Thank you, I see. Is it also used to remove records in dataset that occur after a certain date in ADSL (e.g. TRTSDT)?

bundfussr commented 2 years ago

So far this was not intended. The idea was that such restriction is considered in the derivation of ANL01FL or that the required date variables are in dataset and you would use filter_source = PARAMCD == "OVR" & ADT>= TRTSDT.

I wonder if we need a keep_adsl_vars parameter which could be used to keep variables from dataset_adsl for the new observations. Otherwise, it could happen that TRTSDT is populated for PARAMCD == "OVR" but not for the new parameter.

sgorm123 commented 2 years ago

So far this was not intended. The idea was that such restriction is considered in the derivation of ANL01FL or that the required date variables are in dataset and you would use filter_source = PARAMCD == "OVR" & ADT>= TRTSDT.

I wonder if we need a keep_adsl_vars parameter which could be used to keep variables from dataset_adsl for the new observations. Otherwise, it could happen that TRTSDT is populated for PARAMCD == "OVR" but not for the new parameter.

Ah yes, ok, makes sense.

So keep_adsl_vars would be those columns that are used in the creation of the new PARAM? Correct? If so, then I agree, this makes sense.

Thank you.

bundfussr commented 2 years ago

@rossfarrugia , what do you think? Should we add the keep_adsl_vars parameter to the admiralonco derive_param_*() functions?

sgorm123 commented 2 years ago

So far this was not intended. The idea was that such restriction is considered in the derivation of ANL01FL or that the required date variables are in dataset and you would use filter_source = PARAMCD == "OVR" & ADT>= TRTSDT. I wonder if we need a keep_adsl_vars parameter which could be used to keep variables from dataset_adsl for the new observations. Otherwise, it could happen that TRTSDT is populated for PARAMCD == "OVR" but not for the new parameter.

Ah yes, ok, makes sense.

So keep_adsl_vars would be those columns that are used in the creation of the new PARAM? Correct? If so, then I agree, this makes sense.

Thank you.

@bundfussr Actually Stefan, when I look at the _derive_paramclinbenefit code there is an argument reference_date, which is required to be in ADSL and used to filter in the code ADT >= !!reference_date. Should we be consistent across the two functions? And I presume others. This is with reference to the ADRS workflow doc I have: reference date from ADSL so only checking assessments after this date

Maybe we can discuss on Wednesday, apologies for any misunderstanding.

bundfussr commented 2 years ago

@sgorm123 , good point. Yes, let's discuss on Wednesday.

sgorm123 commented 2 years ago

@bundfussr @rossfarrugia Sorry, one more, what is the requirement for the argument source_datasets? I can not quite work out why it is needed here?

sgorm123 commented 2 years ago

@sgorm123 , good point. Yes, let's discuss on Wednesday.

Thanks Stefan @ljin22

bundfussr commented 2 years ago

source_datasets expects a named list of datasets. The date_source object does not contain any data, dataset_name is just a string. The source_datasets parameter defines which data should be used in the derivation.

This allows to create the date_source() object on a higher level without the data. For example

pd <- date_source(
  dataset_name = “adrs”,
  date = ADT,
  filter = PARAMCD == “PD” & AVALC == “Y”
)

could be defined at company level. Then you would not need to repeat the definition in each study and datacut.

The same approach is used for other source object in admiral, e.g., event_source or censor_source as input of derive_param_tte().

rossfarrugia commented 2 years ago

@sgorm123 @bundfussr - will try and add my replies to the questions in the above thread which i don't think were answered... happy to discuss more tomorrow as needed.

sgorm123 commented 2 years ago

Thanks Ross, so filter_source is how the user is to filter records from the dataframe passed into the dataset argument (this can be up to the user). Wrt keep_adsl_vars, I agree, and defer to @bundfussr for further comment. Thank you.

@ljin22 FYI.

bundfussr commented 2 years ago

@rossfarrugia , @sgorm123 , regarding keep_adsl_vars: We tried to keep ADSL variables automatically before but it did not work well. The idea was to keep variables which are constant within each subject. The issues were:

Therefore we decided to specify explicitly which variables should be kept for the new parameter. See the BDS Exposure vignette for an example. At the beginning the ADSL variable required for the derivations are stored in the adsl_vars variable and later this variable is used in the calls for derived parameters.

I think for the ADRS functions we have two options:

rossfarrugia commented 2 years ago

OK @bundfussr think i get it. So our Step 2 of ADRS template/vignette would be similar to the following from ADEX:

adsl_vars<-vars(TRTSDT, TRTSDTM, TRTEDT, TRTEDTM )

adex <- left_join(
  ex,
  select(adsl, STUDYID, USUBJID, !!!adsl_vars),
  by = c("STUDYID", "USUBJID")
)

Then in all downstream parameter functions they need to be aware to only keep those vars set in adsl_vars. But I assume then we don't need a new keep_adsl_vars argument for this in each and every parameter derivation function right? As in your ADEX example, I don't see a keep_adsl_vars argument in derive_param_exposure() for example. It seems to be set once at the start of the code and then gets fed into everywhere downstream.

bundfussr commented 2 years ago

For ADEX adsl_vars is specified for the by_vars parameter, e.g., by_vars = vars(STUDYID, USUBJID, !!!adsl_vars). It is not used automatically. The programming strategy forbids this.

In the ADRS functions we do not have a by_vars parameter. Therefore I proposed keep_adsl_vars.

rossfarrugia commented 2 years ago

OK, thanks @bundfussr, let's go with your suggestion then, Can you update all the issues impacted and let the developers know again please?

I'll make sure the vignette and template shows how you can just set this list of variables once as done in ADEX, so users don't repeat these in many places.

rossfarrugia commented 2 years ago

Or on reflection @bundfussr could even use by_vars here if that's more consistent with the way other admiral core parameter functions were designed? happy to go with your call though.

sgorm123 commented 2 years ago

@rossfarrugia @bundfussr

Ive been pondering this a little overnight and wonder if this function (indeed others too) should just return the new SET_VALUES and AVAL/C, plus the original data (which would need to include all columns needed for the filter)? Joining with ADSL seems like it should be handled by the function caller, also as the admiral/admiralonco functions are to be piped (I presume) it could be a little confusing for the users as to when ADSL is joined. Further, I presume with allowing ADSL to be joined we also need to check for duplicate columns being returned from within the function.

I'll obviously go with whatever you decide, it is is just a thought, as Im just trying to get my head round the admiral design patterns.

sgorm123 commented 2 years ago

@rossfarrugia @bundfussr @ljin22

What should the function do if there are two AVALs on the same last ADT? Take worse, error, concatenate the values, anything else...

sgorm123 commented 2 years ago

@rossfarrugia @bundfussr

Is there any ADMIRAL convention to allow the dataframes from within the function to be returned (somehow) to the user for debugging? At Amgen we use an R6 Logger to print to the console at varying levels (which I see admiral does not) and also my convention (not Amgen) is to return a debug_list of all dataframes if the log level is set low enough (example below, of course in ADMIRAL without a Logger it would need to be an argument in the function call, e.g. debug_return = TRUE).

debug_list <- list()

debug_list$pd_data <- eval(rlang::parse_expr(source_pd$dataset_name)) %>% admiral::filter_if(source_pd$filter) %>% dplyr::select(!!!subject_keys, !!!by_vars, !!source_pd$date) %>% dplyr::rename(temp_pd_date = !!source_pd$date)

: :

if(Logger$LOG_LEVEL == 2){ db_debug_list <<- debug_list }

rossfarrugia commented 2 years ago

@sgorm123 some replies from me:

@bundfussr could probably comment more and on the last point too about debugging

bundfussr commented 2 years ago

@sgorm123 , I would issue an error in case the last observation is not unique. Then the user can update the filter_source parameter to achieve uniqueness. Another option is that the functions provides an order parameter, which is defaulted to ADT. If necessary, the user could add other variables, e.g., vars(ADT, ASEQ) or vars(ADT, AVALN).

If there are duplicate observation, admiral provides a dataset with the duplicate observations to the user. If you call filter_extreme() with checktype equals "warning" or "error" for selecting the last observation this is done automatically.

sgorm123 commented 2 years ago

Thanks, I like the addition of an order param (I shall add) and I shall also look at filter_extreme().

wrt..

> * ADSL join happens prior to these param functions in the ADRS workflow so you're just choosing which of these variables to keep from the source records you feed in, e.g. PARAMCD == "OVR"

This makes sense, though can I then just confirm a comment above from @bundfussr:

@sgorm123 , it is suggested [The ADSL Argument] because the new parameter should have an observation for each subject in ADSL even for those without any tumor assessment.

Is this the case then, as this leads me to believe ADSL should be joined in the function? Perhaps we can just discuss and agree later.

Thanks for all the help, Stephen

rossfarrugia commented 2 years ago

Ah yes that makes sense @sgorm123 - you would need to add a record for those patients that might not have an input source record from RS. Even if all the analysis vars are all going to be null.

sgorm123 commented 2 years ago

Ah yes that makes sense @sgorm123 - you would need to add a record for those patients that might not have an input source record from RS. Even if all the analysis vars are all going to be null.

For these subjects, would ADT, AVAL and AVALC be NULL?

ljin22 commented 2 years ago

Ah yes that makes sense @sgorm123 - you would need to add a record for those patients that might not have an input source record from RS. Even if all the analysis vars are all going to be null.

For these subjects, would ADT, AVAL and AVALC be NULL?

@bundfussr @rossfarrugia In the case that subjects in ADSL but with no post-baseline data or no evaluable assessment in ADRS, such records could cause conformance check issues of missing both AVAL and AVLAC in BDS structure data, and NULL ADT does not contribute in next ADTTE workflow. As ADaMs are "analysis-ready", should we just keep records with valid data?

rossfarrugia commented 2 years ago

Good point @ljin22 - i'm actually thinking we shouldn't create these blank records for this parameter as when I look at the ADTTE workflow we even have this covered already by: censor_conditions=list(lasta_censor, **rand_censor**)

Also in our ADRS workflow we do say: All parameters are only created if patient has a record in ADSL.

I'll feed this back to the Roche standards team.

rossfarrugia commented 2 years ago

@sgorm123 as per chat yesterday i'll update this issue to remove dataset_adsl argument which is no longer needed

rossfarrugia commented 2 years ago

@sgorm123 is this function ready? if so please do make a PR so we can get people starting on review