pharmaverse / sdtm.oak

An EDC and Data Standard agnostic SDTM data transformation engine that automates the transformation of raw clinical data in ODM format to SDTM based on standard mapping algorithms
https://pharmaverse.github.io/sdtm.oak/
Apache License 2.0

Feature Request: `assign_datetime` algorithm #46

Closed rammprasad closed 2 months ago

rammprasad commented 3 months ago

Feature Idea

The assign_datetime algorithm will be implemented as a function. As described in the documentation, it will be used to assign a value.

Algorithm Description - One-to-one mapping of the raw date and/or time source to a target SDTM variable in ISO 8601 format.

Example mappings - CM.CMSTDTC, AE.AEENDTC
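To make the mapping concrete, here is a minimal base R sketch of the conversion for a "d-m-y" source. The helper name `dmy_to_iso8601()` is hypothetical and is not part of sdtm.oak. It assumes `-` separators, English month abbreviations, and that two-digit years belong to the 2000s. Unknown components are written as `-`, and trailing unknown components are dropped.

```r
# Hypothetical helper (not the sdtm.oak API): convert "d-m-y" strings to ISO 8601.
dmy_to_iso8601 <- function(x, unk = c("UN", "UNK")) {
  convert_one <- function(value) {
    if (is.na(value)) return(NA_character_)
    parts <- strsplit(value, "-", fixed = TRUE)[[1]]
    if (length(parts) != 3L) return(NA_character_)
    # Treat the unknown tokens as missing components
    parts[toupper(parts) %in% unk] <- NA_character_
    day <- parts[1]
    mon <- parts[2]
    year <- parts[3]
    if (!is.na(day)) day <- sprintf("%02d", as.integer(day))
    if (!is.na(mon)) {
      idx <- match(tolower(mon), tolower(month.abb))
      mon <- if (is.na(idx)) NA_character_ else sprintf("%02d", idx)
    }
    # Assumption: two-digit years are in the 2000s
    if (!is.na(year) && nchar(year) == 2L) year <- paste0("20", year)
    comps <- c(year, mon, day)
    # Drop trailing missing components, then mark the remaining gaps with "-"
    known <- which(!is.na(comps))
    if (length(known) == 0L) return(NA_character_)
    comps <- comps[seq_len(max(known))]
    comps[is.na(comps)] <- "-"
    paste(comps, collapse = "-")
  }
  vapply(x, convert_one, character(1), USE.NAMES = FALSE)
}

dmy_to_iso8601(c("15-Sep-20", "4-Oct-20", "UN-UNK-2019", "20-UNK-2019", NA))
# "2020-09-15" "2020-10-04" "2019" "2019---20" NA
```

A value with a known day but an unknown month, such as "20-UNK-2019", keeps a `-` placeholder for the month, which is why the result is "2019---20".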

Function call

assign_datetime(
    raw_dat,
    raw_var,
    raw_fmt,
    raw_unk,
    tgt_var,
    tgt_dat,
    id_vars
)

Input:

raw_dat - The raw dataset.

raw_var - The raw variable.

raw_fmt - Format of the raw variable.

raw_unk - Strings that represent unknown date and time components in the raw data, e.g. c("UN", "UNK").

tgt_var - The target SDTM variable.

tgt_dat - Optional parameter. The target dataset that was created in the previous step.

id_vars - Optional parameter. A character vector of the variables used to merge into the target dataset.

Output: If tgt_dat and id_vars are not provided, a data frame with the oak_id_vars and target_sdtm_variable. If they are provided, tgt_dat with one additional variable, target_sdtm_variable.

Relevant Input

sdtm spec

| study_number | raw_source_model | raw_dataset | raw_dataset_ordinal | raw_dataset_label | raw_variable | raw_variable_label | raw_variable_ordinal | raw_variable_type | raw_data_format | raw_codelist | study_specific | annotation_ordinal | mapping_is_dataset | annotation_text | target_sdtm_domain | target_sdtm_variable | target_sdtm_variable_role | target_sdtm_variable_codelist_code | target_sdtm_variable_controlled_terms_or_format | target_sdtm_variable_ordinal | origin | mapping_algorithm | entity_sub_algorithm | target_hardcoded_value | target_term_value | target_term_code | condition_ordinal | condition_group_ordinal | condition_left_raw_dataset | condition_left_raw_variable | condition_left_sdtm_domain | condition_left_sdtm_variable | condition_operator | condition_right_text_value | condition_right_sdtm_domain | condition_right_sdtm_variable | condition_right_raw_dataset | condition_right_raw_variable | condition_next_logical_operator | merge_type | merge_left | merge_right | merge_condition | unduplicate_keys | groupby_keys | target_resource_raw_dataset | target_resource_raw_variable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lp_study | e-CRF | MD1 | 27 | Concomitant Medications | MDBDR | Start date | 5 | DateTime | dd- MMM- yyyy | NA | FALSE | 1 | FALSE | CM.CMSTDTC | CM | CMSTDTC | Timing Variable | NA | ISO 8601 | 39 | CRF | ASSIGN_NO_CT | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| lp_study | e-CRF | MD1 | 27 | Concomitant Medications | MDEDR | End date | 10 | DateTime | dd- MMM- yyyy | NA | FALSE | 1 | FALSE | CM.CMENDTC | CM | CMENDTC | Timing Variable | NA | ISO 8601 | 40 | CRF | ASSIGN_NO_CT | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| lp_study | e-CRF | MD1 | 27 | Concomitant Medications | MDETM | End time | 11 | DateTime | HH nn | NA | FALSE | 1 | FALSE | CM.CMENDTC | CM | CMENDTC | Timing Variable | NA | ISO 8601 | 40 | CRF | ASSIGN_NO_CT | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
raw_dat = MD1

| oak_id | raw_source | patient_number | MDBDR | MDEDR | MDETM |
|---|---|---|---|---|---|
| 1 | MD1 | 375 | NA | NA | NA |
| 2 | MD1 | 375 | 15-Sep-20 | NA | NA |
| 3 | MD1 | 376 | 17-Feb-21 | 17-Feb-21 | NA |
| 4 | MD1 | 377 | 4-Oct-20 | NA | NA |
| 5 | MD1 | 377 | 20-Jan-20 | 20-Jan-20 | 10:00:00 |
| 6 | MD1 | 377 | UN-UNK-2019 | UN-UNK-2019 | NA |
| 7 | MD1 | 377 | 20-UNK-2019 | 20-UNK-2019 | NA |
| 8 | MD1 | 378 | UN-UNK-2020 | UN-UNK-2020 | NA |
| 9 | MD1 | 378 | 26-Jan-20 | 26-Jan-20 | 07:00:00 |
| 10 | MD1 | 378 | 28-Jan-20 | 1-Feb-20 | NA |
| 11 | MD1 | 378 | 12-Feb-20 | 18-Feb-20 | NA |
| 12 | MD1 | 379 | 10-UNK-2020 | 20-UNK-2020 | NA |
| 13 | MD1 | 379 | NA | NA | NA |
| 14 | MD1 | 379 | NA | 17-Feb-20 | NA |

raw_var = "MDBDR"

raw_fmt = "d-m-y"

raw_unk = c("UN", "UNK")

tgt_var = "CMSTDTC"

tgt_dat = cm_inter. Assume the CMTRT and CMINDC variables are already derived and CMSTDTC is the third variable being processed.

| oak_id | raw_source | patient_number | CMTRT | CMINDC |
|---|---|---|---|---|
| 1 | MD1 | 375 | BABY ASPIRIN | NA |
| 2 | MD1 | 375 | CORTISPORIN | NAUSEA |
| 3 | MD1 | 376 | ASPIRIN | ANEMIA |
| 4 | MD1 | 377 | DIPHENHYDRAMINE HCL | NAUSEA |
| 5 | MD1 | 377 | PARCETEMOL | PYREXIA |
| 6 | MD1 | 377 | VOMIKIND | VOMITINGS |
| 7 | MD1 | 377 | ZENFLOX OZ | DIARHHEA |
| 8 | MD1 | 378 | AMITRYPTYLINE | COLD |
| 9 | MD1 | 378 | BENADRYL | FEVER |
| 10 | MD1 | 378 | DIPHENHYDRAMINE HYDROCHLORIDE | LEG PAIN |
| 11 | MD1 | 378 | TETRACYCLINE | FEVER |
| 12 | MD1 | 379 | BENADRYL | COLD |
| 13 | MD1 | 379 | SOMINEX | COLD |
| 14 | MD1 | 379 | ZQUILL | PAIN |

id_vars = oak_id_vars()

Relevant Output

Option 1 - When used without merging, i.e. without tgt_dat and id_vars. The function call is

assign_datetime(
  raw_dat = MD1,
  raw_var = "MDBDR",
  raw_fmt = "d-m-y",
  raw_unk = c("UN", "UNK"),
  tgt_var = "CMSTDTC"
)

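Output dataset from the function

| oak_id | raw_source | patient_number | CMSTDTC |
|---|---|---|---|
| 1 | MD1 | 375 | NA |
| 2 | MD1 | 375 | 2020-09-15 |
| 3 | MD1 | 376 | 2021-02-17 |
| 4 | MD1 | 377 | 2020-10-04 |
| 5 | MD1 | 377 | 2020-01-20 |
| 6 | MD1 | 377 | 2019 |
| 7 | MD1 | 377 | 2019---20 |
| 8 | MD1 | 378 | 2020 |
| 9 | MD1 | 378 | 2020-01-26 |
| 10 | MD1 | 378 | 2020-01-28 |
| 11 | MD1 | 378 | 2020-02-12 |
| 12 | MD1 | 379 | 2020---10 |
| 13 | MD1 | 379 | NA |
| 14 | MD1 | 379 | NA |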

Option 2 - When used with merging, i.e. with tgt_dat and id_vars. The function call is

assign_datetime(
  raw_dat = MD1,
  raw_var = "MDBDR",
  raw_fmt = "d-m-y",
  raw_unk = c("UN", "UNK"),
  tgt_var = "CMSTDTC",
  tgt_dat = cm_inter,
  id_vars = oak_id_vars()
)

Output dataset

| oak_id | raw_source | patient_number | CMTRT | CMINDC | CMSTDTC |
|---|---|---|---|---|---|
| 1 | MD1 | 375 | BABY ASPIRIN | NA | NA |
| 2 | MD1 | 375 | CORTISPORIN | NAUSEA | 2020-09-15 |
| 3 | MD1 | 376 | ASPIRIN | ANEMIA | 2021-02-17 |
| 4 | MD1 | 377 | DIPHENHYDRAMINE HCL | NAUSEA | 2020-10-04 |
| 5 | MD1 | 377 | PARCETEMOL | PYREXIA | 2020-01-20 |
| 6 | MD1 | 377 | VOMIKIND | VOMITINGS | 2019 |
| 7 | MD1 | 377 | ZENFLOX OZ | DIARHHEA | 2019---20 |
| 8 | MD1 | 378 | AMITRYPTYLINE | COLD | 2020 |
| 9 | MD1 | 378 | BENADRYL | FEVER | 2020-01-26 |
| 10 | MD1 | 378 | DIPHENHYDRAMINE HYDROCHLORIDE | LEG PAIN | 2020-01-28 |
| 11 | MD1 | 378 | TETRACYCLINE | FEVER | 2020-02-12 |
| 12 | MD1 | 379 | BENADRYL | COLD | 2020---10 |
| 13 | MD1 | 379 | SOMINEX | COLD | NA |
| 14 | MD1 | 379 | ZQUILL | PAIN | NA |
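
The merge in Option 2 can be pictured as a left join of the derived variable onto the target dataset by the id variables. The sketch below uses base R `merge()` on a three-row excerpt of the data above. The `id_vars` vector is a stand-in for `oak_id_vars()`, assumed from the id columns shown in the tables. It only illustrates the expected result and is not necessarily how the function will be implemented.

```r
# Stand-in for oak_id_vars(), assumed from the id columns in the tables
id_vars <- c("oak_id", "raw_source", "patient_number")

# Excerpt of the target dataset (cm_inter)
cm_inter <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = c(375L, 375L, 376L),
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN"),
  CMINDC = c(NA, "NAUSEA", "ANEMIA")
)

# Excerpt of the Option 1 output (id variables plus CMSTDTC)
derived <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = c(375L, 375L, 376L),
  CMSTDTC = c(NA, "2020-09-15", "2021-02-17")
)

# Left join: keep every row of the target dataset and add CMSTDTC
cm_merged <- merge(cm_inter, derived, by = id_vars, all.x = TRUE)
cm_merged
```

Using `all.x = TRUE` keeps records whose date is missing, such as oak_id 1, so the target dataset never loses rows in the merge.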

Reproducible Example/Pseudo Code

library(sdtm.oak)
library(dplyr)

# Option 1 - Derive CMSTDTC only

assign_datetime(
  raw_dat = MD1,
  raw_var = "MDBDR",
  raw_fmt = "d-m-y",
  raw_unk = c("UN", "UNK"),
  tgt_var = "CMSTDTC"
)

# Option 2 - Derive CMSTDTC and merge it into the target dataset

cm <- cm_daw_data |>
  # Derive topic variable
  assign_no_ct(
    raw_dataset = MD1,
    raw_variable = MDRAW,
    target_sdtm_var = CMTRT
  ) |>
  assign_no_ct(
    raw_dataset = MD1,
    raw_variable = MDIND,
    target_sdtm_var = CMINDC,
    merge_to_topic_by = c("oak_id", "raw_source", "patient_number")
  ) |>
  # The piped dataset fills the one remaining argument, tgt_dat
  assign_datetime(
    raw_dat = MD1,
    raw_var = "MDBDR",
    raw_fmt = "d-m-y",
    raw_unk = c("UN", "UNK"),
    tgt_var = "CMSTDTC",
    id_vars = oak_id_vars()
  )