pharmaverse / sdtm.oak

An EDC and Data Standard agnostic SDTM data transformation engine that automates the transformation of raw clinical data in ODM format to SDTM based on standard mapping algorithms
https://pharmaverse.github.io/sdtm.oak/
Apache License 2.0

Feature Request: `hardcode_no_ct` algorithm #40

Closed: rammprasad closed this issue 5 months ago.

rammprasad commented 8 months ago

Feature Idea

The hardcode_no_ct algorithm will be implemented as a function. As described in the documentation, it is used to hardcode a value.

Algorithm description: map a hardcoded value to a target SDTM variable that has no controlled terminology restrictions.

Example mappings:
FA.FASCAT = 'COVID-19 PROBABLE CASE'
CM.CMTRT = 'FLUIDS'
CM.CMCAT = 'GENERAL CONCOMITANT MEDICATIONS'

Function call:

hardcode_no_ct(raw_dataset,
               raw_variable,
               target_sdtm_variable,
               target_hardcoded_value,
               target_dataset,
               merge_to_topic_by)

Input:

raw_dataset - An R data frame, usually the raw dataset.

raw_variable - A character string. Name of the variable in the raw dataset.

target_sdtm_variable - A character string. Name of the SDTM variable to be derived.

target_hardcoded_value - A character string. The hardcoded text.

target_dataset - Optional. The target dataset created in the previous step.

merge_to_topic_by - Optional. A character vector of the variable names used to merge with target_dataset.

Output: If target_dataset and merge_to_topic_by are not provided, a data frame with the oak_id_vars and target_sdtm_variable. Otherwise, target_dataset with one additional variable, target_sdtm_variable.
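The input/output contract above can be sketched in base R. This is a hypothetical illustration, not the sdtm.oak implementation: the function name `hardcode_no_ct_sketch()`, the fixed `oak_id_vars` vector, and the rule that the value is only hardcoded where `raw_variable` is non-missing are all assumptions made for this example.

```r
# Hypothetical sketch of the proposed hardcode_no_ct(); not the sdtm.oak code.
# Assumption: the oak_id_vars are oak_id, raw_source and patient_number.
oak_id_vars <- c("oak_id", "raw_source", "patient_number")

hardcode_no_ct_sketch <- function(raw_dataset,
                                  raw_variable,
                                  target_sdtm_variable,
                                  target_hardcoded_value,
                                  target_dataset = NULL,
                                  merge_to_topic_by = NULL) {
  stopifnot(
    is.data.frame(raw_dataset),
    raw_variable %in% names(raw_dataset),
    all(oak_id_vars %in% names(raw_dataset))
  )
  # Start from the record identifiers of the raw dataset
  out <- raw_dataset[, oak_id_vars, drop = FALSE]
  # Assumption: hardcode only where the raw variable is populated
  out[[target_sdtm_variable]] <- ifelse(
    is.na(raw_dataset[[raw_variable]]), NA_character_, target_hardcoded_value
  )
  # Without a target dataset (or merge keys), return identifiers + new variable
  if (is.null(target_dataset) || is.null(merge_to_topic_by)) {
    return(out)
  }
  # Left join on the merge keys, preserving the row order of target_dataset
  key_target <- do.call(paste, c(target_dataset[merge_to_topic_by], sep = "\r"))
  key_out <- do.call(paste, c(out[merge_to_topic_by], sep = "\r"))
  target_dataset[[target_sdtm_variable]] <-
    out[[target_sdtm_variable]][match(key_target, key_out)]
  target_dataset
}
```

Matching on pasted keys with `match()` keeps the row order of `target_dataset`, which base `merge()` does not guarantee when `sort = FALSE`.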

Relevant Input

SDTM spec (a single row, shown transposed as column = value):

study_number = lp_study
raw_source_model = e-CRF
raw_dataset = MD1
raw_dataset_ordinal = 27
raw_dataset_label = Concomitant Medications
raw_variable = MDRAW
raw_variable_label = Medication
raw_variable_ordinal = 3
raw_variable_type = LongText
raw_data_format = $200
raw_codelist = NA
study_specific = FALSE
annotation_ordinal = 1
mapping_is_dataset = FALSE
annotation_text = CM.CMTRT
target_sdtm_domain = CM
target_sdtm_variable = CMCAT
target_sdtm_variable_role = Grouping Qualifier
target_sdtm_variable_codelist_code = NA
target_sdtm_variable_controlled_terms_or_format = NA
target_sdtm_variable_ordinal = 10
origin = CRF
mapping_algorithm = HARDCODE_NO_CT
entity_sub_algorithm = NA
target_hardcoded_value = GENERAL CONCOMITANT MEDICATIONS
target_term_value = NA
target_term_code = NA
All remaining columns (condition_ordinal through target_resource_raw_variable: the condition_*, merge_*, unduplicate_keys, groupby_keys and target_resource_* columns) = NA
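To show how a spec row could drive the call, the sketch below filters a spec to its HARDCODE_NO_CT rows and turns each into a list of arguments. It assumes the spec has been read into a data frame with the column names listed above; only a few columns are included, and the second row (an ASSIGN_NO_CT mapping for CMTRT) is a hypothetical addition to show the filtering.

```r
# Hypothetical spec extract: only a few of the columns listed above
spec <- data.frame(
  raw_dataset = c("MD1", "MD1"),
  raw_variable = c("MDRAW", "MDRAW"),
  target_sdtm_variable = c("CMCAT", "CMTRT"),
  mapping_algorithm = c("HARDCODE_NO_CT", "ASSIGN_NO_CT"),
  target_hardcoded_value = c("GENERAL CONCOMITANT MEDICATIONS", NA),
  stringsAsFactors = FALSE
)

# Keep the rows handled by hardcode_no_ct (which() drops NA comparisons)
hc_rows <- spec[which(spec$mapping_algorithm == "HARDCODE_NO_CT"), , drop = FALSE]

# One named list of arguments per spec row
hc_args <- lapply(seq_len(nrow(hc_rows)), function(i) {
  list(
    raw_dataset = hc_rows$raw_dataset[i],  # still a name, not a data frame
    raw_variable = hc_rows$raw_variable[i],
    target_sdtm_variable = hc_rows$target_sdtm_variable[i],
    target_hardcoded_value = hc_rows$target_hardcoded_value[i]
  )
})
```

Here raw_dataset is only the dataset name from the spec; it would still have to be resolved to the actual data frame before calling the function.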
raw_dataset = MD1

oak_id  raw_source  patient_number  MDRAW
1       MD1         PATNUM          BABY ASPIRIN
2       MD1         PATNUM          CORTISPORIN
3       MD1         PATNUM          ASPIRIN
4       MD1         PATNUM          DIPHENHYDRAMINE HCL
5       MD1         PATNUM          PARCETEMOL
6       MD1         PATNUM          VOMIKIND
7       MD1         PATNUM          ZENFLOX OZ
8       MD1         PATNUM          AMITRYPTYLINE
9       MD1         PATNUM          BENADRYL
10      MD1         PATNUM          DIPHENHYDRAMINE HYDROCHLORIDE
11      MD1         PATNUM          TETRACYCLINE
12      MD1         PATNUM          BENADRYL
13      MD1         PATNUM          SOMINEX
14      MD1         PATNUM          ZQUILL
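To try the examples, the MD1 table can be recreated as an R data frame. The values are copied from the table above; storing oak_id as an integer and the other columns as character vectors is an assumption.

```r
# MD1 from the table above (column types are assumptions)
MD1 <- data.frame(
  oak_id = 1:14,
  raw_source = "MD1",
  patient_number = "PATNUM",
  MDRAW = c(
    "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN", "DIPHENHYDRAMINE HCL",
    "PARCETEMOL", "VOMIKIND", "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
    "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE", "BENADRYL",
    "SOMINEX", "ZQUILL"
  ),
  stringsAsFactors = FALSE
)
```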

raw_variable = "MDRAW"

target_sdtm_variable = "CMCAT"

target_dataset = cm_inter. Assume CMTRT and CMINDC have already been derived and CMCAT is the third variable being processed:

oak_id  raw_source  patient_number  CMTRT                          CMINDC
1       MD1         PATNUM          BABY ASPIRIN                   NA
2       MD1         PATNUM          CORTISPORIN                    NAUSEA
3       MD1         PATNUM          ASPIRIN                        ANEMIA
4       MD1         PATNUM          DIPHENHYDRAMINE HCL            NAUSEA
5       MD1         PATNUM          PARCETEMOL                     PYREXIA
6       MD1         PATNUM          VOMIKIND                       VOMITINGS
7       MD1         PATNUM          ZENFLOX OZ                     DIARHHEA
8       MD1         PATNUM          AMITRYPTYLINE                  COLD
9       MD1         PATNUM          BENADRYL                       FEVER
10      MD1         PATNUM          DIPHENHYDRAMINE HYDROCHLORIDE  LEG PAIN
11      MD1         PATNUM          TETRACYCLINE                   FEVER
12      MD1         PATNUM          BENADRYL                       COLD
13      MD1         PATNUM          SOMINEX                        COLD
14      MD1         PATNUM          ZQUILL                         PAIN

merge_to_topic_by = oak_id_vars, i.e. c("oak_id", "raw_source", "patient_number")
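A merge on merge_to_topic_by only behaves as a one-to-one join if those variables identify each record uniquely; otherwise the join can duplicate rows of the target dataset. A minimal check in base R, using the first three rows of cm_inter, might look like this (the check is a suggestion, not part of the proposed function):

```r
merge_to_topic_by <- c("oak_id", "raw_source", "patient_number")

# First three rows of cm_inter
cm_inter <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = "PATNUM",
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN"),
  CMINDC = c(NA, "NAUSEA", "ANEMIA"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when no key combination repeats
keys_are_unique <- anyDuplicated(cm_inter[merge_to_topic_by]) == 0
if (!keys_are_unique) {
  stop("merge_to_topic_by does not identify records uniquely")
}
```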

Relevant Output

Option 1 - Merging into the target dataset. When the function call is

hardcode_no_ct(
  raw_dataset = MD1,
  raw_variable = "MDRAW",
  target_sdtm_variable = "CMCAT",
  target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS",
  target_dataset = cm_inter,
  merge_to_topic_by = c("oak_id", "raw_source", "patient_number")
)

the output dataset from the function is:

oak_id  raw_source  patient_number  CMTRT                          CMINDC     CMCAT
1       MD1         PATNUM          BABY ASPIRIN                   NA         GENERAL CONCOMITANT MEDICATIONS
2       MD1         PATNUM          CORTISPORIN                    NAUSEA     GENERAL CONCOMITANT MEDICATIONS
3       MD1         PATNUM          ASPIRIN                        ANEMIA     GENERAL CONCOMITANT MEDICATIONS
4       MD1         PATNUM          DIPHENHYDRAMINE HCL            NAUSEA     GENERAL CONCOMITANT MEDICATIONS
5       MD1         PATNUM          PARCETEMOL                     PYREXIA    GENERAL CONCOMITANT MEDICATIONS
6       MD1         PATNUM          VOMIKIND                       VOMITINGS  GENERAL CONCOMITANT MEDICATIONS
7       MD1         PATNUM          ZENFLOX OZ                     DIARHHEA   GENERAL CONCOMITANT MEDICATIONS
8       MD1         PATNUM          AMITRYPTYLINE                  COLD       GENERAL CONCOMITANT MEDICATIONS
9       MD1         PATNUM          BENADRYL                       FEVER      GENERAL CONCOMITANT MEDICATIONS
10      MD1         PATNUM          DIPHENHYDRAMINE HYDROCHLORIDE  LEG PAIN   GENERAL CONCOMITANT MEDICATIONS
11      MD1         PATNUM          TETRACYCLINE                   FEVER      GENERAL CONCOMITANT MEDICATIONS
12      MD1         PATNUM          BENADRYL                       COLD       GENERAL CONCOMITANT MEDICATIONS
13      MD1         PATNUM          SOMINEX                        COLD       GENERAL CONCOMITANT MEDICATIONS
14      MD1         PATNUM          ZQUILL                         PAIN       GENERAL CONCOMITANT MEDICATIONS
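The Option 1 result can also be expressed as a base R left join with `merge(all.x = TRUE)`: every cm_inter record is kept, and CMCAT is added where the keys match. This sketch uses three rows of each dataset, and `cmcat` stands in for an Option 2 style output; it illustrates the expected result rather than how hardcode_no_ct() is implemented.

```r
keys <- c("oak_id", "raw_source", "patient_number")

# First three rows of cm_inter
cm_inter <- data.frame(
  oak_id = 1:3, raw_source = "MD1", patient_number = "PATNUM",
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN"),
  CMINDC = c(NA, "NAUSEA", "ANEMIA"),
  stringsAsFactors = FALSE
)

# Stand-in for the hardcoded CMCAT records (Option 2 style output)
cmcat <- data.frame(
  oak_id = 1:3, raw_source = "MD1", patient_number = "PATNUM",
  CMCAT = "GENERAL CONCOMITANT MEDICATIONS",
  stringsAsFactors = FALSE
)

# Left join: keep every target record, add CMCAT where the keys match
cm <- merge(cm_inter, cmcat, by = keys, all.x = TRUE)
# Make the row order explicit instead of relying on merge()'s sorting
cm <- cm[order(cm$oak_id), , drop = FALSE]
```

merge() places the key columns first, so the result has the same column order as the Option 1 output.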

Option 2 - Without merging. When the function call is

hardcode_no_ct(
  raw_dataset = MD1,
  raw_variable = "MDRAW",
  target_sdtm_variable = "CMCAT",
  target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS"
)

the output dataset is:

oak_id  raw_source  patient_number  CMCAT
1       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
2       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
3       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
4       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
5       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
6       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
7       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
8       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
9       MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
10      MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
11      MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
12      MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
13      MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS
14      MD1         PATNUM          GENERAL CONCOMITANT MEDICATIONS

Reproducible Example/Pseudo Code

library(sdtm.oak)
library(dplyr)

# Option 1 - Derive the topic variable CMTRT, then CMINDC, then hardcode CMCAT.
# Each piped result becomes target_dataset: it is the only unnamed argument,
# so R matches it to the first formal left unmatched.
# (Assumes assign_no_ct() mirrors the hardcode_no_ct() arguments.)
cm <- assign_no_ct(
    raw_dataset = MD1,
    raw_variable = "MDRAW",
    target_sdtm_variable = "CMTRT"
  ) |>
  assign_no_ct(
    raw_dataset = MD1,
    raw_variable = "MDIND",
    target_sdtm_variable = "CMINDC",
    merge_to_topic_by = c("oak_id", "raw_source", "patient_number")
  ) |>
  hardcode_no_ct(
    raw_dataset = MD1,
    raw_variable = "MDRAW",
    target_sdtm_variable = "CMCAT",
    target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS",
    merge_to_topic_by = c("oak_id", "raw_source", "patient_number")
  )

Option 2 - Just derive CMCAT

cm <- hardcode_no_ct(
  raw_dataset = MD1,
  raw_variable = "MDRAW",
  target_sdtm_variable = "CMCAT",
  target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS"
)

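Chaining these calls with `|>` relies on R's argument matching: `x |> f(...)` is evaluated as `f(x, ...)`, and when every other argument is named, the unnamed piped value is matched to the first formal argument still unmatched, which for the proposed signature is target_dataset. The toy function `f()` below is hypothetical; it mirrors the proposed argument order and simply returns target_dataset. The native pipe requires R 4.1 or later.

```r
# Hypothetical toy function with the proposed hardcode_no_ct() argument order
f <- function(raw_dataset, raw_variable, target_sdtm_variable,
              target_hardcoded_value, target_dataset = NULL,
              merge_to_topic_by = NULL) {
  target_dataset
}

# All arguments except target_dataset are named, so the piped value fills it
piped <- "cm_inter" |>
  f(
    raw_dataset = "MD1",
    raw_variable = "MDRAW",
    target_sdtm_variable = "CMCAT",
    target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS",
    merge_to_topic_by = c("oak_id", "raw_source", "patient_number")
  )
```
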
rammprasad commented 8 months ago

@ramiromagno - Please take a look. If this is OK, I will create similar requirements for assign_ct and assign_no_ct.

ramiromagno commented 8 months ago

@rammprasad: Thank you for the examples. I've prepared some draft code for hardcode_no_ct() in PR https://github.com/pharmaverse/sdtm.oak/pull/41. The idea is to get quick feedback on whether the gist of it aligns with the expected functionality. If it does, I will add assertions for argument checking, polish the code, and open a proper PR.

ramiromagno commented 8 months ago

> @ramiromagno - Please take a look. If this is ok, I will create similar requirements for assign_ct, assign_no_ct

Yes, this is perfect. Please do the same for assign_ct() and assign_no_ct().