meerapatelmd / chariot

R Package that supports standardization of clinical data to Athena (athena.ohdsi.org)
http://meerapatelmd.github.io/chariot

Add `ds_*` use case #25

Open: meerapatelmd opened this issue 3 years ago

meerapatelmd commented 3 years ago

Use Case

First, the drugs most often recorded with a non-null administration_dose are counted in the Drug Exposure table. Each count is then joined to its concept attributes from the Concept table.

```sql
WITH ct AS (
  SELECT de.drug_concept_id, COUNT(de.drug_concept_id) AS drug_concept_count
  FROM omop_cdm_1.drug_exposure de
  WHERE de.administration_dose IS NOT NULL
  GROUP BY de.drug_concept_id
)
SELECT ct.drug_concept_count, c.*
FROM ct
LEFT JOIN omop_vocabulary.concept c
  ON c.concept_id = ct.drug_concept_id
-- Sort in the outer query: an ORDER BY inside the CTE does not guarantee
-- the row order of the joined result
ORDER BY ct.drug_concept_count DESC;
```

```r
drug_concept_count <-
  pg13::query(
    conn = conn,
    sql_statement =
      "
      WITH ct AS (
        SELECT de.drug_concept_id, COUNT(de.drug_concept_id) AS drug_concept_count
        FROM omop_cdm_1.drug_exposure de
        WHERE de.administration_dose IS NOT NULL
        GROUP BY de.drug_concept_id
      )
      SELECT ct.drug_concept_count, c.*
      FROM ct
      LEFT JOIN omop_vocabulary.concept c
        ON c.concept_id = ct.drug_concept_id
      ORDER BY ct.drug_concept_count DESC;
      "
  )
drug_concept_count
```

For testing, the results are limited to the 10 most frequently recorded drugs.

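```r
library(dplyr)  # provides %>%, arrange(), slice(), select(), left_join()

# Sort explicitly so the top 10 does not depend on the row order returned by the database
drug_concept <-
  drug_concept_count %>%
  arrange(desc(drug_concept_count)) %>%
  slice(1:10)
drug_concept
```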
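You can check the count, sort, and truncate logic without a database connection. The base R sketch below mirrors the SQL above on a small in-memory table. The helper `top_drug_concepts()` and the toy rows are hypothetical, and the sketch assumes the drug_exposure extract fits in memory.

```r
# Hypothetical helper: count rows per drug_concept_id where administration_dose
# is not NA, sort by that count (descending), and keep the first n rows.
top_drug_concepts <- function(drug_exposure, n = 10) {
  with_dose <- drug_exposure[!is.na(drug_exposure$administration_dose), ]
  counts <- as.data.frame(
    table(drug_concept_id = with_dose$drug_concept_id),
    responseName = "drug_concept_count",
    stringsAsFactors = FALSE
  )
  counts$drug_concept_id <- as.numeric(counts$drug_concept_id)
  counts <- counts[order(counts$drug_concept_count, decreasing = TRUE), ]
  rownames(counts) <- NULL
  head(counts, n)
}

# Toy stand-in for omop_cdm_1.drug_exposure (hypothetical values)
drug_exposure <- data.frame(
  drug_concept_id     = c(101, 101, 101, 202, 202, 303, 303, 303),
  administration_dose = c(1,   2,   0.5, 5,   5,   NA,  NA,  1)
)
top_drug_concepts(drug_exposure, n = 2)
```

With these toy rows, concept 101 (three dosed rows) ranks ahead of 202 (two dosed rows). Concept 303 has only one dosed row and is cut off at `n = 2`.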

The top 10 drugs are joined back with the Drug Exposures table to retrieve the administration_dose, administration_unit, and frequency_concept_id fields.

```sql
SELECT DISTINCT
  a.*, b.administration_dose, b.administration_unit, b.frequency_concept_id
FROM @drug_concept a
LEFT JOIN omop_cdm_1.drug_exposure b
  ON a.concept_id = b.drug_concept_id;
```
```r
drug_record <-
  pg13::join1(
    conn = conn,
    write_schema = "patelm9",
    data = drug_concept,
    column = "concept_id",
    # frequency_concept_id is included to match the SQL and the description above
    select_join_on_fields = c("administration_dose",
                              "administration_unit",
                              "frequency_concept_id"),
    join_on_schema = "omop_cdm_1",
    join_on_table = "drug_exposure",
    join_on_column = "drug_concept_id",
    distinct = TRUE
  )
drug_record
```
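The SQL above combines a LEFT JOIN with DISTINCT. If both tables fit in memory, you can sketch the same logic in base R with `merge()` and `unique()`. This is not a reimplementation of `pg13::join1()`, and the toy rows are hypothetical.

```r
# Toy stand-ins (hypothetical values): three top drugs, one of which has no
# exposure rows, and a small drug_exposure table.
drug_concept <- data.frame(
  concept_id   = c(101, 202, 303),
  concept_name = c("Drug A", "Drug B", "Drug C")
)
drug_exposure <- data.frame(
  drug_concept_id      = c(101, 101, 101, 202, 999),
  administration_dose  = c(1, 1, 2, 5, 3),
  administration_unit  = c("tablet", "tablet", "tablet", "mL", "mL"),
  frequency_concept_id = c(1, 1, 1, 2, 2)
)

# LEFT JOIN ... ON a.concept_id = b.drug_concept_id: keep every row of drug_concept
drug_record <- merge(
  drug_concept, drug_exposure,
  by.x = "concept_id", by.y = "drug_concept_id",
  all.x = TRUE
)
# SELECT DISTINCT: drop rows that repeat across all columns
drug_record <- unique(drug_record)
drug_record
```

The duplicate tablet row for concept 101 collapses to one row, and the unrelated concept 999 is dropped. Concept 303 is kept with NA exposure fields, as a LEFT JOIN would keep it.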

For easier viewing, the concept attributes are merged into a single drug string, and the concept_id field is renamed drug_id.

```r
drug_record2 <-
  drug_record %>%
  chariot::merge_strip(into = "drug")
drug_record2
```

This dataset is then joined to the Drug Strength Staged table to get the staged value and unit fields for each drug.

```sql
SELECT a.*, b.ingredient_concept_id, b.value, b.unit
FROM @drug_record2 a
LEFT JOIN patelm9.drug_strength_staged b
  ON a.drug_id = b.drug_concept_id;
```
```r
drug_strength_record <-
  pg13::join1(
    conn = conn,
    write_schema = "patelm9",
    data = drug_record2,
    column = "drug_id",
    select_join_on_fields = c("ingredient_concept_id",
                              "value",
                              "unit"),
    join_on_schema = "patelm9",
    join_on_table = "drug_strength_staged",
    join_on_column = "drug_concept_id"
  )
```

The resulting table describes the drug exposure for each record. It contains the dose of the drug at each administration, the unit of administration, the corresponding ingredient_concept_id from the Drug Strength table, and the staged value and unit giving the amount of the ingredient in one unit of the drug.

```r
drug_strength_record %>%
  select(drug_id, administration_dose, administration_unit,
         ingredient_concept_id, value, unit)
```

The value field is stored as a text expression and must be evaluated to obtain a numeric value. Evaluating it row by row would mean looping over almost 40,000 rows. Instead, the unique values are isolated first, which leaves 9 rows. Only these 9 values are evaluated and mapped to their numeric equivalents.

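```r
# Evaluate each distinct value string once rather than once per row
values <-
  drug_strength_record %>%
  select(value) %>%
  distinct()
values
# NA values (unmatched rows from the LEFT JOIN) stay NA instead of being parsed
values$numeric_value <-
  vapply(values$value,
         function(x) if (is.na(x)) NA_real_ else as.numeric(eval(rlang::parse_expr(x))),
         numeric(1),
         USE.NAMES = FALSE)
values
```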
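The evaluate-once-per-unique-value pattern can also be written as a small base R helper. This sketch makes three substitutions. The helper name `evaluate_unique_values()` is hypothetical and not part of chariot or pg13. It uses base `parse(text = ...)` in place of `rlang::parse_expr()`. It maps results back to rows with `match()` rather than a join.

```r
# Hypothetical helper: evaluate each distinct expression string once, then
# map the numeric results back onto every element via match().
evaluate_unique_values <- function(value) {
  uniq <- unique(value)
  parsed <- vapply(uniq, function(x) {
    if (is.na(x)) return(NA_real_)  # unmatched LEFT JOIN rows carry NA
    as.numeric(eval(parse(text = x)[[1]], envir = baseenv()))
  }, numeric(1), USE.NAMES = FALSE)
  parsed[match(value, uniq)]
}

# Hypothetical value strings as they might appear in drug_strength_staged
value <- c("500", "250/5", "500", NA, "250/5", "0.5")
evaluate_unique_values(value)
```

Like the original, this evaluates arbitrary R code. It therefore assumes the value strings are trusted arithmetic such as "250/5".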

The resulting dataset is joined back with the original data.

```r
drug_strength_record2 <-
  drug_strength_record %>%
  left_join(values, by = "value")
drug_strength_record2

# Disconnect from the database
fantasia::dcOMOP(conn = conn)
```

meerapatelmd commented 3 years ago

Overview

The proposed additions to the Drug Exposure table capture drug administration and frequency information from the source data.

The drug administration attributes administration_dose and administration_unit are designed to provide standardized, verified values derived from the quantity and dose_unit_source_value fields of the Drug Exposure table, respectively. For solid formulations the amount would be a mass, such as 'grams', while for liquid preparations it would be a volume, such as 'milliliters'. These fields are intended to be used with the Drug Strength table to calculate the total mass of the RxNorm Ingredient in a given administration, regardless of the original formulation. Combined with the daily frequency given by frequency_concept_id, this calculation can return the total active ingredient administered either as a rate per day or as an aggregate over the timeframe of the drug exposure record.

Solid formulations taken orally have a straightforward conversion. The active ingredient mass is the number of tablets administered multiplied by the ingredient amount per tablet, so the quantity field provides all the information needed. In the equations below, the subscripts de and ds denote the Drug Exposure and Drug Strength tables.


$$ \text{quantity}_{\text{de}} \times \text{dose\_unit\_source\_value}_{\text{de}} \times \text{amount\_value}_{\text{ds}} \times \text{amount\_unit}_{\text{ds}} = \frac{\text{total active ingredient mass}}{\text{1 administration}} $$
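As a worked instance of the solid-formulation equation, the sketch below multiplies the number of units administered by the ingredient amount per unit. The function name and example numbers are hypothetical. The sketch also assumes the units already agree, for example tablets in quantity and milligrams per tablet in amount_value.

```r
# Solid formulations: ingredient mass per administration is the number of
# units administered (quantity, e.g. tablets) times the ingredient amount per
# unit (amount_value from Drug Strength). Units are assumed to agree already.
ingredient_mass_solid <- function(quantity, amount_value) {
  quantity * amount_value
}

# Hypothetical record: 2 tablets of a 500 mg tablet
ingredient_mass_solid(quantity = 2, amount_value = 500)  # 1000 (mg)
```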

However, for all other formulations, such as liquids reported as concentrations (e.g. milligrams per milliliter), the volume recorded in the quantity and dose_unit_source_value fields requires an additional conversion.


$$ \text{quantity}_{\text{de}} \times \text{dose\_unit\_source\_value}_{\text{de}} \times \frac{\text{numerator\_value}_{\text{ds}}}{\text{denominator\_value}_{\text{ds}}} \times \frac{\text{numerator\_unit}_{\text{ds}}}{\text{denominator\_unit}_{\text{ds}}} = \frac{\text{total active ingredient mass}}{\text{1 administration}} $$
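For concentrations, the ingredient amount per unit volume is numerator_value / denominator_value. The sketch below rests on two assumptions. First, quantity is already in the same unit as denominator_unit (for example, milliliters). Second, a missing denominator_value is treated as 1. The function name is hypothetical.

```r
# Concentrations: ingredient mass per administration is the volume administered
# times numerator_value / denominator_value from Drug Strength.
ingredient_mass_concentration <- function(quantity, numerator_value, denominator_value) {
  # Assumption: a missing denominator_value means "per 1 denominator unit"
  denominator_value[is.na(denominator_value)] <- 1
  quantity * numerator_value / denominator_value
}

# Hypothetical record: 10 mL of a 250 mg / 5 mL suspension
ingredient_mass_concentration(quantity = 10, numerator_value = 250,
                              denominator_value = 5)  # 500 (mg)
```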

In parallel, the frequency of drug administration is carried over from the source data and standardized to a concept ID in frequency_concept_id. This field converts the amount of active ingredient per administration into a rate per day.


$$ \frac{\text{total active ingredient mass}}{\text{1 administration}} \times \frac{x \text{ administrations}_{\text{de}}}{\text{day}} = \frac{x \times \text{total active ingredient mass}}{\text{day}} $$
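The frequency step multiplies the mass per administration by the number of administrations per day. The mapping from frequency_concept_id to a daily count is not specified here. This sketch therefore uses a hypothetical lookup keyed by common sig abbreviations rather than concept IDs. The function name is also hypothetical.

```r
# Hypothetical lookup of administrations per day, keyed by common sig
# abbreviations. In the proposal this count would come from
# frequency_concept_id; no concept IDs are assumed here.
administrations_per_day <- c(QD = 1, BID = 2, TID = 3, QID = 4)

ingredient_mass_per_day <- function(mass_per_administration, frequency) {
  per_day <- unname(administrations_per_day[frequency])  # NA if unmapped
  mass_per_administration * per_day
}

# 500 mg per administration, three times a day
ingredient_mass_per_day(500, "TID")  # 1500 (mg/day)
```

An unmapped frequency returns NA rather than a guessed rate.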

Finally, the total active ingredient mass for a drug exposure record is obtained by multiplying the daily rate above by the timeframe of the record in days.


$$ (\text{drug\_exposure\_end\_date}_{\text{de}} - \text{drug\_exposure\_start\_date}_{\text{de}}) \times \frac{x \times \text{total active ingredient mass}}{\text{day}} = \text{total active ingredient mass in drug exposure} $$
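The last equation multiplies the record's duration in days by the daily mass. Below is a minimal sketch with a hypothetical function name and example dates. It computes the duration exactly as written, as end date minus start date.

```r
# Total ingredient mass for one exposure record: duration in days
# (end date minus start date, as in the equation) times the daily mass.
total_ingredient_mass <- function(start_date, end_date, mass_per_day) {
  days <- as.numeric(as.Date(end_date) - as.Date(start_date), units = "days")
  days * mass_per_day
}

# Hypothetical record: 1500 mg/day from 2021-01-01 to 2021-01-11
total_ingredient_mass("2021-01-01", "2021-01-11", 1500)  # 15000 (mg)
```

Taken literally, end minus start gives zero days for a record that starts and ends on the same day. Counting the end date inclusively is a separate convention that this sketch does not apply.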

The administration_dose and administration_unit fields are intended to provide a normalized, quality-checked representation of quantity and dose_unit_source_value across all drug formulations in the Drug Exposure table.


$$ \text{administration\_dose}_{\text{de}^*} \sim \text{quantity}_{\text{de}} $$

$$ \text{administration\_unit}_{\text{de}^*} \sim \text{dose\_unit\_source\_value}_{\text{de}} $$