kylebaron commented 4 months ago

Summary

The PR adds functionality to create table notes from a tex glossary file.

An entry in a glossary file looks like this:

\newacronym{label}{abbreviation}{definition}

The basic idea is to read and parse the glossary file to create a string of the form label: definition. When multiple terms are selected, which is the likely case, the string takes the form label1: definition1; label2: definition2.
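As a rough illustration of the idea above, here is a minimal base R sketch that parses simple \newacronym lines and builds a label: definition string. It is not the code in this PR: the input lines, the regular expression, and the helper names parse_simple_glossary() and make_note() are all made up for the example, and the pattern assumes entries with no optional argument and no nested braces.

```r
# Hypothetical glossary lines; not taken from the pmtables glossary file.
glo_lines <- c(
  "\\newacronym{wt}{WT}{weight}",
  "\\newacronym{bmi}{BMI}{body mass index}",
  "% a comment line that should be ignored"
)

# Three brace groups without nested braces: label, abbreviation, definition.
pattern <- paste0(
  "^\\\\newacronym",
  "\\{([^{}]*)\\}\\{([^{}]*)\\}\\{([^{}]*)\\}[[:space:]]*$"
)

# Hypothetical helper: returns a named list of entries, keyed by label.
parse_simple_glossary <- function(lines) {
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 4]  # keep only lines that matched fully
  entries <- lapply(hits, function(h) {
    list(abbreviation = h[3], definition = h[4])
  })
  names(entries) <- vapply(hits, function(h) h[2], character(1))
  entries
}

# Hypothetical helper: "label1: definition1; label2: definition2".
make_note <- function(entries, labels = names(entries)) {
  defs <- vapply(entries[labels], function(e) e$definition, character(1))
  paste(paste0(labels, ": ", defs), collapse = "; ")
}

glo <- parse_simple_glossary(glo_lines)
note <- make_note(glo, c("wt", "bmi"))
print(note)
```

The comment line is dropped because it does not match the pattern; a real glossary file would likely need more forgiving parsing than this.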

The PR also implements the ability to read glossary information from a yaml-formatted file.

Objects

glossary - a list of glossary entries; the names of the list are the glossary labels
glossary_entry - a list containing the abbreviation and the definition
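The two objects described above can be pictured as a nested, named list. The sketch below uses base R only; the constructor names and the class names glossary_sketch and glossary_entry_sketch are invented for illustration and are not the classes defined in pmtables. It also assumes, as noted later in the reprex, that the abbreviation falls back to the label.

```r
# Hypothetical constructor for one entry: abbreviation plus definition.
new_entry_sketch <- function(definition, abbreviation) {
  structure(
    list(abbreviation = abbreviation, definition = definition),
    class = "glossary_entry_sketch"
  )
}

# Hypothetical constructor for the glossary: a list of entries whose
# names are the labels. The abbreviation defaults to the label here.
new_glossary_sketch <- function(...) {
  defs <- list(...)
  stopifnot(length(defs) > 0, !is.null(names(defs)), all(nzchar(names(defs))))
  entries <- Map(new_entry_sketch, defs, names(defs))
  structure(entries, class = "glossary_sketch")
}

x <- new_glossary_sketch(c = "cat", d = "dog")
names(x)          # the glossary labels
x$c$abbreviation  # falls back to the label
x$d$definition
```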

Functions

read_glossary() reads and parses the glossary file; returns a glossary object
glossary_notes() takes in the glossary file name or a glossary list as well as labels to select and returns a character vector that can be added to a table via st_notes()
st_notes_glo() takes a glossary list (from read_glossary()) and labels to select and adds the notes in a table pipeline
as_glossary() coerces a list to a glossary object
update_abbrev() updates the abbreviation for any entry (the label and the definition can't be changed)

Reprex

library(reprex)
library(pmtables)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Read in a glossary file

We can read from a .tex glossary file

glofile <- system.file("glo", "glossary.tex", package = "pmtables")

g <- read_glossary(glofile)

or a yaml file

gloyaml <- system.file("glo", "glossary.yaml", package = "pmtables")

y <- read_glossary(gloyaml)

The result is a glossary object

y
#> egfr : estimated glomerular filtration rate
#> bmi  : body mass index
#> wt   : weight
#> ht   : height
#> cmax : maximum concentration in the dosing inte...
#> cmin : minimum concentration in the dosing inte...
#> auc  : area under the concentration time curve

yaml format

The yaml file needs to be written in a format that yaml_as_df() can read: the outer level holds the labels, and the inner level holds the abbreviation (abb) and the definition (def) for each label.

cat(readLines(gloyaml), sep = "\n")
#> egfr:
#>   def: estimated glomerular filtration rate
#>   abb: eGFR
#> bmi:
#>   def: body mass index
#>   abb: BMI
#> wt:
#>   def: weight
#>   abb: WT
#> ht:
#>   abb: HT
#>   def: height
#> cmax:
#>   def: maximum concentration in the dosing interval
#>   abb: Cmax
#> cmin:
#>   def: minimum concentration in the dosing interval
#>   abb: Cmin
#> auc:
#>   abb: AUC
#>   def: area under the concentration time curve
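To make the nesting concrete, the sketch below starts from a hand-written list that stands in for what a yaml parser would return for two of the entries above, then flattens it to one row per label. It does not call yaml_as_df() or any yaml package; to_row() is a hypothetical helper, and falling back to the label when abb is missing is an assumption made for this example.

```r
# Hand-written stand-in for parsed yaml: labels outside, def/abb inside.
# The inner keys can appear in either order, as in the file above.
raw <- list(
  egfr = list(def = "estimated glomerular filtration rate", abb = "eGFR"),
  ht   = list(abb = "HT", def = "height")
)

# Hypothetical helper: one data frame row per glossary label.
to_row <- function(label, x) {
  stopifnot(!is.null(x$def))
  abb <- if (is.null(x$abb)) label else x$abb  # assumed fallback
  data.frame(
    label = label,
    definition = x$def,
    abbreviation = abb,
    stringsAsFactors = FALSE
  )
}

rows <- unname(Map(to_row, names(raw), raw))
glo_df <- do.call(rbind, rows)
rownames(glo_df) <- NULL
print(glo_df)
```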

Create a glossary object

x <- as_glossary(c = "cat", d = "dog", s = "snake")
x
#> c : cat
#> d : dog
#> s : snake

By default, the abbreviation is taken to be the label.

Update abbreviation

x <- update_abbrev(x, s = "SNAKE")
x$s
#> snake (SNAKE)
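For readers who want to see the shape of such an update, here is a base R sketch on a plain nested list. update_abbrev_sketch() is a hypothetical stand-in, not the update_abbrev() implementation from this PR; it changes only the abbreviation and errors on labels that are not in the glossary.

```r
# Plain-list stand-in for a glossary; labels are the list names.
x <- list(
  s = list(abbreviation = "s", definition = "snake"),
  d = list(abbreviation = "d", definition = "dog")
)

# Hypothetical helper: named arguments are label = new abbreviation.
update_abbrev_sketch <- function(glo, ...) {
  new <- list(...)
  unknown <- setdiff(names(new), names(glo))
  if (length(unknown) > 0) {
    stop("labels not found: ", paste(unknown, collapse = ", "))
  }
  for (label in names(new)) {
    glo[[label]]$abbreviation <- new[[label]]  # label and definition untouched
  }
  glo
}

x2 <- update_abbrev_sketch(x, s = "SNAKE")
x2$s$abbreviation
x2$s$definition
```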

Work with glossary object

A lot of this is driven by the need to combine glossary objects or to look into an object to see what is in there. I would rather have stuck with a simpler data structure, but it needed to be more complex, so I added this functionality.

Extract and print

g[1:10]
#> ADA    : anti-drug antibodies
#> AE     : adverse event
#> AIC    : Akaike information criterion
#> ALAG   : oral absorption lag time
#> ASCII  : American Standard Code for Information I...
#> AST    : aspartate transaminase
#> AUC    : area under the concentration-time curve
#> AUCss  : area under the concentration-time curve ...
#> AUCC   : cumulative area under the concentration-...
#> AUCC50 : area under the concentration-time curve ...

g$WT
#> subject weight (WT)

Head

head(g)
#> ADA   : anti-drug antibodies
#> AE    : adverse event
#> AIC   : Akaike information criterion
#> ALAG  : oral absorption lag time
#> ASCII : American Standard Code for Information I...
#> AST   : aspartate transaminase

Select

g2 <- select_glossary(g, AIC, AST, ADA)

Combine

g3 <- c(g2, y)

Coerce

data frame

as.data.frame(g3)
#>    label                                   definition abbreviation
#> 1    AIC                 Akaike information criterion          AIC
#> 2    AST                       aspartate transaminase          AST
#> 3    ADA                         anti-drug antibodies          ADA
#> 4   egfr         estimated glomerular filtration rate         eGFR
#> 5    bmi                              body mass index          BMI
#> 6     wt                                       weight           WT
#> 7     ht                                       height           HT
#> 8   cmax maximum concentration in the dosing interval         Cmax
#> 9   cmin minimum concentration in the dosing interval         Cmin
#> 10   auc      area under the concentration time curve          AUC

list

as.list(g3[1:2])
#> $AIC
#> $AIC$abbreviation
#> [1] "AIC"
#> 
#> $AIC$definition
#> [1] "Akaike information criterion"
#> 
#> 
#> $AST
#> $AST$abbreviation
#> [1] "AST"
#> 
#> $AST$definition
#> [1] "aspartate transaminase"

Create notes

With a subset (expected most of the time)

glossary_notes(g3, AIC, wt, auc) 
#> [1] "AIC: Akaike information criterion; WT: weight; AUC: area under the concentration time curve"

With all entries

glossary_notes(g3)
#> [1] "AIC: Akaike information criterion; AST: aspartate transaminase; ADA: anti-drug antibodies; eGFR: estimated glomerular filtration rate; BMI: body mass index; WT: weight; HT: height; Cmax: maximum concentration in the dosing interval; Cmin: minimum concentration in the dosing interval; AUC: area under the concentration time curve"

In a pipeline

stdata() %>% 
  st_new() %>% 
  st_notes_glo(g3, AIC, wt, auc, width = 1) %>% 
  stable() %>% 
  st_as_image()

Alternatively

notes <- glossary_notes(g3, ht, wt, bmi)
stdata() %>% 
  st_new() %>% 
  st_notes(notes) %>% 
  st_panel("STUDY") %>% 
  stable() %>% 
  st_as_image()

Pass in names

labels <- c("AIC", "AST")
stdata() %>% 
  st_new() %>% 
  st_notes_glo(g3, labels = labels) %>% 
  stable() %>% 
  st_as_image()

Created on 2024-05-23 with reprex v2.0.2

KatherineKayMRG commented 4 months ago

@kylebaron I have a couple of questions about what this is doing (and if it's what we want). Above you say:

An entry in a glossary file looks like this

\newacronym{label}{abbreviation}{definition}

The basic idea is to read and parse the glossary file to create a string of the form label: definition. Likely multiple terms will be selected in which the string will take the form label1: definition1; label2: definition2.

So it's the information in the {label}{definition} that will go in the footer right?

Can these functions handle cases where you might want the {abbreviation}{definition} combo? Maybe as an optional extra. For example, the CV% that you may have in the table usually uses CVP in the glossary:

\newacronym{CVP}{CV\%}{percent coefficient of variation}

I just had a similar case on a project where tables used FAPα but we couldn't use the greek letter in the glossary label, so the label was FAPa and the abbreviation included the greek letter

It would be nice to be able to specify whether label or abbreviation get used.
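One way such a switch could look is sketched below in base R. This is only an illustration of the request: notes_sketch() and its use argument are made-up names, not pmtables functions or arguments, and the two entries are modeled on the CVP example above.

```r
# Hypothetical entries; the CVP entry mirrors the example in this comment.
glo <- list(
  CVP = list(abbreviation = "CV\\%", definition = "percent coefficient of variation"),
  wt  = list(abbreviation = "WT", definition = "weight")
)

# `use` selects what appears before the colon in each note.
notes_sketch <- function(glo, labels, use = c("abbreviation", "label"), sep = "; ") {
  use <- match.arg(use)
  unknown <- setdiff(labels, names(glo))
  if (length(unknown) > 0) {
    stop("labels not found: ", paste(unknown, collapse = ", "))
  }
  picked <- glo[labels]
  lhs <- if (use == "label") {
    labels
  } else {
    vapply(picked, function(e) e$abbreviation, character(1))
  }
  rhs <- vapply(picked, function(e) e$definition, character(1))
  paste(paste0(lhs, ": ", rhs), collapse = sep)
}

by_abb   <- notes_sketch(glo, c("CVP", "wt"))
by_label <- notes_sketch(glo, c("CVP", "wt"), use = "label")
cat(by_abb, by_label, sep = "\n")
```

Entries are always selected by label, so the FAPa-style case still works: the label stays plain text while the abbreviation carries the markup that should show up in the table footer.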

Different question, will glossary entries like this (below) mess with your current code?

\newacronym[sort=f]{F}{\ensuremath{F}}{absolute bioavailability}

kylebaron commented 4 months ago

Hey @KatherineKayMRG -

Good points. I think we want to refer to the label, but agree it would be better to put the abbreviation in there. I can make that happen.

Different question, will glossary entries like this (below) mess with your current code?

\newacronym[sort=f]{F}{\ensuremath{F}}{absolute bioavailability}

I think the code handles this as it is.
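For context on why this entry is a fair test, it has both an optional [sort=f] argument and nested braces in the abbreviation. The base R sketch below shows one way a parser could cope with both by counting brace depth. It is not the parsing code in this PR; read_group() and parse_newacronym() are hypothetical names, and escaped braces such as \{ are not handled.

```r
# Read one balanced {...} group that starts at position `start`.
# Returns the text inside and the position just past the closing brace.
read_group <- function(chars, start) {
  stopifnot(chars[start] == "{")
  depth <- 0L
  for (i in seq(start, length(chars))) {
    if (chars[i] == "{") depth <- depth + 1L
    if (chars[i] == "}") depth <- depth - 1L
    if (depth == 0L) {
      inner <- if (i > start + 1L) {
        paste(chars[(start + 1L):(i - 1L)], collapse = "")
      } else {
        ""
      }
      return(list(text = inner, nxt = i + 1L))
    }
  }
  stop("unbalanced braces")
}

parse_newacronym <- function(line) {
  prefix <- "\\newacronym"
  line <- trimws(line)
  if (!startsWith(line, prefix)) return(NULL)
  chars <- strsplit(substring(line, nchar(prefix) + 1L), "")[[1]]
  pos <- 1L
  # Skip an optional [key=value] argument such as [sort=f].
  if (length(chars) > 0 && chars[pos] == "[") {
    closing <- which(chars == "]")
    stopifnot(length(closing) > 0)
    pos <- closing[1] + 1L
  }
  groups <- character(3)
  for (k in 1:3) {
    g <- read_group(chars, pos)
    groups[k] <- g$text
    pos <- g$nxt
  }
  list(label = groups[1], abbreviation = groups[2], definition = groups[3])
}

e <- parse_newacronym(
  "\\newacronym[sort=f]{F}{\\ensuremath{F}}{absolute bioavailability}"
)
str(e)
```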

kylebaron commented 4 months ago

I'm going to refactor this ... will be more complicated but I think there is a need.

kylebaron commented 1 month ago

@kyleam - I think I addressed everything here, adding tests where I missed badly on some of the implementation. But let me know if I didn't get one of the comments right or if I just overlooked anything.

kylebaron commented 1 month ago

Thanks, @KatherineKayMRG. It sounds like @timwaterhouse is going to give this a spin on an upcoming project and we can tweak and adjust some things from there.

KatherineKayMRG commented 1 month ago

@kylebaron - that project of @timwaterhouse's is the one I was mentioning on Slack. I've been reworking their reports to use a shared glossary file and I'm looking forward to trying out this functionality on that project.

metrumresearchgroup / pmtables

Pull table notes from tex glossary files and yaml #326
