metrumresearchgroup / yspec

Data Specification for Pharmacometrics
https://metrumresearchgroup.github.io/yspec

Add namespace functionality #38

Closed: kylebaron closed this issue 3 years ago

kylebaron commented 3 years ago

Summary

As a user, I'd like to be able to choose from several alternative representations of certain data attributes: unit, short, long, comment, decode.

Tests

kylebaron commented 3 years ago

The primary motivation here is getting math expressions into the output when we are rendering a TeX document, and something else when we are not. We want unit to be microgram / mL in a plain ASCII context and $\\mu$g / mL when rendering a TeX document. That comes first, but the namespace idea generalizes to other contexts too. The tex namespace is special (yspec will try to invoke it whenever it is definitely rendering TeX); the others are up to the user.

For example:

DV: 
  unit: microgram / mL
  tex::unit: "$\\mu$g / mL"

This is really shorthand for the following, which is also legal:

DV:
  namespace:
    tex: 
      unit: "$\\mu$g / mL"

Then, to switch from the base namespace to the tex namespace:

spec <- ys_load("specfile.yml")

spec2 <- ys_namespace(spec, "tex")

In spec, unit will be microgram / mL; in spec2, unit will be $\\mu$g / mL.

The tex namespace is somewhat special: whenever we render a data set specification object to a document, we'll try to switch to the tex namespace.
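The equivalence between the `tex::unit` shorthand and the nested `namespace:` form can be sketched in base R, assuming a column is held as a named list. The helpers `expand_shorthand()` and `use_namespace()` are hypothetical names used only for illustration; this is not how yspec itself is implemented.

```r
# Hypothetical helpers for illustration only; not the yspec implementation.

# Fold `ns::field` keys into a nested `namespace` list.
expand_shorthand <- function(col) {
  ns_list <- if (is.null(col[["namespace"]])) list() else col[["namespace"]]
  for (key in grep("::", names(col), fixed = TRUE, value = TRUE)) {
    parts <- strsplit(key, "::", fixed = TRUE)[[1]]  # c(namespace, field)
    entry <- if (is.null(ns_list[[parts[1]]])) list() else ns_list[[parts[1]]]
    entry[[parts[2]]] <- col[[key]]
    ns_list[[parts[1]]] <- entry
    col[[key]] <- NULL                               # drop the shorthand key
  }
  if (length(ns_list) > 0) col[["namespace"]] <- ns_list
  col
}

# Overlay the fields from one namespace on top of the base fields.
# Assumption: a column with no entry for `ns` simply keeps its base values.
use_namespace <- function(col, ns) {
  alt <- col[["namespace"]][[ns]]
  if (is.null(alt)) return(col)
  for (field in names(alt)) col[[field]] <- alt[[field]]
  col
}

dv <- list(unit = "microgram / mL", `tex::unit` = "$\\mu$g / mL")
dv <- expand_shorthand(dv)
dv[["unit"]]                        # base namespace value
use_namespace(dv, "tex")[["unit"]]  # tex namespace value
```

In this sketch only the fields that a namespace defines are replaced, so a column without a tex entry is unchanged by the switch.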

Another example: we want a certain short name everywhere except when it is used for plots:

WT: 
  short: baseline weight
  plot::short: weight

spec <- ys_load("file.yaml") %>% ys_namespace("plot")

Now we are using this spec in plot mode.

This essentially replaces the glue functionality. I won't deprecate that, but I think it is very limited now that the spec object is being used for more than just rendering TeX documents.

kylebaron commented 3 years ago

validate decode updates

KatherineKayMRG commented 3 years ago

Hey @kylebaron - I like the look of this. Is the intention that this will be completely customisable for the user or will there be a limited set of features? I'd like to use different labels for the same variable on plots versus table panels. Would it be possible to define different "decode" options?

kylebaron commented 3 years ago

@KatherineKayMRG - yes, you can mess with unit, short, long, label, comment and decode

col: 
  values: [1,2,3]
  decode: [a,b,c]
  alt::decode: [A,B,C]

This would let you choose between a,b,c and A,B,C as the decode.
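One way such a choice might be used downstream, sketched in base R. The list `col` mirrors the YAML above in its nested form, and `decode_with()` is a hypothetical helper, not a yspec function; it only shows the same values being labelled with either decode vector.

```r
# Hypothetical illustration; `decode_with()` is not part of yspec.
col <- list(
  values = c(1, 2, 3),
  decode = c("a", "b", "c"),
  namespace = list(alt = list(decode = c("A", "B", "C")))
)

# Label raw values with the decode from the requested namespace,
# falling back to the base decode when that namespace defines none.
decode_with <- function(x, col, ns = "base") {
  labels <- col[["namespace"]][[ns]][["decode"]]
  if (is.null(labels)) labels <- col[["decode"]]
  factor(x, levels = col[["values"]], labels = labels)
}

x <- c(3, 1, 2, 1)
as.character(decode_with(x, col))         # labelled with a, b, c
as.character(decode_with(x, col, "alt"))  # labelled with A, B, C
```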

dpastoor commented 3 years ago

I like this idea - one thing I wonder though is the namespace ordering.

My intuition would be to reverse the proposed syntax to make it feel more hierarchical.

col: 
  values: [1,2,3]
  decode: [a,b,c]
+ decode::alt: [A,B,C]
- alt::decode: [A,B,C]

Do you have any particular design thoughts on the ordering? I can see how the existing implementation would make scanning for a particular namespace a little easier, so I'm not opposed to either.

kylebaron commented 3 years ago

The way the code is set up we could easily do either (it scans for ::); my thought was

dplyr::mutate

Rcpp::NumericVector

But I understand the idea and am fine with reversing it if that will make the most sense to people.

copernican commented 3 years ago

One might also imagine syntax like

col1:
  short: foo
col2:
  short:
    tex: bar
    alt: baz
    else: qux

I.e., if the value under a key is a scalar, then that value is used everywhere; if it is a mapping, then its key-value pairs are assumed to be namespaces. In addition to special handling for the tex namespace, we might imagine a default namespace, labeled else in the example above.
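A minimal sketch of that lookup rule in base R, assuming a YAML mapping has been read into a named list and a scalar (or a sequence of scalars) into an atomic vector. `resolve_field()` is a hypothetical name, not an existing yspec function.

```r
# Hypothetical resolver for the scalar-or-mapping layout; not yspec code.
resolve_field <- function(value, ns) {
  if (!is.list(value)) return(value)              # scalar: used in every namespace
  if (!is.null(value[[ns]])) return(value[[ns]])  # explicit entry for this namespace
  value[["else"]]                                 # default entry; NULL if none was given
}

col1_short <- "foo"
col2_short <- list(tex = "bar", alt = "baz", `else` = "qux")

resolve_field(col1_short, "tex")   # scalar value, same in every namespace
resolve_field(col2_short, "tex")   # namespace-specific value
resolve_field(col2_short, "plot")  # no plot entry, so the else entry is used
```

Because `else` is a reserved word in R, it has to be back-quoted when it is written as a list name, which is one practical cost of using it as the sentinel.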

kylebaron commented 3 years ago

What about:

col1: 
  short: bar
  short.tex: baz
  short.alt: qux

like

print
print.factor
print.function

This puts the specialization on the right, as with S3 method names: the . separates the field on the left from the specialization indicator on the right.

copernican commented 3 years ago

That syntax also feels okay. The potential downsides are

  1. slightly more typing, i.e., repeating short

  2. needing to parse key names (could a key name contain a . and not be intended for an alternative representation?) versus relying on the YAML structure (though in my proposal, you'd also have to parse key names to find the sentinel value else)

I think the main upside is that the syntax is simple and will be familiar to R users, and that might be compelling.
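The key-parsing concern in point 2 could be handled by treating a `.` as a separator only when the text to its left is a recognized field. The sketch below is base R under that assumption; `split_key()` and `known_fields` are hypothetical names, and the field list is taken from the attributes mentioned earlier in this thread.

```r
# Hypothetical parser for `field.namespace` keys; not yspec code.
known_fields <- c("unit", "short", "long", "label", "comment", "decode")

split_key <- function(key) {
  pos <- as.integer(regexpr(".", key, fixed = TRUE))  # first dot, or -1 if none
  if (pos < 1) return(list(field = key, namespace = "base"))
  field <- substr(key, 1, pos - 1)
  # Only treat the dot as a separator when the left side is a known field.
  if (!field %in% known_fields) return(list(field = key, namespace = "base"))
  list(field = field, namespace = substr(key, pos + 1, nchar(key)))
}

split_key("short")      # plain field in the base namespace
split_key("short.tex")  # field short, namespace tex
split_key("my.key")     # dot kept as part of the key name
```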

kylebaron commented 3 years ago

gh.R

kyleb 2020-11-10

library(tidyverse)
## ── Attaching packages ────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.1
## ✓ tidyr   1.1.1     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(yspec)

spec <- ys_load("foo.yml")

namespace syntax

cat(readLines("foo.yml"), sep = "\n")
## SETUP__:
##   description: testing namespaces
## DV:
##   short: beta-amyloid
##   short.tex: $\beta$-amyloid
##   unit: microgram/ml
##   unit.tex: "$\\mu$g/ml"
##   short.plot: dee-vee
## D: 
##   short: some data
##   unit: "L/kg"
##   namespace: 
##     tex: 
##       unit: "fodfas"
## FORM:
##   values: [1,2,3]
##   decode: [tablet, syrup, patch]
##   decode.tex: [TABLET, SYRUP, PATCH]
spec
##  name c d unit         short        source
##  DV   - - microgram/ml beta-amyloid .     
##  D    - - L/kg         some data    .     
##  FORM - + .            FORM         .

List available

ys_namespace(spec)
## namespaces:

##  - base
##  - plot
##  - tex

Switch

spec2 <- ys_namespace(spec, "tex")

spec2
##  name c d unit       short            source
##  DV   - - $\\mu$g/ml $\\beta$-amyloid .     
##  D    - - fodfas     some data        .     
##  FORM - + .          FORM             .

Reset back to original

spec3 <- ys_namespace(spec, "base")

identical(spec3, spec)
## [1] TRUE

It is an error to ask for a namespace that doesn’t exist

try(ys_namespace(spec, "kyle"))
## Error : `kyle` is not a namespace in this specification object

tex is special

When rendering in a TeX environment, we can switch to the tex namespace here if it is available

yspec:::try_tex_namespace(spec3)
##  name c d unit       short            source
##  DV   - - $\\mu$g/ml $\\beta$-amyloid .     
##  D    - - fodfas     some data        .     
##  FORM - + .          FORM             .

kylebaron commented 3 years ago

@KatherineKayMRG @dpastoor @copernican

Another proposal:

callistosp commented 3 years ago

I like this functionality. I think my use of it in a typical workflow would be to just load in the appropriate namespace for my script (e.g., plotting script loads plot namespace). I would prefer to have something with the SAME NAME from a DIFFERENT OBJECT over the alternative.