kylebaron commented 3 years ago

Summary

As a user, I'd like to be able to choose from several alternative representations of certain data attributes: unit, short, long, comment, decode.

Tests

tests/testthat/test-namespace.R
- load a yaml file and parse namespaces
- switch to tex namespace
- revert to base namespace
- alternate decode in namespace

kylebaron commented 3 years ago

The primary motivation here is getting math expressions into the output when we are rendering a TeX document, and something else when we are not. We want unit to be microgram / mL in ASCII contexts and $\mu$g / mL when rendering a TeX document. That's first. But the namespace idea generalizes to other contexts too. The tex namespace is special (yspec will try to invoke it when it is definitely rendering TeX), but the others are up to the user.

For example:

DV: 
  unit: microgram / mL
  tex::unit: "$\\mu$g / mL"

This is shorthand for the following, which is also legal:

DV:
  namespace:
    tex: 
      unit: "$\\mu$g / mL"

Then, to switch from the base namespace to the tex namespace:

library(yspec)

spec <- ys_load("specfile.yml")

spec2 <- ys_namespace(spec, "tex")

In spec, unit will be microgram / mL, and in spec2, unit will be $\mu$g / mL.
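To make the shorthand concrete, here is a minimal base R sketch of how `tex::unit` style keys could be folded into the nested `namespace` form. The helper `expand_shorthand()` is hypothetical and written only for illustration; it is not how yspec is known to implement this, and it works on a plain list rather than a real spec object.

```r
# Hypothetical helper (not part of yspec): fold `ns::field` keys into a
# nested `namespace` list, mirroring the two equivalent YAML forms above.
expand_shorthand <- function(col) {
  keys <- names(col)
  is_short <- grepl("::", keys, fixed = TRUE)
  out <- col[!is_short]
  # Start from any namespaces already written in the long form.
  ns_list <- if (is.null(col$namespace)) list() else col$namespace
  for (key in keys[is_short]) {
    parts <- strsplit(key, "::", fixed = TRUE)[[1]]
    ns <- parts[1]
    field <- parts[2]
    entry <- ns_list[[ns]]
    if (is.null(entry)) entry <- list()
    entry[[field]] <- col[[key]]
    ns_list[[ns]] <- entry
  }
  if (length(ns_list) > 0) out$namespace <- ns_list
  out
}

# A column as it might look after reading the YAML (assumed structure).
dv <- list(unit = "microgram / mL", `tex::unit` = "$\\mu$g / mL")
expanded <- expand_shorthand(dv)
str(expanded)
```

After expansion, the base `unit` is untouched and the alternate value sits under `expanded$namespace$tex$unit`.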

The tex namespace is sort of special: whenever we are rendering a data set specification object to a document, we'll try to switch to the tex namespace.

Another example: we want a certain short name everywhere except when using it for plots:

WT: 
  short: baseline weight
  plot::short: weight

spec <- ys_load("file.yaml") %>% ys_namespace("plot")

Now we are using this spec in plot mode.

This essentially replaces the glue functionality. I won't deprecate that, but I think it is super limited now that the spec object is getting used for more than just rendering TeX documents.

kylebaron commented 3 years ago

validate decode updates

KatherineKayMRG commented 3 years ago

Hey @kylebaron - I like the look of this. Is the intention that this will be completely customisable for the user or will there be a limited set of features? I'd like to use different labels for the same variable on plots versus table panels. Would it be possible to define different "decode" options?

kylebaron commented 3 years ago

@KatherineKayMRG - yes, you can mess with unit, short, long, label, comment and decode

col: 
  values: [1,2,3]
  decode: [a,b,c]
  alt::decode: [A,B,C]

would let you choose between a,b,c or A,B,C as decode

dpastoor commented 3 years ago

I like this idea - one thing I wonder though is the namespace ordering.

My intuition would be to reverse the proposed syntax to make it feel more hierarchical.

col: 
  values: [1,2,3]
  decode: [a,b,c]
+ decode::alt: [A,B,C]
- alt::decode: [A,B,C]

Do you have any particular design thoughts on the ordering? I can see how the existing implementation would make scanning for a particular namespace a little easier, so I'm not opposed to either.

kylebaron commented 3 years ago

The way the code is set up we could easily do either (it scans for ::); my thought was

dplyr::mutate

Rcpp::NumericVector

But I understand the idea and am fine reversing it if that will make the most sense to people.

copernican commented 3 years ago

One might also imagine syntax like

col1:
  short: foo
col2:
  short:
    tex: bar
    alt: baz
    else: qux

I.e., if the value under a key is a scalar, then that value is used everywhere, and if it isn't, then its key-value pairs are assumed to be namespaces. In addition to special handling for the tex namespace, we might imagine a default namespace, labeled else in the above example.
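A small sketch of that lookup rule in base R, assuming the YAML has already been read into nested lists. The function `resolve_field()` and its fallback behavior are hypothetical, written only to illustrate the proposal.

```r
# Hypothetical resolver for the nested proposal (not yspec API).
resolve_field <- function(value, namespace) {
  # Scalar (or any non-list value): used in every namespace.
  if (!is.list(value)) return(value)
  # Mapping: use the requested namespace when it is present.
  if (!is.null(value[[namespace]])) return(value[[namespace]])
  # Otherwise fall back to the `else` sentinel (NULL if it is absent).
  value[["else"]]
}

col1 <- list(short = "foo")
col2 <- list(short = list(tex = "bar", alt = "baz", "else" = "qux"))

resolve_field(col1$short, "tex")   # "foo"
resolve_field(col2$short, "tex")   # "bar"
resolve_field(col2$short, "plot")  # "qux"
```

One consequence of this design is that a field like decode, whose base value is already a vector rather than a single string, is still treated as "used everywhere" because it is not a mapping.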

kylebaron commented 3 years ago

What about:

col1: 
  short: bar
  short.tex: baz
  short.alt: qux

like

print
print.factor
print.function

putting the specialization on the right; then . would indicate that the field is on the left and the specialization indicator is on the right

copernican commented 3 years ago

That syntax also feels okay. The potential downsides are

- slightly more typing, i.e., repeating short
- needing to parse key names (could a key name contain a . and not be intended for an alternative representation?) versus relying on the YAML structure (though in my proposal, you'd also have to parse key names to find the sentinel value else)

I think the main upside is that the syntax is simple and will be familiar to R users, and that might be compelling.
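On the question of keys that contain a . without being alternative representations, one possible guard is to treat a key as `field.namespace` only when the part before the first dot is a field that accepts alternates. The sketch below assumes that field list is the one mentioned earlier in this thread; `parse_ns_key()` and `ns_fields` are hypothetical names, not yspec functions.

```r
# Assumed set of fields that accept alternates (taken from this thread).
ns_fields <- c("unit", "short", "long", "label", "comment", "decode")

# Hypothetical parser: returns NULL for keys that are not `field.namespace`.
parse_ns_key <- function(key, fields = ns_fields) {
  dot <- as.integer(regexpr(".", key, fixed = TRUE))  # first dot, or -1
  if (dot < 1) return(NULL)
  field <- substr(key, 1, dot - 1)
  ns <- substr(key, dot + 1, nchar(key))
  # Reject unknown fields and empty namespace names such as "short."
  if (!(field %in% fields) || !nzchar(ns)) return(NULL)
  list(field = field, namespace = ns)
}

parse_ns_key("short.tex")  # field "short", namespace "tex"
parse_ns_key("short")      # NULL: no namespace part
parse_ns_key("my.key")     # NULL: "my" is not a namespaced field
```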

kylebaron commented 3 years ago

gh.R

kyleb 2020-11-10

library(tidyverse)

## ── Attaching packages ────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.1
## ✓ tidyr   1.1.1     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(yspec)

spec <- ys_load("foo.yml")

namespace syntax

cat(readLines("foo.yml"), sep = "\n")

## SETUP__:
##   description: testing namespaces
## DV:
##   short: beta-amyloid
##   short.tex: $\beta$-amyloid
##   unit: microgram/ml
##   unit.tex: "$\\mu$g/ml"
##   short.plot: dee-vee
## D: 
##   short: some data
##   unit: "L/kg"
##   namespace: 
##     tex: 
##       unit: "fodfas"
## FORM:
##   values: [1,2,3]
##   decode: [tablet, syrup, patch]
##   decode.tex: [TABLET, SYRUP, PATCH]

spec

##  name c d unit         short        source
##  DV   - - microgram/ml beta-amyloid .     
##  D    - - L/kg         some data    .     
##  FORM - + .            FORM         .

List available

ys_namespace(spec)

## namespaces:

##  - base
##  - plot
##  - tex

Switch

spec2 <- ys_namespace(spec, "tex")

spec2

##  name c d unit       short            source
##  DV   - - $\\mu$g/ml $\\beta$-amyloid .     
##  D    - - fodfas     some data        .     
##  FORM - + .          FORM             .

Reset back to original

spec3 <- ys_namespace(spec, "base")

identical(spec3, spec)

## [1] TRUE

Error to ask for ns that doesn’t exist

try(ys_namespace(spec, "kyle"))

## Error : `kyle` is not a namespace in this specification object

`tex` is special

When rendering in a TeX environment, we can switch to the tex namespace here if it is available

yspec:::try_tex_namespace(spec3)

##  name c d unit       short            source
##  DV   - - $\\mu$g/ml $\\beta$-amyloid .     
##  D    - - fodfas     some data        .     
##  FORM - + .          FORM             .

kylebaron commented 3 years ago

@KatherineKayMRG @dpastoor @copernican

Another proposal:

- we control key names, and . indicates field.namespace
  - unit.tex
- when an alternate representation is provided, a base namespace entry is created so we can revert
- you can list available namespaces
- you can recursively apply namespaces
- we track namespace change history, but more for info / debugging at this point (no public API); changing to the base namespace resets the history
- under-the-hood functionality switches to the tex namespace if available; this is used when rendering define.pdf documents
- you could go nuts on this ... and I'm letting people hang themselves if they want to; but the examples are going to be "switch to this or that namespace" and that's it; happy to add guard rails (only one ns switch, etc.) but I'd rather not, and see what does or does not happen without the rails
- will add tests once there is agreement on basic functionality
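The base entry, recursive switching, and history reset described above can be sketched in base R as follows. This is a simplified model under stated assumptions: a spec is just a named list of columns, `add_base()` and `switch_namespace()` are hypothetical helpers, and the `history` attribute name is invented for illustration. It is not the yspec implementation.

```r
# Hypothetical: record base values for every field that has an alternate.
# Fields with no base value are simply not recorded in this sketch.
add_base <- function(col) {
  alt_fields <- unique(unlist(lapply(col$namespace, names)))
  if (length(alt_fields) > 0) {
    col$namespace$base <- col[intersect(alt_fields, names(col))]
  }
  col
}

# Hypothetical: overlay one namespace on top of the current values.
switch_namespace <- function(spec, ns) {
  available <- unique(unlist(lapply(spec, function(col) names(col$namespace))))
  if (!(ns %in% available)) {
    stop(sprintf("`%s` is not a namespace in this specification object", ns))
  }
  spec[] <- lapply(spec, function(col) {
    alt <- col$namespace[[ns]]
    if (!is.null(alt)) col[names(alt)] <- alt
    col
  })
  # Track switches for debugging; going back to base clears the history.
  attr(spec, "history") <- if (ns == "base") NULL else c(attr(spec, "history"), ns)
  spec
}

spec <- lapply(list(
  DV = list(
    unit = "microgram/ml", short = "beta-amyloid",
    namespace = list(tex = list(unit = "$\\mu$g/ml"),
                     plot = list(short = "dee-vee"))
  )
), add_base)

tex  <- switch_namespace(spec, "tex")
both <- switch_namespace(tex, "plot")   # namespaces applied one after another
back <- switch_namespace(both, "base")
identical(back, spec)
```

Because switching only overlays the fields a namespace defines, applying tex and then plot leaves the tex unit in place alongside the plot short name, which is the "no guard rails" behavior described in the list.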

callistosp commented 3 years ago

I like this functionality. I think my use of it in a typical workflow would be to just load in the appropriate namespace for my script (e.g., plotting script loads plot namespace). I would prefer to have something with the SAME NAME from a DIFFERENT OBJECT over the alternative.

metrumresearchgroup / yspec

Add namespace functionality #38
