Closed kylebaron closed 3 years ago
The primary motivation here is getting math expressions into the output when we are rendering a tex document, but something else when we are not. We want unit to be microgram / mL when in ascii world and $\\mu$g / mL when rendering a tex document. That's first. But the namespace idea generalizes for other contexts too. The tex namespace is special (yspec will try to invoke it when it is definitely in TeX land) but others are up to the user.
For example:
DV:
  unit: microgram / mL
  tex::unit: $\\mu$g / mL
This is really shorthand for the following (also legal):
DV:
  namespace:
    tex:
      unit: $\\mu$g / mL
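A minimal sketch of the equivalence between the two forms, assuming the column entry has already been parsed into a named R list. expand_shorthand is a hypothetical helper written only for illustration; it is not yspec's own code.

```r
# Hypothetical helper (not part of yspec): rewrite `ns::field` keys of one
# column entry into the nested `namespace` form.
expand_shorthand <- function(col) {
  keys <- names(col)
  is_short <- grepl("::", keys, fixed = TRUE)
  out <- col[!is_short]
  ns_map <- if (is.null(out[["namespace"]])) list() else out[["namespace"]]
  for (key in keys[is_short]) {
    parts <- strsplit(key, "::", fixed = TRUE)[[1]]
    ns <- parts[1]
    field <- parts[2]
    # Fetch (or start) the entry for this namespace, then add the field.
    entry <- if (is.null(ns_map[[ns]])) list() else ns_map[[ns]]
    entry[[field]] <- col[[key]]
    ns_map[[ns]] <- entry
  }
  if (length(ns_map) > 0) out[["namespace"]] <- ns_map
  out
}

# Stand-in for the parsed DV entry from the first example.
dv <- list(unit = "microgram / mL", "tex::unit" = "$\\mu$g / mL")
str(expand_shorthand(dv))
```

The base unit stays at the top level and the tex value moves under namespace, which matches the longer form shown above.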
Then, to switch from the base to the tex namespace:
spec <- ys_load("specfile.yml")
spec2 <- ys_namespace(spec, "tex")
In spec, unit will be microgram / mL, and in spec2, unit will be $\\mu$g / mL.
The tex namespace is sort of special; whenever we are rendering a data set specification object to a document, we'll try to switch to the tex namespace.
Another example: we want a certain short name everywhere except when using it for plots:
WT:
  short: baseline weight
  plot::short: weight
spec <- ys_load("file.yaml") %>% ys_namespace("plot")
Now we are using this spec in plot mode.
This essentially replaces the glue functionality. I won't deprecate that, but I think it is super limited now that the spec object is getting used for more than just rendering tex documents.
Hey @kylebaron - I like the look of this. Is the intention that this will be completely customisable for the user or will there be a limited set of features? I'd like to use different labels for the same variable on plots versus table panels. Would it be possible to define different "decode" options?
@KatherineKayMRG - yes, you can mess with unit, short, long, label, comment and decode
col:
  values: [1,2,3]
  decode: [a,b,c]
  alt::decode: [A,B,C]
would let you choose between a,b,c or A,B,C as decode.
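For the plots-versus-tables use case, here is a rough sketch of how the chosen decode could be applied to data with base R's factor(). The col list is a hand-written stand-in for the parsed YAML entry, and decode_for is a hypothetical helper, not a yspec function.

```r
# Hand-written stand-in for the parsed column entry above.
col <- list(
  values = c(1, 2, 3),
  decode = c("a", "b", "c"),
  "alt::decode" = c("A", "B", "C")
)

# Hypothetical helper: pick the decode labels for a namespace and fall
# back to the base decode when that namespace does not define one.
decode_for <- function(col, ns = NULL) {
  key <- if (is.null(ns)) "decode" else paste0(ns, "::decode")
  if (is.null(col[[key]])) col[["decode"]] else col[[key]]
}

x <- c(3, 1, 2, 1)
factor(x, levels = col$values, labels = decode_for(col))         # base labels
factor(x, levels = col$values, labels = decode_for(col, "alt"))  # alt labels
```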
I like this idea - one thing I wonder about, though, is the namespace ordering. My intuition would say to reverse the proposed syntax to make it feel more hierarchical.
  col:
    values: [1,2,3]
    decode: [a,b,c]
+   decode::alt: [A,B,C]
-   alt::decode: [A,B,C]
Do you have any particular design thoughts on the ordering? I can see how the existing implementation would make scanning for a particular namespace a little easier, so I'm not opposed to either.
The way the code is set up we could easily do either (it scans for ::); my thought was
dplyr::mutate
Rcpp::NumericVector
But I understand the idea and am fine reversing it if that will make the most sense to people.
One might also imagine syntax like
col1:
  short: foo
col2:
  short:
    tex: bar
    alt: baz
    else: qux
I.e., if the key's value is a scalar, then that value is used everywhere, and if it isn't, then the nested key-value pairs are assumed to be namespaces. In addition to special handling for the tex namespace, we might imagine a default namespace, in the above example labeled else.
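A small sketch of that rule, assuming the YAML has been parsed so that a mapping becomes a named list and a scalar stays a plain value. resolve_field is a hypothetical name used only for this illustration.

```r
# Hypothetical resolver for the nested proposal: scalars apply everywhere;
# named lists are treated as namespace -> value, with `else` as the default.
resolve_field <- function(value, ns = "else") {
  is_map <- is.list(value) && !is.null(names(value))
  if (!is_map) return(value)
  if (!is.null(value[[ns]])) return(value[[ns]])
  value[["else"]]
}

col1 <- list(short = "foo")
col2 <- list(short = list(tex = "bar", alt = "baz", "else" = "qux"))

resolve_field(col1$short, "tex")   # "foo": scalar, used in every namespace
resolve_field(col2$short, "tex")   # "bar"
resolve_field(col2$short, "plot")  # "qux": no plot entry, so the default
```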
What about:
col1:
  short: bar
  short.tex: baz
  short.alt: qux
like
print
print.factor
print.function
putting the specialization on the right; then . would indicate that the field is on the left and the specialization indicator is on the right
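To make the S3 analogy concrete, here is a sketch of a lookup that tries the specialized key first and then falls back to the plain field, much as dispatch falls back to a default method when no specialized one exists. get_field is a hypothetical helper and the list is a hand-written stand-in for the parsed entry.

```r
# Hypothetical lookup: try `field.ns` first, then the plain field.
get_field <- function(col, field, ns = NULL) {
  if (!is.null(ns)) {
    special <- col[[paste(field, ns, sep = ".")]]
    if (!is.null(special)) return(special)
  }
  col[[field]]
}

col1 <- list(short = "bar", short.tex = "baz", short.alt = "qux")

get_field(col1, "short")          # "bar"
get_field(col1, "short", "tex")   # "baz"
get_field(col1, "short", "plot")  # "bar": no short.plot, so the base value
```

A lookup like this never has to split key names, so it works the same whichever side the specialization ends up on.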
That syntax also feels okay. The potential downsides are:
- slightly more typing, i.e., repeating short
- needing to parse key names (could a key name contain a . and not be intended for an alternative representation?) versus relying on the YAML structure (though in my proposal, you'd also have to parse key names to find the sentinel value else)
I think the main upside is that the syntax is simple and will be familiar to R users, and that might be compelling.
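One way to limit the key-parsing concern is to split only when the part before the first . is a field that is allowed to have alternatives. The sketch below assumes that set is unit, short, long, comment and decode; split_key and known_fields are hypothetical names, not yspec internals.

```r
# Assumed set of fields that may carry a namespace suffix.
known_fields <- c("unit", "short", "long", "comment", "decode")

# Hypothetical parser: split `field.namespace` only for known fields;
# every other key is left whole and assigned to the base namespace.
split_key <- function(key, fields = known_fields) {
  hit <- regmatches(key, regexec("^([^.]+)\\.(.+)$", key))[[1]]
  if (length(hit) == 3 && hit[2] %in% fields) {
    list(field = hit[2], namespace = hit[3])
  } else {
    list(field = key, namespace = "base")
  }
}

split_key("short.tex")      # field "short", namespace "tex"
split_key("my.custom.key")  # left whole: "my" is not a known field
split_key("unit")           # no dot, so base namespace
```

With this guard, a key that merely contains a . is not mistaken for an alternative representation unless it starts with one of the known fields.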
kyleb 2020-11-10
library(tidyverse)
## ── Attaching packages ────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.1
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(yspec)
spec <- ys_load("foo.yml")
cat(readLines("foo.yml"), sep = "\n")
## SETUP__:
##   description: testing namespaces
## DV:
##   short: beta-amyloid
##   short.tex: $\beta$-amyloid
##   unit: microgram/ml
##   unit.tex: "$\\mu$g/ml"
##   short.plot: dee-vee
## D:
##   short: some data
##   unit: "L/kg"
##   namespace:
##     tex:
##       unit: "fodfas"
## FORM:
##   values: [1,2,3]
##   decode: [tablet, syrup, patch]
##   decode.tex: [TABLET, SYRUP, PATCH]
spec
##  name c d unit         short        source
##  DV   - - microgram/ml beta-amyloid .
##  D    - - L/kg         some data    .
##  FORM - + .            FORM         .
ys_namespace(spec)
## namespaces:
## - base
## - plot
## - tex
spec2 <- ys_namespace(spec, "tex")
spec2
##  name c d unit       short            source
##  DV   - - $\\mu$g/ml $\\beta$-amyloid .
##  D    - - fodfas     some data        .
##  FORM - + .          FORM             .
spec3 <- ys_namespace(spec, "base")
identical(spec3, spec)
## [1] TRUE
try(ys_namespace(spec, "kyle"))
## Error : `kyle` is not a namespace in this specification object
tex is special
When rendering in tex environment, we can switch here if available
yspec:::try_tex_namespace(spec3)
##  name c d unit       short            source
##  DV   - - $\\mu$g/ml $\\beta$-amyloid .
##  D    - - fodfas     some data        .
##  FORM - + .          FORM             .
@KatherineKayMRG @dpastoor @copernican
Another proposal:
- . would indicate field.namespace
  - unit.tex
- base namespace resets the history
I like this functionality. I think my use of it in a typical workflow would be to just load in the appropriate namespace for my script (e.g., plotting script loads plot namespace). I would prefer to have something with the SAME NAME from a DIFFERENT OBJECT over the alternative.
Summary
As a user, I'd like to be able to choose from several alternative representations of certain data attributes: unit, short, long, comment, decode.
Tests