fortify `dplyr::transmute`

cbeleites commented 7 years ago

[x] implement
[x] write unit tests
[x] document: man page
[x] explain in vignette

eoduniyi commented 4 years ago

I will tackle this.

eoduniyi commented 4 years ago

Dear @cbeleites,

Initially, I wasn't sure if a user should be able to transmute the $spc column. That is, should one be able to ever update the spectra with transmute? At the moment, I'm assuming transmuting/mutating the $spc column is allowed (suppose the user wanted to multiply all the spectra by a factor of 2).

Well, using dplyr::transmute(.data@data, spc = spc*2) results in an error...but .data@data$spc <- .data@data$spc*2 works as expected. So, I came up with a solution that makes use of quosures (requires rlang), eval, and paste liberally. I've never done any of this before, so it does feel weird. Anyways, please lmk if you see any logical or design errors...I'M ALL EARS.
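For reference, the base R half of this observation can be reproduced without hyperSpec. This is only a minimal sketch: it assumes a plain data frame standing in for .data@data, and the column names and values are made up for illustration.

```r
# Plain data frame standing in for `.data@data` (an assumption for illustration)
df <- data.frame(c = c(0.05, 0.10, 0.15))

# Store a whole matrix in a single column: 3 spectra x 4 wavelengths
df$spc <- matrix(1:12, nrow = 3)

# Direct assignment scales every spectrum and keeps the matrix column intact
df$spc <- df$spc * 2

is.matrix(df$spc)  # the column is still a matrix
dim(df$spc)        # rows = spectra, columns = wavelengths
ncol(df)           # the data frame itself still has two columns: c and spc
```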

Additional Notes: Similar to select.R, transmute.R won't always return a hyperSpec object (e.g., transmute(flu, c) is a data frame)

Additional Qs: Q1: Is transmute(flu, spc2 = spc) still a hyperSpec object? Or should we just treat that as a data frame? -- currently returns a hyperSpec object with no labels for spc2

Q2: Should transmute(flu, spc = spc *0) or transmute(flu, spc = spc *-1) be allowed or guarded against? -- currently allowing this

Best, EO

cbeleites commented 4 years ago

Yes, transmuting the spectra matrix should be allowed, and the arithmetic operations on the spectra matrix asked about in Q2 are perfectly fine and needed frequently.
E.g. `transmute
dplyr::transmute() and $spc: I suspect that is due to dplyr::transmute() not working with columns that contain a whole matrix.
I think we have to expect more matrix columns than just $spc: Partial Least Squares regression is a widely used model in chemometrics, and the pls package also uses matrices in columns. For example, a so-called multi-analyte regression may be written plsr(y ~ spc, ...), with $y again being a matrix with one analyte per column.
Q2, yes that would still be a hyperSpec object, we need to make sure there is a $spc column with
Additional consideration: we need to decide how to deal with labels.
- add a labels parameter so labels can be updated immediately, or
- prescribe that people should call spc %>% transmute () %>% setLabels()
  setLabels() would be a new additional name for hyperSpec::labels<-() (which is not very convenient to call as a setter since it must be quoted).
Opinions, please?
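The point about quoting can be illustrated with a base R analogy. This is a sketch only: it uses names<-() and stats::setNames() as stand-ins, since the proposed setLabels() does not exist at this point in the discussion.

```r
x <- c(1, 2)

# Calling a replacement function directly requires backquotes around its name
a <- `names<-`(x, value = c("r", "phi"))

# stats::setNames() is a plainly named wrapper for the same operation,
# which makes it convenient as the last step of a pipeline
b <- setNames(x, c("r", "phi"))

identical(a, b)
```

A setLabels() wrapper around labels<-() would play the same role for hyperSpec objects that setNames() plays for names.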

eoduniyi commented 4 years ago

I think I like the design of: spc %>% transmute(setLabels = T)

cbeleites commented 4 years ago

I guess we'll rather need

spc %>% 
   transmute (r = sqrt(x^2 + y^2), phi = atan(y/x), 
      labels = list (r = "r / μm", phi = expression (phi))
   )

or

spc %>% 
   transmute (r = sqrt(x^2 + y^2), phi = atan(y/x)) %>% 
   setLabels (r = "r / μm", phi = expression (phi))

... since we mostly won't be able to guess the proper new labels from the transmute() or mutate() specifications.

eoduniyi commented 4 years ago

transmute (r = sqrt(x^2 + y^2), phi = atan(y/x))

So, before we set the labels we'll end up with the columns r and phi...but where do we get the correct labels from?... Oh I see you're saying that the user will have to setLabels themselves.

eoduniyi commented 4 years ago

@cbeleites or @ximeg, could either of you help me understand the difference between the following?
- setLabels (r = "r / μm", phi = expression ("phi"))
- setLabels (r = "r / μm", phi = expression (phi))
- setLabels (r = "r / μm", phi = "phi")
- setLabels (r = "r / μm", phi = phi)

cbeleites commented 4 years ago

Have a look at ?plotmath and try the various versions, e.g. for the xlab argument of a plot.

- "phi" is a string, and the axis label will be phi spelled in Latin letters
- expression (phi) will produce a φ (Greek phi character)
- phi looks for a variable called phi, and will evaluate to the content of that variable (or an error if no such variable exists)
- expression ("phi") looks the same as "phi", but it allows combination with further plotmath, e.g. expression (frac("phi",2*pi)), which yields a fraction with numerator "phi" and denominator 2π.
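The four variants can also be inspected directly in R. This is a small sketch using only base R; the variable names are chosen here for illustration and do not come from hyperSpec.

```r
lab_string   <- "phi"                           # plain character string
lab_symbol   <- expression(phi)                 # plotmath symbol, drawn as Greek phi
lab_quoted   <- expression("phi")               # string wrapped in an expression
lab_fraction <- expression(frac("phi", 2 * pi)) # plotmath call combining both

class(lab_string)         # a character vector
class(lab_symbol[[1]])    # a name (symbol), interpreted by plotmath
class(lab_quoted[[1]])    # a character constant inside the expression
class(lab_fraction[[1]])  # a call

# A bare `phi` is looked up as a variable; here the lookup error is caught
bare <- tryCatch(phi, error = function(e) conditionMessage(e))

# Draw the labels only in an interactive session
if (interactive()) {
  plot(1:10, xlab = lab_symbol, ylab = lab_fraction, main = lab_string)
}
```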

ximeg commented 4 years ago

@eoduniyi

Q1: Is transmute(flu, spc2 = spc) still a hyperSpec object? Or should we just treat that as a data frame? -- currently returns a hyperSpec object with no labels for spc2

According to the definition (correct me if I'm wrong), a hyperSpec object can contain only one spectra matrix, which must be present and must be named spc and nothing else. A hyperSpec object can also have an arbitrary number of extra 1D columns. AFAIK it cannot contain two spectra matrices or have nested data.frames stored in the extra columns.

Therefore

- it should be illegal to rename the spc matrix, like in transmute(flu, spc2 = spc) or rename(flu, spc2 = spc). So transmute(flu, spc2 = spc) may return a data.frame, but definitely not a hyperSpec object
- it should be illegal to add a second data matrix or data.frame with e.g. mutate(spc2 = spc + 1)

cbeleites commented 4 years ago

There must be exactly one matrix $spc (but that can be empty, i.e. nrow rows x 0 columns).

Other columns in general can contain matrices as well. One example would be preparing a multi-analyte data set for PLS regression: that would have e.g. concentrations in a matrix $c of nc columns, and the spectra in $spc. PLS regression can then be done by pls::plsr (c ~ spc, data = hyperSpec_object, ...)
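The structure described above can be sketched with a plain data frame. Note the assumptions: the data are random, has_spc_matrix() is a hypothetical helper written for this example (not a hyperSpec function), and a multi-response lm() stands in for pls::plsr() so that the sketch needs base R only.

```r
set.seed(1)
n <- 10

df <- data.frame(id = seq_len(n))
df$spc <- matrix(rnorm(n * 3), nrow = n)  # spectra: n rows x 3 wavelengths
df$c   <- matrix(rnorm(n * 2), nrow = n)  # concentrations: n rows x 2 analytes

# Hypothetical check: is there a matrix column named exactly "spc"?
has_spc_matrix <- function(x) {
  "spc" %in% names(x) && is.matrix(x[["spc"]])
}

has_spc_matrix(df)       # the spectra matrix is present

renamed <- df
names(renamed)[names(renamed) == "spc"] <- "spc2"
has_spc_matrix(renamed)  # after renaming, no column called spc remains

# An empty spectra matrix is still a matrix: n rows x 0 columns
empty_spc <- matrix(numeric(0), nrow = n, ncol = 0)
dim(empty_spc)

# Matrix columns work on both sides of a formula; lm() is only a stand-in
# for pls::plsr(c ~ spc, data = hyperSpec_object, ...)
fit <- lm(c ~ spc, data = df)
dim(coef(fit))  # rows: intercept + 3 spectral columns; columns: 2 analytes
```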

@ximeg do you think we need to explain this in vignette hyperSpec? Or even update the diagram?

ximeg commented 4 years ago

@cbeleites Thanks for the explanations! I think this is something worth mentioning in the documentation, and I would also prefer to have the diagram updated. What about something like this (changes in red)? [image: hyperSpec_data_structure]

eoduniyi commented 4 years ago

mutate and transmute for hyperSpec objects have been implemented. Going to close this now.

r-hyperspec / hySpc.dplyr

fortify `dplyr::transmute` #7