tbates / umx

Making Structural Equation Modeling (SEM) in R quick & powerful
https://tbates.github.io/

Discussing small improvements to umxMatrix() #195

Closed lf-araujo closed 2 years ago

lf-araujo commented 2 years ago

We often discuss OpenMx adoption at meetings. Sacha Epskamp has briefly mentioned that the verbosity of an OpenMx model is somewhat hard for new users, which umx() arguably improves upon.

Converting some of the models circulated at Boulder to umxMatrix() shortens them and improves their readability, but I noticed I kept retyping some things that perhaps should be automated. Can/should these steps be hidden from the user?

All matrices are byrow!

Never came across an example in which a matrix was actually defined by column. For OpenMx it makes sense to be closer to matrix(), but can we avoid this in umxMatrix()?

My implementation:

    if (!missing(byrow)) {options(mxByrow = byrow)} else {options(mxByrow = TRUE)} # byrow by default for matrices
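
To make the stakes concrete, here is a small base-R illustration (it calls neither OpenMx nor umx): the same vector fills a matrix differently depending on byrow. The helper `resolve_byrow` is hypothetical. It sketches one way to default to TRUE without calling options(), which changes the global mxByrow setting as a side effect.

```r
vals <- 1:6

# Column-wise fill: base R's default (byrow = FALSE)
by_col <- matrix(vals, nrow = 2, ncol = 3)

# Row-wise fill: the layout matches how the values are typed
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)

# Hypothetical helper: honour the caller's byrow if supplied,
# otherwise fall back to TRUE, without touching options()
resolve_byrow <- function(byrow = NULL) {
  if (is.null(byrow)) TRUE else isTRUE(byrow)
}

print(by_col)
print(by_row)
```

The two matrices hold the same numbers in different cells, which is why a silent change of default matters for existing scripts.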

Can we infer the number of rows of a matrix internally and make nrow optional?

Tiny change, saves very little typing. Not sure if worth it:

## argument change nrow=NULL

    if (missing(nrow)) {
      if (!missing(free)) {nrow = dim(matrix(free, ncol = ncol))[1]}
      if (!missing(labels)) {nrow = dim(matrix(labels, ncol = ncol))[1]}
      if (!missing(values)) {nrow = dim(matrix(values, ncol = ncol))[1]}
    }
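
A minimal sketch of the same inference in base R, assuming ncol is known. `infer_nrow` is a hypothetical helper, not part of umx or OpenMx. Unlike the snippet above, it stops when the input length is not a multiple of ncol rather than leaving matrix() to work out a shape.

```r
# Hypothetical helper: derive nrow from the length of a cell vector
infer_nrow <- function(x, ncol) {
  n <- length(x)
  if (n %% ncol != 0) {
    stop("length of input (", n, ") is not a multiple of ncol (", ncol, ")")
  }
  n %/% ncol
}

labels <- c(NA, "g2", "b1", NA,
            "g1", NA, NA, "b3")
n_rows <- infer_nrow(labels, ncol = 4)
print(n_rows)
```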

Setting labels? This is because you want these parameters to be free!

More often than not, when we set labels using umxMatrix() we want these to be set free = T. So can we/should we make the free matrix reflect what we passed on as labels, so that NA's are FALSE? At least for some types of matrices (like Symm and Full)?

    if (type %in% c("Symm", "Full")) { 
      if (missing(free)) {
        if (!missing(labels)) {
          free = labels 
          free = (!is.na(free))

        }
      }
    }

This will make:

          labels = c(NA, "g2", "b1", NA,
                     "g1", NA, NA, "b3",
                     NA, NA, NA, NA,
                     NA, NA, NA, NA),

have an implicit:

          free = c(F, T, T, F,
                   T, F, F, T,
                   F, F, F, F,
                   F, F, F, F)),

Full edited umxMatrix function

```r
umxMatrix <- function (name = NA, type = "Full", nrow = NULL, ncol = NA,
                       free = FALSE, values = NA, labels = TRUE,
                       lbound = NA, ubound = NA, byrow = getOption("mxByrow"),
                       baseName = NA, dimnames = NA,
                       condenseSlots = getOption("mxCondenseMatrixSlots"), ...,
                       joinKey = as.character(NA), joinModel = as.character(NA),
                       jiggle = NA) {
  legalMatrixTypes = c("Diag", "Full", "Iden", "Lower", "Sdiag", "Stand", "Symm", "Unit", "Zero")
  if (name %in% legalMatrixTypes) {
    warning("You used ", omxQuotes(name), " as the name of your matrix: That's also a valid type, so make sure you're not putting type first...")
  }
  if (is.numeric(type)) {
    stop("You used ", omxQuotes(type), " as the type of your matrix. You probably need to add something like type='Full' or specify nrow and ncol")
  }
  if (isTRUE(labels)) {
    setLabels = TRUE
    labels = NA
  } else {
    setLabels = FALSE
  }

  if (!missing(byrow)) {options(mxByrow = byrow)} else {options(mxByrow = TRUE)} # byrow by default for matrices

  if (missing(nrow)) {
    if (!missing(free)) {nrow = dim(matrix(free, ncol = ncol))[1]}
    if (!missing(labels)) {nrow = dim(matrix(labels, ncol = ncol))[1]}
    if (!missing(values)) {nrow = dim(matrix(values, ncol = ncol))[1]}
  }

  if (type %in% c("Symm", "Full")) {
    if (missing(free)) {
      if (!missing(labels)) {
        free = labels
        free = (!is.na(free))
      }
    }
  }

  x = mxMatrix(type = type, nrow = nrow, ncol = ncol, free = free,
               values = values, labels = labels, lbound = lbound,
               ubound = ubound, byrow = byrow, dimnames = dimnames,
               name = name, condenseSlots = condenseSlots,
               joinKey = joinKey, joinModel = joinModel, ...)
  if (setLabels) {
    x = xmuLabel(x, baseName = baseName, jiggle = jiggle)
  }
  return(x)
}
```

tbates commented 2 years ago

Thanks as always for the thoughtful suggestions!

90% of the user friction comes in RAM models, and umxRAM, umxPath, and lavaan syntax support in umx solve most of that, I think.

For matrices, however, there's no competition and the user base is 99% expert users writing pretty explicit code and valuing that.

That said, I agree that rewriting workshop scripts in umxMatrix makes them much more readable (e.g. name first, auto-label), and it's the first thing I do when one of those scripts comes up as a question/tutorial.

And I also find those blobs of label snarls and F/T free lists… unconvivial :-)

But I'm not sure this is the way forward (in umxMatrix at least).

To the three suggestions:

  1. Switching the byrow default would impact existing scripts in ways that are invisible and dangerous (still run but make no sense).
  2. Inferring whichever of nrow/ncol is missing from length of values is probably OK. I think it wouldn't break anything. It's a tiny saving however.
  3. 90% of umxMatrix uses in umx are autolabeled - so freeing cells that are labelled would break nearly all umx code. Also free = !is.na(labels) has invisible and hard to visualise side effects: when people set one or two labels, it's often not exhaustive and doesn't imply the others are fixed. Feels like 100% danger for 1% use case.

    umxMatrix("a", "Full", 2, 2, values=1:4)
         [,1]     [,2]
    [1,] "a_r1c1" "a_r1c2"
    [2,] "a_r2c1" "a_r2c2"
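
The risk in point 3 can be illustrated in base R. The label vectors below are typed by hand to mimic the auto-label pattern in the output above; no umx functions are called, and the "one label" scenario is an assumed example.

```r
# Labels in the style of the auto-labels above (constructed by hand)
auto_labels <- c("a_r1c1", "a_r1c2",
                 "a_r2c1", "a_r2c2")
# Every cell carries a label, so the proposed rule would free every cell
free_auto <- !is.na(auto_labels)

# A user who labels a single cell, without meaning to fix the rest
partial_labels <- c(NA, "b1",
                    NA, NA)
# The rule would silently fix the three unlabelled cells
free_partial <- !is.na(partial_labels)

print(free_auto)
print(free_partial)
```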

Perhaps a new valueFree parameter which would set values and also set free=FALSE where value == 0 would work.
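
One way such a parameter might behave, sketched in base R. `value_free` is a hypothetical helper, and treating NA values as fixed is an assumption made here, not something settled in this thread.

```r
# Hypothetical helper: derive both values and free from one vector.
# Cells with value 0 (or NA, by assumption) are fixed; the rest are free.
value_free <- function(valueFree) {
  list(
    values = valueFree,
    free   = !is.na(valueFree) & valueFree != 0
  )
}

spec <- value_free(c(0,   0.5, 0.5, 0,
                     0.3, 0,   0,   0.2))
print(spec$free)
```

The start values and the free pattern then live in one vector, so they cannot drift apart when the model is edited.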

One possibility would be to add a new umxM() function with byrow=TRUE and the ability to infer whichever of nrow or ncol is missing (and to add that inference to umxMatrix as well).
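
A rough sketch of the dimension guessing such a function could do, using base R only. `infer_dims` is a hypothetical name, and the result is passed to matrix() rather than to any umx or OpenMx function.

```r
# Hypothetical helper: fill in whichever of nrow/ncol was not supplied
infer_dims <- function(x, nrow = NULL, ncol = NULL) {
  n <- length(x)
  if (is.null(nrow) && is.null(ncol)) {
    stop("supply at least one of nrow or ncol")
  }
  if (is.null(nrow)) nrow <- n / ncol
  if (is.null(ncol)) ncol <- n / nrow
  # Reject shapes that do not use every element exactly once
  if (nrow != round(nrow) || ncol != round(ncol) || nrow * ncol != n) {
    stop("cannot arrange ", n, " elements as ", nrow, " x ", ncol)
  }
  c(nrow = nrow, ncol = ncol)
}

d <- infer_dims(1:6, ncol = 3)
m <- matrix(1:6, nrow = d[["nrow"]], ncol = d[["ncol"]], byrow = TRUE)
print(m)
```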

lf-araujo commented 2 years ago

Thanks!

> 90% of umxMatrix uses in umx are autolabeled - so freeing cells that are labelled would break nearly all umx code. Also free = !is.na(labels) has invisible and hard to visualise side effects: when people set one or two labels, it's often not exhaustive and doesn't imply the others are fixed. Feels like 100% danger for 1% use case.

I will look into this:

> Perhaps a new valueFree parameter which would set values and also set free=FALSE where value == 0 would work.