Closed codydh closed 6 years ago
It might have something to do with what you are passing to the names
and leaveout
arguments. Those should be character vectors:
c("ocap1rc", "ocap2rc", "ocap3rc", ...)
not a single character string, like "ocap1rc ocap2rc ..."
. And what you passed to leaveout
needs to be concatentated to be recognized as a single vector, rather than a single character string followed by 2 unnamed (and unused) arguments.
leaveout = c("NeedSatAut", "NeedSatComp", "NeedSatRel")
You're right, my leaveout
string above was incorrect, actually due to me trying exactly what you mention. If I modify my working example to use character vectors as you suggest, e.g.:
dTemp <- subset(d, select=c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc", "health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9"))
syntax <- 'OCAP =~ V1 + V2 + V3
Health =~ V4 + V5 + V6
OCAP ~~ Health'
parcelAllocation(
nPerPar = list(c(3,3,3), c(3,3,3)),
facPlc = list(
c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc"),
c("health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9")
),
nAlloc = 10,
syntax = syntax,
dataset = dTemp,
names = c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc", "health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9")
)
I then get the error:
[1] "** WARNING! ** Number of parcel names provided not equal to number of parcels requested. Check input!"
Error in `colnames<-`(`*tmp*`, value = c("ocap1rc", "ocap2rc", "ocap3rc", :
length of 'dimnames' [2] not equal to array extent
Doing it the way I'd posted in my working example above functioned and removed the error, which is what led me to believe that the syntax you suggest was not correct.
This might be an error on my end, but it's not clear why the input here isn't being accepted.
No, the syntax you first posted was incorrect for the reason I posted above. The syntax you posted most recently is incorrect for a reason I didn't notice originally. If you read the help-page description of the names
argument, you will see those should be the names of the parcels (i.e., what appears in your model syntax as V1, V2, etc.), not observed-variable names. The default parcel names will be V1, V2, etc., but you can give them different names via the names
argument.
Ok, so if I'm understanding it correctly, everything about this syntax should be correct:
dTemp <- subset(d, select=c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc", "health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9", "NeedSatAut", "NeedSatComp", "NeedSatRel"))
syntax <- 'OCAP =~ V1 + V2 + V3
Health =~ V4 + V5 + V6
NeedSat =~ NeedSatAut + NeedSatComp + NeedSatRel
OCAP ~~ Health
OCAP ~~ NeedSat
Health ~~ NeedSat'
parcelAllocation(
nPerPar = list(c(3,3,3), c(3,3,3)),
facPlc = list(
c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc"),
c("health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9"),
c("NeedSatAut", "NeedSatComp", "NeedSatRel")
),
nAlloc = 10,
syntax = syntax,
dataset = dTemp,
names = c("V1", "V2", "V3", "V4", "V5", "V6"),
leaveout = c("NeedSatAut", "NeedSatComp", "NeedSatRel")
)
Namely, the SEM model syntax is correct, I've passed nPerPar
as the number of data frame variables per parceled variable, facPlc
as the variables belonging to each latent variable, names
as the explicit names of the parcels, and leaveout
as the variables in my dataset that should not be involved in parceling, but should be in the model. However, I am still seeing:
Error in parcelAllocation(nPerPar = list(c(3, 3, 3), c(3, 3, 3)), facPlc = list(c("ocap1rc", :
** WARNING! ** Parcels incorrectly specified. Check input!
In addition: Warning message:
In if (leaveout != 0) { :
the condition has length > 1 and only the first element will be used
No, you name latent factors in the leaveout
argument. Those are just supposed to be any variable names or column numbers (i.e., observed variables in the data.frame
) in the facPlc
argument that should be ignored. You are using all the variables in facPlc
, so just don't use the leaveout
argument. Since you just use the default parcel names, you could omit the names
argument, too, but it should work fine. The rest of the syntax looks correct.
Ok, leaveout
makes sense to me know—I was thrown off by the wording (like many of the instructions), since I'm not truly randomly parceling those variables, but they are still part of the passed in model.
However, removing both names
and leaveout
still generates the Parcels incorrectly specified
error:
dTemp <- subset(d, select=c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc", "health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9", "NeedSatAut", "NeedSatComp", "NeedSatRel"))
syntax <- 'OCAP =~ V1 + V2 + V3
Health =~ V4 + V5 + V6
NeedSat =~ NeedSatAut + NeedSatComp + NeedSatRel
OCAP ~~ Health
OCAP ~~ NeedSat
Health ~~ NeedSat'
parcelAllocation(
nPerPar = list(c(3,3,3), c(3,3,3)),
facPlc = list(
c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc"),
c("health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9"),
c("NeedSatAut", "NeedSatComp", "NeedSatRel")
),
nAlloc = 10,
syntax = syntax,
dataset = dTemp
)
Error in parcelAllocation(nPerPar = list(c(3, 3, 3), c(3, 3, 3)), facPlc = list(c("ocap1rc", :
** WARNING! ** Parcels incorrectly specified. Check input!
facPlc
should not include common-factor names c("NeedSatAut", "NeedSatComp", "NeedSatRel")
because those cannot be parceled.
The help file for facPlc
specifies that it should be a "list of vectors, each corresponding to a factor, specifying the variables in that factor (whether included in parceling or not)," which led me to believe that I should include them there even though they're not being parceled. Additionally, when I remove that line I then get the error:
lavaan ERROR: missing observed variables in dataset: NeedSatAut NeedSatComp NeedSatRel
My mistake, I thought those were common-factor names, but the common factor is NeedSat, not those 3 observed indicators. In that case, I am out of advice. Sorry I am unfamiliar with this function, as I rarely find parceling appropriate in practice. To investigate any further, I would need your data so that I could see what happens within the function when it's running. You can send the data (please use saveRDS()
to save an individual object(s), not save()
to save a workspace) and relevant R syntax in a private email. Or you could do this on your own using debug(parcelAllocation)
so that you can step through the function yourself when you run parcelAllocation
. If you do, let me know what you find.
I think I've traced this bug down a bit. It seems what happens here is that if you have variables that are not to be included in parceling, you wind up with this test failing (line 174):
if (length(Locate) != length(Npp)) stop("Parcels incorrectly specified.",
" Check input!")
Additionally, Locate
and Npp
are defined as such:
Locate <- sort(round(facPlc))
for (i in 1:length(nPerPar)) Npp <- c(Npp, rep(i, nPerPar[i]))
So in my example above, Npp
and Locate
wind up being:
> Npp
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
> Locate
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3
Which obviously do not match. However, even specifying a leaveout
doesn't seem to make them match.
Edit: If I specify an incorrect nPerPar
that contains non-parceled variables (which the instructions explicitly warn not to do), it runs:
dTemp <- subset(d, select=c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc", "health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9", "NeedSatAut", "NeedSatComp", "NeedSatRel"))
syntax <- 'OCAP =~ V1 + V2 + V3
Health =~ V4 + V5 + V6
NeedSat =~ V7 + V8 + V9
OCAP ~~ Health
OCAP ~~ NeedSat
Health ~~ NeedSat'
parcelAllocation(
nPerPar = list(c(3,3,3), c(3,3,3), c(1,1,1)),
facPlc = list(
c("ocap1rc", "ocap2rc", "ocap3rc", "ocap4rc", "ocap5rc", "ocap6rc", "ocap7rc", "ocap8rc", "ocap9rc"),
c("health1", "health2", "health3", "health4", "health5", "health6", "health7", "health8", "health9"),
c("NeedSatAut", "NeedSatComp", "NeedSatRel")
),
nAlloc = 10,
syntax = syntax,
dataset = dTemp
)
Note: This is a modification of my last example above.
So it seems that variables not intended to be included in parceling, even when specified in leaveout
, aren't properly accounted for.
Hi Cody,
I tried thinking about the problem a little differently, and perhaps the simplest solution is to treat the observed, nonparceled indicators c("NeedSatAut", "NeedSatComp", "NeedSatRel")
as though you are parceling them, but the parcels include only 1 indicator (i.e., the indicator is the parcel). I adapted the help-page example so that the second factor's 3 parcels are only made from 3 indicators (10:12) instead of 9 (10:18), and it works fine.
syntax <- 'La =~ V1 + V2 + V3
Lb =~ V4 + V5 + V6
'
name1 <- colnames(simParcel)[1:9]
name2 <- colnames(simParcel)[10:12]
parcelAllocation(list(c(3,3,3), c(1,1,1)), list(name1, name2), nAlloc = 20,
syntax = syntax, dataset = simParcel)
And that is what you found before you modified your comment above. I will think more about this example and what the leaveout
argument is for, and try to update the help page to take this information into account.
Thanks for tracking this down.
Hi Cody,
I am no longer in contact with the original primary author of this function, but after talking to the secondary author, there was some confusion about the original intent behind the leaveout
argument and its use. Because the original author's source code was very difficult to follow, I decided to simply rewrite the function to work more intuitively. Before I commit this change, I was hoping you could try it out, verify whether it works for your situation, and give me any feedback you have about the user experience, in case I can make it more intuitive.
The syntax below includes the new function and 4 examples, which I think correspond to the features you needed. Note that the arguments have changed. You are now simply required to provide 2 versions of your model: a model that uses parcels (the model
argument) and a version that uses the original items (the item.syntax
argument). By providing the names of indicators in model
that are parcels via the parcel.names
argument, the function will automatically compare your 2 models to automatically figure out which items should be part of the random parceling scheme. That is, if you have a combination of observed indicators and parcels in model
, it will infer that you only want to use indicators (of the same factor) from item.syntax
that do not appear in model
in the parceling scheme.
Thanks for your time, Terrence
parcelAllocation <- function(model, data, parcel.names, item.syntax,
nAlloc = 100, fun = "sem", alpha = .05,
fit.measures = c("chisq","df","cfi",
"tli","rmsea","srmr"), ...,
show.progress = FALSE) {
if (nAlloc < 2) stop("Minimum of two allocations required.")
if (!fun %in% c("sem","cfa","growth","lavaan"))
stop("'fun' argument must be either 'lavaan', 'cfa', 'sem', or 'growth'")
lavArgs <- list(...)
lavArgs$model <- item.syntax
lavArgs$data <- data
lavArgs$do.fit <- FALSE
## fit item-level model to data
item.fit <- do.call(fun, lavArgs)
item.PT <- lavaan::parTable(item.fit)
## construct parameter table for parcel-level model
if (is.character(model)) {
## default lavaanify arguments
ptArgs <- formals(lavaanify)
## arguments passed to lavaan by user
fitArgs <- lavaan::lavInspect(item.fit, "call")[-1]
## overwrite defaults with user's values
sameArgs <- intersect(names(ptArgs), names(fitArgs))
ptArgs[sameArgs] <- fitArgs[sameArgs]
ptArgs$model <- model
if (is.null(ptArgs$model.type)) ptArgs$model.type <- "sem"
if (ptArgs$model.type != "growth") ptArgs$model.type <- "sem"
ptArgs$ngroups <- lavaan::lavInspect(item.fit, "ngroups")
PT <- do.call("lavaanify", ptArgs)
} else if (is.data.frame(model)) {
PT <- model
} else stop("'model' argument must be a character string of lavaan model",
" syntax or a lavaan parameter table. See ?lavaanify help page.")
## check that both models specify the same factors
factorNames <- lavaan::lavNames(PT, type = "lv")
if (!all(sort(lavaan::lavNames(item.PT, type = "lv")) == sort(factorNames))) {
stop("'model' and 'item.syntax' arguments specify different factors.\n",
"'model' specifies: ", paste(sort(factorNames), collapse = ", "), "\n",
"'item.syntax' specifies: ", paste(sort(lavaan::lavNames(item.PT,
type = "lv")),
collapse = ", "))
}
## for each factor, assign item sets to parcel sets
assignments <- list()
for (i in factorNames) {
## all indicators from parcel-level model
parcels <- PT$rhs[PT$lhs == i & PT$op == "=~"]
## all indicators from item-level model
items <- item.PT$rhs[item.PT$lhs == i & item.PT$op == "=~"]
## exclude observed indicators from parceling scheme if specified
## in parcel-level model
assignments[[i]]$parcels <- setdiff(parcels, names(data))
assignments[[i]]$items <- setdiff(items, parcels)
## Does this factor have parcels? If not, omit this factor from next loop
if (length(assignments[[i]]$parcels) == 0L) {
factorNames <- factorNames[-which(factorNames == i)]
next
}
## how many items per parcel?
nItems <- length(assignments[[i]]$items)
nParcels <- length(assignments[[i]]$parcels)
assignments[[i]]$nPerParcel <- rep(nItems %/% nParcels, nParcels)
if (nItems %% nParcels > 0) for (j in 1:(nItems %% nParcels)) {
assignments[[i]]$nPerParcel[j] <- assignments[[i]]$nPerParcel[j] + 1
}
names(assignments[[i]]$nPerParcel) <- assignments[[i]]$parcels
}
## for each allocation, create parcels from items
dataList <- list()
for (i in 1:nAlloc) {
dataList[[i]] <- data
for (j in factorNames) {
## create a random assignment pattern
ranAss <- sample(rep(names(assignments[[j]]$nPerParcel),
times = assignments[[j]]$nPerParcel))
## add each parcel to a copy of the original data set
for (k in assignments[[j]]$parcels) {
## which items were selected for this parcel?
ranVars <- assignments[[j]]$items[ranAss == k]
## calculate row means of those items, save as parcel
dataList[[i]][ , k] <- rowMeans(data[ , ranVars])
}
}
}
## fit parcel-level model to list of data sets
fitList <- lavaanList(model, dataList, cmd = fun, ...,
FUN = lavaan::fitMeasures, show.progress = show.progress)
## for which data sets did the model converge?
conv <- fitList@meta$ok
if (!any(conv)) stop("The model did not converge for any allocations.")
if (!all(conv)) message("The model did not converge for the following ",
"allocations: ", paste(which(!conv), collapse = ", "))
## tools to extract output
getOutput <- function(x) {
c(Avg = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}
out <- list()
myCols <- c("lhs","op","rhs","group", "block","label")
template <- data.frame(fitList@ParTableList[[which(conv)[1]]][myCols])
## parameter estimates
Est <- sapply(fitList@ParTableList[conv], function(x) x$est)
out$Estimates <- cbind(template, t(apply(Est, 1, getOutput)))
## standard errors
SE <- sapply(fitList@ParTableList[conv], function(x) x$se)
## Any for which SE could not be calculated?
missingSE <- apply(SE, 2, function(x) any(is.na(x)))
if (!all(missingSE)) {
if (any(missingSE)) message("Standard errors could not be computed for ",
"the following allocations: ",
paste(which(missingSE), collapse = ", "))
out$SE <- cbind(template, t(apply(SE[ , !missingSE], 1, getOutput)))
## add significance test results to $Estimates
Sig <- abs(Est[, !missingSE] / SE[, !missingSE]) > qnorm(alpha / 2,
lower.tail = FALSE)
out$Estimates[ , "Percent_Sig"] <- rowMeans(Sig)
out$Estimates[fitList@ParTableList[[which(conv)[1]]]$free == 0L, "Percent_Sig"] <- NA
} else {
message("Standard errors could not be calculated for any converged",
" data sets, so no significance tests could be conducted.")
out$SE <- NULL
}
## fit measures
Fit <- do.call(cbind, fitList@funList[conv])[fit.measures, ]
out$Fit <- data.frame(t(apply(Fit, 1, getOutput)))
## remove rows that do not correspond to estimates
out$Estimates <- out$Estimates[fitList@ParTableList[[which(conv)[1]]]$group > 0L, ]
if (!is.null(out$SE)) out$SE <- out$SE[fitList@ParTableList[[which(conv)[1]]]$group > 0L, ]
## assign class for lavaan's print method
class(out$Estimates) <- c("lavaan.data.frame","data.frame")
if (!is.null(out$SE)) class(out$SE) <- c("lavaan.data.frame","data.frame")
class(out$Fit) <- c("lavaan.data.frame","data.frame")
## return output
out
}
## New version requires lavaanList, so depends on lavaan version 0.6 or higher:
# install.packages("lavaan", repos="http://www.da.ugent.be", type="source")
library(lavaan)
simParcel <- semTools::simParcel
## item-level model (if NO parcels were created)
item.syntax <- c(paste0("f1 =~ f1item", 1:9),
paste0("f2 =~ f2item", 1:9))
cat(item.syntax, sep = "\n")
## 3-indicator parcels
mod.parcels <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
parcelAllocation(mod.parcels, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE, parallel = "snow", iseed = 12345)
## multigroup example
simParcel$group <- 0:1 # arbitrary groups for example
mod.mg <- '
f1 =~ par1 + c(L2, L2)*par2 + par3
f2 =~ par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
set.seed(12345)
parcelAllocation(mod.mg, data = simParcel, parcel.names, item.syntax,
std.lv = TRUE, group = "group", group.equal = "loadings",
nAlloc = 20, show.progress = TRUE)
## parcels for first factor, items for second factor
mod.items <- '
f1 =~ par1 + par2 + par3
f2 =~ f2item2 + f2item7 + f2item8
'
## names of parcels
(parcel.names <- paste0("par", 1:3))
parcelAllocation(mod.items, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE)
## mixture of 1- and 3-indicator parcels for second factor
mod.mix <- '
f1 =~ par1 + par2 + par3
f2 =~ f2item2 + f2item7 + f2item8 + par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
parcelAllocation(mod.mix, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE)
@TDJorgensen This looks great! I've only had a chance to try a couple variations of my models to see that they run, and will comb through the results soon to see if anything looks off, but at the moment this looks to do exactly what I'd hoped parcelAllocation() would do in the first place.
As far as comments, it is of course somewhat cumbersome to have two model specifications, but it is pretty clear what's going on in writing it that way, so at this point I don't think I have any change suggestions.
I greatly appreciate your time in troubleshooting this, and that you went as far as to rewrite this function. Thank you!
Great, good to hear. I probably won't have time to update semTools
again until my teaching load eases up in January, but I also plan to update PAVranking()
and poolMalloc()
as well. Please keep me updated about your other examples in case something else comes up in the meantime.
parcelAllocation()
has been updated, now available in the development version of semTools:
devtools::install_github("simsem/semTools/semTools")
I hope to update PAVranking()
and poolMalloc()
this week, as well.
Hello! I'm trying to use
parcelAllocation()
on a model with two parceled latent variables, and the rest regularly specified. If I set up a test model as per the example in the documentation, it works fine:However, if I add in a single latent factor with three indicators, even though I believe I'm getting the syntax correct, I get the error
** WARNING! ** Parcels incorrectly specified. Check input!
:However, it appears that I'm specifying all options correctly, so I don't know if this is a bug or if I'm misunderstanding, hence why I'm reporting it here. Any hints are welcomed!