knickodem / kfa

k-fold cross validation for factor analysis
GNU General Public License v3.0

Cross-loading KFA #8

Closed by avrvaidya 2 years ago

avrvaidya commented 2 years ago

Hi Kyle, Thanks for making this awesome R package! This is perfect for a new project I am starting at the moment. I have just been playing around with the example code, and was wondering if there is a way to allow for cross-loading of items onto multiple factors in the k-fold factor analysis? Thanks, Avi

knickodem commented 2 years ago

Hi Avi,

Thanks! We are excited to get the package off the ground.

If you have specific cross-loadings you want to examine, you can supply syntax to the custom.cfas argument. For instance, we could modify the README example to have Item 10 load onto both factors:

custom2f <- paste0("f1 =~ ", paste(colnames(sim.data)[1:10], collapse = " + "),
                   "\nf2 =~ ",paste(colnames(sim.data)[10:20], collapse = " + "))
mods <- kfa(variables = sim.data,
            custom.cfas = custom2f)
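
To see what the syntax string above contains, here is a minimal sketch that builds it from hypothetical item names. The names x1 to x20 are placeholders, since the actual column names of sim.data are not shown in this thread.

```r
# Hypothetical item names standing in for colnames(sim.data)
items <- paste0("x", 1:20)

# Same construction as above: items 1-10 on f1, items 10-20 on f2
custom2f <- paste0("f1 =~ ", paste(items[1:10], collapse = " + "),
                   "\nf2 =~ ", paste(items[10:20], collapse = " + "))

# Print the lavaan-style model syntax; x10 appears on both lines,
# which is what makes it a cross-loading item
cat(custom2f, "\n")
```

Any additional cross-loading can be specified the same way, by listing the item on the right-hand side of more than one factor.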

We do not currently have a procedure for identifying and including cross-loadings from the EFA portion of the k-fold cross-validation, but it is something we have discussed. To help us formulate our approach, can you share why you are interested in cross-loadings and what including a cross-loading item means for your work? Additionally, what criteria would you use for keeping a cross-loading item?

In the meantime, you can also use the run_efa function (this does not use the k-fold procedure) with simple = FALSE and a value for the threshold argument, then examine the returned loadings to get a sense of which cross-loadings might be of interest.
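
As a rough illustration of what "examining the loadings" could look like, the sketch below uses base R's factanal() rather than run_efa, so it does not depend on the kfa API. The simulated data, the item names, and the cutoff of 0.3 are all assumptions made for this example.

```r
set.seed(101)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)

# Simulated items: x1-x3 measure f1, x4-x6 measure f2, x7 measures both
dat <- data.frame(
  x1 = 0.8 * f1 + noise(), x2 = 0.8 * f1 + noise(), x3 = 0.8 * f1 + noise(),
  x4 = 0.8 * f2 + noise(), x5 = 0.8 * f2 + noise(), x6 = 0.8 * f2 + noise(),
  x7 = 0.6 * f1 + 0.6 * f2 + noise()
)

# Maximum likelihood EFA with an oblique rotation (base R, not kfa)
fit <- factanal(dat, factors = 2, rotation = "promax")
L   <- unclass(fit$loadings)   # plain items-by-factors matrix

threshold <- 0.3               # hypothetical minimum loading
n_salient <- rowSums(abs(L) >= threshold)

# Items with more than one loading at or above the threshold
cross <- names(n_salient)[n_salient > 1]

print(round(L, 2))
print(cross)
```

Items flagged this way are only candidates; whether to keep them is the substantive question discussed in this thread.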

Thanks, Kyle

avrvaidya commented 2 years ago

Thanks for your quick reply! The custom argument does help solve my problem somewhat.

What I mean by cross-loading is just a measure loading onto multiple factors. I am interested in this because I can anticipate a priori that this is going to be a feature in my data, though I cannot be very sure about what these cross-loadings will look like (and that's why the custom argument doesn't fully solve the problem).

My work is in cognitive psychology, and I am interested in studying the factor structure of measures from a set of behavioral experiments (e.g. remembering a sequence of numbers and repeating them backwards, quickly pressing a key in response to an arbitrary symbol). We know from lots of research in this field that performance on these kinds of tests generally loads onto a common factor that captures the ability of people to carry out these kinds of tasks (so-called g or 'general intelligence'). However, these same task measures are also generally better described by models that include additional factors for more specific cognitive processes that occur in multiple tasks (e.g. 'response inhibition'). So, it is not uncommon to have a model where a given task measure might load onto a more generic general intelligence factor that many other measures load onto, as well as some other more specific factors that capture a unique process.

As for the criteria to use in keeping a cross-loading item, my plan would be to let the threshold depend on the amount of power you have in your sample to reliably detect an effect. I'd definitely be interested in hearing your thoughts on this as well!

knickodem commented 2 years ago

Thanks for sharing, Avi. Do you often use bifactor models? Accommodating bifactor models is another addition we are considering.

To get a better understanding of your approach to cross-loadings, let's say Item 1 has factor loadings of .7 and .6 on Factors 1 and 2, respectively. Item 2 has factor loadings of .9 and .6. When deciding which cross-loading to keep, are you more concerned about the magnitude of each factor loading or the difference between the factor loadings? In other words, which threshold are you referring to? This is part of our hesitancy to program cross-loader detection because it is not clear what decisions need to be made. In either case, we do not currently have a power analysis procedure for detecting a specific parameter, although we could incorporate one. Could you walk me through an example of your decision-making process and how the power analysis would be used?
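
The two candidate rules can give different answers for the same items. The sketch below applies both to the loadings from the example above; the cutoffs min_loading and max_gap are hypothetical values chosen only for illustration, and neither rule is something kfa implements.

```r
# Loadings from the example: rows are items, columns are factors
loadings <- rbind(item1 = c(f1 = 0.7, f2 = 0.6),
                  item2 = c(f1 = 0.9, f2 = 0.6))

min_loading <- 0.5   # hypothetical cutoff for the magnitude rule
max_gap     <- 0.2   # hypothetical cutoff for the difference rule

# Rule A (magnitude): keep the cross-loading if every loading is large enough
keep_by_magnitude <- apply(abs(loadings), 1, min) >= min_loading

# Rule B (difference): keep it only if the two loadings are close in size
gap         <- abs(loadings[, "f1"]) - abs(loadings[, "f2"])
keep_by_gap <- abs(gap) <= max_gap

print(data.frame(keep_by_magnitude, gap, keep_by_gap))
```

With these cutoffs, the magnitude rule keeps the cross-loading for both items, while the difference rule keeps it for Item 1 (gap of .1) and drops it for Item 2 (gap of .3).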

avrvaidya commented 2 years ago

I am actually fairly new to factor analysis and these problems, so I don't have a lot of practical experience dealing with this issue. However, from what I have read, a bifactor model seems like it would provide a pretty good solution to my problem.

What I was thinking is that you could set a threshold for including a loading after EFA based on a simple Pearson correlation or based on a user-chosen a priori threshold. For example, if the sample is large enough to detect an r = 0.4 with an alpha of 0.05 and power of 0.8, then you'd retain any loading above that threshold in the model tested with CFA. So, in the case of your example, you would retain the loadings of items 1 and 2 onto both factors if there is enough power to reliably detect a loading of 0.6. I suppose this could become problematic if you have really large datasets where you are powered to detect even small relationships that aren't very meaningful. In that case, it might be more useful to set a stricter threshold.
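
A minimal sketch of this idea, assuming a two-sided test of a Pearson correlation and the Fisher z approximation. The function name min_detectable_r is hypothetical, and treating a factor loading like a simple correlation is an assumption of the proposal, not something kfa does.

```r
# Smallest correlation detectable with the given sample size, alpha, and power,
# using the Fisher z approximation: atanh(r) * sqrt(n - 3) ~ z_alpha + z_power
min_detectable_r <- function(n, alpha = 0.05, power = 0.80) {
  stopifnot(n > 3)
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  tanh(z / sqrt(n - 3))
}

# Candidate loading thresholds for a few sample sizes
sizes      <- c(50, 200, 1000)
thresholds <- sapply(sizes, min_detectable_r)
print(data.frame(n = sizes, threshold = round(thresholds, 3)))
```

The threshold shrinks as the sample grows (just under 0.4 at n = 50, below 0.1 at n = 1000), which is the large-dataset problem described above: a purely power-based cutoff would retain very small loadings.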

This is just my kind of naive idea though, I'd like to know what you guys who know these models inside-out think.

knickodem commented 2 years ago

Hi Avi, sorry for the delay in getting back to you.

I'm not quite sure which correlation you are referring to. The observed bivariate correlation between two items will not provide any indication of the need for a cross-loading. Two items may have a high correlation because they are measuring the same factor. Ideally, the common factor explains most of the variation in both items, resulting in a low residual correlation. A high residual correlation indicates a possible need for an additional factor. However, identifying a large residual correlation in a 2-factor model to retain a cross-loading in the subsequent 3-factor model, for instance, would be technically tricky and I don't think has any theoretical backing.
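
To make the distinction between observed and residual correlations concrete, here is a rough sketch using base R's factanal() on simulated data (not the kfa package). The data-generating model, in which x5 and x6 share a second factor that a 1-factor model ignores, is an assumption made for this example.

```r
set.seed(101)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)   # second factor, correlated with f1
e  <- function() rnorm(n, sd = 0.6)

# x1-x4 measure f1; x5 and x6 measure f2
dat <- data.frame(
  x1 = 0.8 * f1 + e(), x2 = 0.8 * f1 + e(),
  x3 = 0.8 * f1 + e(), x4 = 0.8 * f1 + e(),
  x5 = 0.8 * f2 + e(), x6 = 0.8 * f2 + e()
)

# Deliberately underfit: a single common factor
fit <- factanal(dat, factors = 1)
L   <- unclass(fit$loadings)

# Model-implied correlations: L L' plus the unique variances on the diagonal
implied <- tcrossprod(L) + diag(fit$uniquenesses)

# Residual correlations: what the single factor leaves unexplained
resid <- cor(dat) - implied
print(round(resid, 2))
```

In a setup like this, the x5-x6 residual should stand out from the others, pointing to a missing factor, whereas the observed correlations among x1 to x4 are also high but are already accounted for by the common factor.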

Setting technical issues aside, I am first interested in how researchers view the impact of cross-loadings on the interpretation of factors or the correlation between the factors. I'm of the opinion that cross-loading items muddle the interpretation, which can lead to jingle-jangle fallacies, but I'm open to hearing arguments to the contrary.

knickodem commented 2 years ago

Hi Avi, With today's release of kfa 0.2.0, you can now allow cross-loaders in kfa() by setting simple = FALSE and specifying the threshold argument as the minimum loading to retain an item with a factor.