juba / explor

Interfaces for Multivariate Analysis in R
https://juba.github.io/explor/

Error message with explor() for CA #45

gabrielparriaux closed this issue 1 year ago

gabrielparriaux commented 1 year ago

Hello,

I am running CA() (correspondence analysis) on a contingency table with 2 rows and 870 columns. Each column represents one token of my corpus, and each row a modality of a variable. Each cell contains the count of the token in the documents that correspond to that modality.

When I run CA() on my contingency table, I get no error message. The object returned by CA() appears in my Environment in RStudio, but no graph is produced, even with the option graph = TRUE.

When I execute explor() on the object containing the CA, I get the following message:

Error in `$<-.data.frame`(`*tmp*`, "Count", value = list()) : 
   replacement has 0 rows, data has 2

The strange thing is that when I follow exactly the same process with the same data, but with other variables that have three or more modalities, I have no problem.

Do you have an idea of the origin of the problem?

Thanks a lot,

Gabriel

juba commented 1 year ago

Hi,

The issue is that it doesn't really make sense to use explor when one dimension of your table is of size 2. In this case, the CA generates only one dimension, so the results can't be plotted as a two-dimensional scatterplot. That's why CA produces no graph even with graph = TRUE.

I have added a more explicit error message in the development version for when explor is called on the results of a correspondence analysis with only one axis.

Thanks for taking the time to report the issue.

gabrielparriaux commented 1 year ago

Hi,

Thanks a lot for your answer! So I understand it’s a statistical misunderstanding on my side and not a problem with explor, so sorry! 😅

I was not aware that performing a CA where one variable has only 2 modalities would generate only one dimension.

So do you know if there is some way to compute and show the relation between two variables in that case? Is there an alternative to CA?

Thanks a lot for adding a more explicit message in explor for that specific case!

juba commented 1 year ago

To make this a bit clearer: the number of dimensions that a correspondence analysis can generate is determined by the smaller dimension of the data table (it is at most that size minus one). If one of the dimensions is of size 2, you only have two points, so only one axis can be computed: the one that goes through these two points.
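As an illustration, here is a minimal base R sketch of that rule. It does not use FactoMineR or explor: it computes the CA eigenvalues directly from a singular value decomposition of the standardized residuals. The 2 x 5 table and its counts are made up for the example and are not the data from this issue.

```r
# Hypothetical 2 x 5 contingency table (made-up counts, for illustration only)
tab <- matrix(
  c(10, 4, 7, 1, 3,
     2, 9, 5, 6, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("mod_a", "mod_b"), paste0("token", 1:5))
)

# Correspondence matrix and its row / column masses
P  <- tab / sum(tab)
r  <- rowSums(P)
cm <- colSums(P)

# Standardized residuals: (observed - expected) / sqrt(expected)
E <- outer(r, cm)
S <- (P - E) / sqrt(E)

# The squared singular values of S are the CA eigenvalues (principal inertias)
eig <- svd(S)$d^2

# Number of axes carrying non-negligible inertia,
# compared with the theoretical maximum min(rows, cols) - 1
n_axes   <- sum(eig > 1e-12)
max_axes <- min(dim(tab)) - 1

print(n_axes)
print(max_axes)
```

With only two rows, a single axis carries all the inertia, so there is no second axis to draw a plane with.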

In this simpler case, maybe you could just use a scatterplot? The x and y axes would be the two values of your dimension of size 2, and each point would show the position of one value of the other dimension.
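A rough sketch of such a scatterplot in base R could look like the following. The table is again made up, and plotting within-row proportions instead of raw counts is an assumption made here so that the two modalities are comparable when their totals differ.

```r
# Hypothetical 2 x 5 contingency table (made-up counts, for illustration only)
tab <- matrix(
  c(10, 4, 7, 1, 3,
     2, 9, 5, 6, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("mod_a", "mod_b"), paste0("token", 1:5))
)

# Share of each token within each modality (each row sums to 1)
prop <- prop.table(tab, margin = 1)

# One point per token: x = share in the first modality, y = share in the second
coords <- data.frame(
  token = colnames(tab),
  x = prop[1, ],
  y = prop[2, ]
)

# Empty plot frame, then token labels drawn at their coordinates
plot(coords$x, coords$y, type = "n",
     xlab = rownames(tab)[1], ylab = rownames(tab)[2])
text(coords$x, coords$y, labels = coords$token)

# Reference line: tokens on it have the same share in both modalities
abline(a = 0, b = 1, lty = 2)
```

Tokens far from the dashed line are the ones used much more in one modality than in the other. With several hundred tokens the labels will overlap, so it may be worth labelling only the points farthest from the line.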

gabrielparriaux commented 1 year ago

Thanks a lot for your explanation! I understand better. I will try with a scatterplot.