Dear François Husson,
Thank you for your continued work on FactoMineR. I am reading your book about the package (Exploratory Multivariate Analysis by Example Using R), second edition.
I found a discrepancy with page 40 when running dimdesc(res.pca, proba=0.2) on the decathlon dataset. The output was:
Link between the variable and the categorical variable (1-way anova)
=============================================
                R2 p.value
Competition 0.0511  0.1553
Link between variable abd the categories of the categorical variables
================================================================
                     Estimate p.value
Competition=OlympicG    0.502  0.1553
Competition=Decastar   -0.502  0.1553
How are these estimates calculated? I looked at the code of dimdesc() and condes(), and I tried to reproduce the output with an ANOVA:
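library(FactoMineR)
data(decathlon)
# PCA with two supplementary quantitative variables and one supplementary factor
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13)
# Store each individual's coordinate on the first dimension
decathlon$PC1 <- res.pca$ind$coord[, 1]
# One-way ANOVA of the first dimension on Competition
resAOV <- aov(PC1 ~ Competition, data = decathlon, na.action = na.exclude)
summary.lm(resAOV)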
> summary.lm(resAOV)
Call:
aov(formula = PC1 ~ Competition, data = decathlon, na.action = na.exclude)
Residuals:
   Min     1Q Median     3Q    Max
-3.861 -1.098 -0.011  1.094  4.961
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.686      0.573   -1.20     0.24
CompetitionOlympicG    1.004      0.693    1.45     0.16
Residual standard error: 2.06 on 39 degrees of freedom
Multiple R-squared: 0.0511, Adjusted R-squared: 0.0268
F-statistic: 2.1 on 1 and 39 DF, p-value: 0.155
As you can see, the p-value agrees with the dimdesc() output (0.155). However, the estimate for OlympicG is 1.004, which is twice the value reported by dimdesc() (0.502). Why is that? Is there a statistical explanation, or is it a bug in the function? Strangely, this estimate also differs from the one shown in the book (page 40) for the same dataset.
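One possible explanation I can think of is the contrast coding. This is only a sketch under an assumption I have not confirmed: that dimdesc() fits the ANOVA with sum-to-zero contrasts, whereas aov() defaults to treatment contrasts. With a two-level factor, the sum-to-zero coefficient is each group's deviation from the average of the two group means, so it is exactly half the treatment coefficient (the difference between the groups). The toy data below are made-up values, not the decathlon scores.

```r
# Toy data (hypothetical values): two unbalanced groups
toy <- data.frame(
  y = c(-1.2, 0.3, -0.8, 1.5, 0.9, 2.1, 0.4),
  g = factor(c("A", "A", "A", "B", "B", "B", "B"))
)

# Default treatment contrasts: coefficient "gB" is mean(B) - mean(A)
fit_trt <- lm(y ~ g, data = toy)

# Sum-to-zero contrasts: coefficient "g1" is mean(A) minus the
# average of the two group means; level B gets the opposite sign
fit_sum <- lm(y ~ g, data = toy, contrasts = list(g = "contr.sum"))

diff_trt <- unname(coef(fit_trt)["gB"])  # difference between groups
dev_A <- unname(coef(fit_sum)["g1"])     # deviation for level A
dev_B <- -dev_A                          # deviation for level B

# The treatment estimate is twice the sum-to-zero estimate
print(all.equal(diff_trt, 2 * dev_B))

# The F-test p-value does not depend on the coding
print(anova(fit_trt)[1, "Pr(>F)"])
print(anova(fit_sum)[1, "Pr(>F)"])
```

If that assumption holds, it would account for 1.004 = 2 x 0.502 and for the symmetric +0.502 / -0.502 pair, with the p-value unchanged. It would not explain the difference from the value printed in the book.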
Your help would be very much appreciated.
FactoMineR version: 2.4
R version: 4.2.1 (2022-06-23)
Note: I posted the same question on https://stats.stackexchange.com/q/588927/22518
Best regards, FFS