husson / FactoMineR

Package FactoMineR

Calculation discrepancy in the quali.sup estimates reported by dimdesc() #12

Closed ffsammak closed 1 year ago

ffsammak commented 2 years ago

Dear François Husson, thank you for your continued work on FactoMineR. I am reading the second edition of your book on the package (Exploratory Multivariate Analysis by Example Using R).
I found a discrepancy on page 40 when running dimdesc(res.pca,proba=0.2) on the decathlon dataset. The output was:

Link between the variable and the categorical variable (1-way anova)
=============================================
                R2 p.value
Competition 0.0511  0.1553

Link between variable and the categories of the categorical variables
================================================================
                     Estimate p.value
Competition=OlympicG    0.502  0.1553
Competition=Decastar   -0.502  0.1553

How are these estimates calculated? I looked at the code of dimdesc() and condes() and ran an ANOVA to try to verify the output.

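library(FactoMineR)

data(decathlon)
# Columns 11:12 are supplementary quantitative variables; column 13 (Competition) is supplementary qualitative
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13)
# Store each individual's coordinate on the first principal component
decathlon$PC1 <- res.pca$ind$coord[, 1]

# One-way ANOVA of PC1 on Competition
resAOV <- aov(PC1 ~ Competition, data = decathlon, na.action = na.exclude)
summary.lm(resAOV)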

> summary.lm(resAOV)

Call:
aov(formula = PC1 ~ Competition, data = decathlon, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max 
-3.861 -1.098 -0.011  1.094  4.961 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.686      0.573   -1.20     0.24
CompetitionOlympicG    1.004      0.693    1.45     0.16

Residual standard error: 2.06 on 39 degrees of freedom
Multiple R-squared:  0.0511,    Adjusted R-squared:  0.0268 
F-statistic:  2.1 on 1 and 39 DF,  p-value: 0.155

As you can see, the p-value agrees with the dimdesc() output (0.155). However, the estimate for OlympicG (1.004) is twice the value that dimdesc() reported (0.502). Why is that? Is there a statistical explanation for this, or is it a bug in the function? Strangely enough, this estimate also differs from the one shown in the book (page 40), which uses the same dataset.
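One thing worth checking is the contrast coding. The exact factor of two (1.004 = 2 × 0.502) and the symmetric ±0.502 values would be consistent with dimdesc()/condes() fitting the model under sum-to-zero contrasts, whereas aov() uses treatment contrasts by default. That condes() works this way is an assumption here, not something confirmed in this thread. The sketch below only illustrates the arithmetic in base R, using small made-up values (y and g are hypothetical, not the decathlon data):

```r
# Hypothetical scores for two unbalanced groups (illustrative values only,
# not taken from the decathlon data).
y <- c(-1.5, 0.2, -0.8, 1.1, 0.4, 2.0, 0.9)
g <- factor(c("Decastar", "Decastar", "Decastar",
              "OlympicG", "OlympicG", "OlympicG", "OlympicG"))

# Treatment contrasts (R's default): the coefficient is the difference
# between the OlympicG mean and the Decastar (reference) mean.
fit.trt <- lm(y ~ g, contrasts = list(g = "contr.treatment"))

# Sum-to-zero contrasts: the coefficient is the first level's deviation
# from the unweighted mean of the group means; the last level's deviation
# is minus the sum of the others.
fit.sum <- lm(y ~ g, contrasts = list(g = "contr.sum"))

diff.trt  <- unname(coef(fit.trt)[2])
dev.first <- unname(coef(fit.sum)[2])
dev.all   <- c(Decastar = dev.first, OlympicG = -dev.first)

print(diff.trt)  # OlympicG minus Decastar
print(dev.all)   # one deviation per category, summing to zero

# With exactly two categories, the treatment difference is twice the
# OlympicG deviation.
stopifnot(isTRUE(all.equal(diff.trt, 2 * dev.all[["OlympicG"]])))
```

Replacing y and g with decathlon$PC1 and decathlon$Competition would apply the same comparison to the case above. If the sum-contrast deviations match the ±0.502 from dimdesc(), the two outputs describe the same fit under different parameterizations, which would also explain why the p-values agree.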

Your help would be very much appreciated.

FactoMineR version: 2.4
R version: 4.2.1 (2022-06-23)

Note: I posted the same question on https://stats.stackexchange.com/q/588927/22518

Best regards, FFS

ffsammak commented 1 year ago

I edited the above report to reflect the problem more accurately.