melff / mclogit

mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion
http://melff.github.io/mclogit/

[Question] Interpreting and plotting a mblogit model #29

Closed Jueun0505 closed 1 year ago

Jueun0505 commented 1 year ago

Dear Professor Elff,

First of all, thank you very much for publishing this package, which is a perfect tool for analyzing my data! I am currently working on my thesis on emotional speech in different languages. Specifically, the goal of the study is to determine which acoustic parameters are significantly related to each emotion. At the very bottom of this post are the models I have built so far for the English and Korean data.

My questions are as follows:

1) Since the reference level for the English model is 'ang', the results compare 'hap' and 'sad' each against 'ang'. Is there a way to find out the relationship between 'hap' and 'sad' in this case? When I built another model with 'hap' as the reference level, exp(coef(model)) differed from that of the model with 'ang' as the reference level.

2) (Co)variance: Is it normal to have large (co)variance values? Unlike the English model, the Korean model shows large (co)variance estimates, as shown below (that model did not converge). Does this imply that the random effects are not negligible? What exactly do the numbers mean?

3) Is it possible to plot the model in a way that reflects the mixed effects? If so, could you please give me some guidance? (I tried plot(eng_m2) and it did not work.)

4) How can I tell which model fits best? I read in a book that anova is not recommended for this package, and you also mentioned somewhere here that BIC should not be trusted either.

5) Lastly, as I am relatively new to statistics, I am not sure whether I have to check my models for multicollinearity. When I tried, entering vif(eng_m2) produced this warning: 'Warning message: In vif.default(eng_m2) : No intercept: vifs may not be sensible.'

Thank you so much for reading the questions! Best regards, Jueun Kang

English model

> str(eng_scaled)
tibble [18,231 × 11] (S3: tbl_df/tbl/data.frame)
 $ lang     : Factor w/ 1 level "eng": 1 1 1 1 1 1 1 1 1 1 ...
 $ gender   : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
 $ emotion  : Factor w/ 3 levels "ang","hap","sad": 1 1 1 1 1 1 1 1 1 1 ...
 $ speaker  : Factor w/ 10 levels "eng1","eng10",..: 7 7 7 7 7 7 7 7 7 7 ...
 $ phone    : Factor w/ 3 levels "a","i","u": 3 3 3 3 3 3 3 3 3 3 ...
 $ F0       : num [1:18231] 1.22 0.88 1.71 0.49 0.65 1.05 1.92 1.44 1.64 0.68 ...
 $ F1       : num [1:18231] 0.45 0.23 -0.82 -0.01 -1.02 -1.08 -0.82 -0.71 -0.73 0.1 ...
 $ F2       : num [1:18231] -1.15 -1.51 -1.53 -1.56 0.62 -1.87 -1.78 -0.19 -1.55 -1.41 ...
 $ duration : num [1:18231] -0.16 -0.43 -0.43 -0.84 -1.25 -0.43 -0.16 -1.39 -1.12 -0.98 ...
 $ intensity: num [1:18231] 1.22 0.02 0.62 -0.58 -0.87 0.62 2.56 1.07 1.37 1.52 ...
 $ energy   : num [1:18231] 0.4 -0.23 -0.29 -0.55 -0.61 0.01 5.19 -0.39 -0.03 0.25 ...
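
On question 5: a variance inflation factor depends only on the predictors, not on the response or on the fitted model, so it can be checked without calling vif() on the mblogit object. The sketch below uses the identity that VIF_j is the j-th diagonal element of the inverse correlation matrix of the predictors. The helper name `vif_from_predictors` and the simulated `demo` data frame are illustrative assumptions, not part of mclogit or of the thesis data, and this check ignores the random-effects structure entirely.

```r
# VIF_j equals the j-th diagonal element of the inverse of the
# predictors' correlation matrix (equivalently 1 / (1 - R_j^2)).
vif_from_predictors <- function(X) {
  X <- as.matrix(X)
  out <- diag(solve(cor(X)))
  names(out) <- colnames(X)
  out
}

# Stand-in data, simulated for illustration only (NOT the thesis data):
# 'energy' is deliberately constructed to correlate with 'intensity'.
set.seed(1)
n <- 500
intensity <- rnorm(n)
demo <- data.frame(
  F0        = rnorm(n),
  F1        = rnorm(n),
  intensity = intensity,
  energy    = 0.8 * intensity + 0.6 * rnorm(n)
)

print(round(vif_from_predictors(demo), 2))

# With the real data the call would presumably look like this:
# vif_from_predictors(
#   eng_scaled[, c("F0", "F1", "F2", "duration", "intensity", "energy")]
# )
```

Values close to 1 indicate little collinearity; which threshold counts as problematic is a judgment call and is not settled by this sketch.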

> summary(eng_m2)
Call:
mblogit(formula = emotion ~ F0 + F1 + F2 + duration + intensity + 
    energy, data = eng_scaled, random = list(~1 | speaker, ~1 | 
    gender, ~1 | phone))

Equation for hap vs ang:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.15282    0.33990  -0.450    0.653    
F0           0.76615    0.02707  28.297  < 2e-16 ***
F1          -0.25688    0.03020  -8.506  < 2e-16 ***
F2          -0.16723    0.03202  -5.222 1.77e-07 ***
duration     0.27422    0.02360  11.620  < 2e-16 ***
intensity   -0.17466    0.02942  -5.937 2.91e-09 ***
energy      -0.58378    0.03440 -16.971  < 2e-16 ***

Equation for sad vs ang:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.20731    0.44908  -0.462   0.6443    
F0          -1.09574    0.03727 -29.402  < 2e-16 ***
F1           0.05779    0.02996   1.929   0.0538 .  
F2           0.22005    0.03359   6.551 5.71e-11 ***
duration     0.21687    0.02359   9.192  < 2e-16 ***
intensity    0.25189    0.03087   8.160 3.36e-16 ***
energy      -0.64147    0.04042 -15.871  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping level: speaker 
      Estimate            Std.Err.         
hap~1  0.14634            0.003384         
sad~1 -0.09344  0.20440   0.004423 0.006064

Grouping level: gender
      Estimate            Std.Err.         
hap~1  0.16080            0.007254         
sad~1 -0.07084  0.27069   0.011991 0.023856

Grouping level: phone 
      Estimate              Std.Err.           
hap~1 0.0594145             0.0001714          
sad~1 0.0002151 0.1353236   0.0004175 0.0020331

Null Deviance:     40060 
Residual Deviance: 35200 
Number of Fisher Scoring iterations:  6
Number of observations
  Groups by speaker: 10
  Groups by gender: 2
  Groups by phone: 3
  Individual observations:  18231
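
On question 1: both printed equations share the baseline 'ang', so the 'sad' versus 'hap' log-odds can be obtained as the difference of the two coefficient vectors. A minimal sketch, with the fixed-effect estimates typed in by hand from the summary above; the object names (`term_names`, `hap_vs_ang`, `sad_vs_ang`, `sad_vs_hap`) are illustrative and not part of mclogit. Two caveats: a refit with 'hap' as the reference level may not reproduce these differences exactly in a model with random effects, and standard errors for the contrast would need the covariance matrix of the coefficients, which is not shown in this post.

```r
# Fixed-effect estimates copied by hand from summary(eng_m2) above.
term_names <- c("(Intercept)", "F0", "F1", "F2",
                "duration", "intensity", "energy")
hap_vs_ang <- setNames(
  c(-0.15282, 0.76615, -0.25688, -0.16723, 0.27422, -0.17466, -0.58378),
  term_names
)
sad_vs_ang <- setNames(
  c(-0.20731, -1.09574, 0.05779, 0.22005, 0.21687, 0.25189, -0.64147),
  term_names
)

# log(P(sad)/P(hap)) = log(P(sad)/P(ang)) - log(P(hap)/P(ang)),
# so the baseline cancels when the two equations are subtracted.
sad_vs_hap <- sad_vs_ang - hap_vs_ang

contrast <- data.frame(
  log_odds   = round(sad_vs_hap, 5),
  odds_ratio = round(exp(sad_vs_hap), 3)
)
print(contrast)
```

This also explains why exp(coef(model)) changes with the reference level: each coefficient is a contrast against whichever category is the baseline, so a different baseline gives different contrasts from the same underlying fit.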

Korean model

> summary(kor_m1)

Call:
mblogit(formula = emotion ~ F0 + F1 + F2 + duration + intensity + 
    energy, data = kor_scaled, random = list(~1 | speaker, ~1 | 
    gender, ~1 | phone))

Equation for hap vs ang:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.69225    4.00926   0.422  0.67296    
F0           0.77587    0.13249   5.856 4.74e-09 ***
F1          -0.66141    0.10112  -6.541 6.12e-11 ***
F2          -0.65357    0.15319  -4.266 1.99e-05 ***
duration     0.15896    0.09844   1.615  0.10637    
intensity   -0.26378    0.09987  -2.641  0.00826 ** 
energy      -0.25351    0.13002  -1.950  0.05121 .  

Equation for sad vs ang:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.0763     6.3832  -0.169 0.866098    
F0           -0.4551     0.2555  -1.781 0.074872 .  
F1           -0.9775     0.3521  -2.776 0.005498 ** 
F2           -0.2067     0.2924  -0.707 0.479654    
duration      0.1314     0.2177   0.604 0.546162    
intensity    -0.9113     0.2696  -3.380 0.000725 ***
energy        0.3450     0.2803   1.231 0.218363    
---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping level: speaker 
      Estimate        Std.Err.   
hap~1 105.71          142.5      
sad~1  90.06 222.87   195.8 370.3

Grouping level: gender 
      Estimate        Std.Err.   
hap~1  7.118          16.77      
sad~1 -6.603  6.818   16.44 16.06

Grouping level: phone 
      Estimate      Std.Err.   
hap~1 18.73          61.3      
sad~1 28.08 74.67   124.7 288.6

Null Deviance:     31510 
Residual Deviance: 1594 
Number of Fisher Scoring iterations:  25
Number of observations
  Groups by speaker: 20
  Groups by gender: 2
  Groups by phone: 3
  Individual observations:  14339
Note: Algorithm did not converge.
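
On question 2: one way to read a (Co-)Variances block is to convert it to standard deviations and a correlation, and to compare each estimate with its standard error. The sketch below does this for the speaker level of the Korean model, with the numbers typed in by hand from the summary above. It assumes the printed block is the lower triangle of the covariance matrix of the two random intercepts (variance of 'hap', covariance, variance of 'sad'); the object names are illustrative.

```r
# Speaker-level block copied by hand from summary(kor_m1) above.
# Assumed layout: lower triangle of a 2 x 2 covariance matrix.
lab <- c("hap~1", "sad~1")
est <- matrix(c(105.71,  90.06,
                 90.06, 222.87),
              nrow = 2, byrow = TRUE, dimnames = list(lab, lab))

# Random-intercept standard deviations on the log-odds scale.
sds <- sqrt(diag(est))

# Correlation between the 'hap' and 'sad' speaker intercepts.
corr <- cov2cor(est)

# Ratio of each estimate to its printed standard error.
estimates <- c(var_hap = 105.71, cov = 90.06, var_sad = 222.87)
std_err   <- c(var_hap = 142.5,  cov = 195.8, var_sad = 370.3)
ratio     <- estimates / std_err

print(round(sds, 2))
print(round(corr, 3))
print(round(ratio, 2))
```

Every ratio here is below 1, meaning each estimate is smaller than its own standard error. Together with the "Algorithm did not converge" note, that suggests these variance estimates should not be interpreted as they stand.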