Additional control variables

AaronGullickson commented 1 year ago

Discussed in https://github.com/lightsociologist/hirsch_test/discussions/4

Originally posted by **lightsociologist** February 22, 2023: @AaronGullickson One idea embedded in my edits of your analysis script was the inclusion of two additional control variables: a control for institutional/university volume and a control for interdisciplinarity. The bibliometricians asked for additional controls when the paper was reviewed. There are obviously limits here, and sociologists may be less concerned. What do you think?

lightsociologist commented 1 year ago

@AaronGullickson So the two variables are `sm_field_frac` and `unicount`. The two questions are whether we should include them and, if we do, how we should center them. The prior round of review did critique us for not including more controls, and we added these two in response, but parsimony is also good. I could go either way.

AaronGullickson commented 1 year ago

I would say we should at least do it as a sensitivity check.

AaronGullickson commented 1 year ago

What are they measuring exactly?

AaronGullickson commented 1 year ago

I started PR #9 for this. Here is what I get at the moment:

=============================================================================================================
                                                               ANOVA Model     W-B Model       w/controls    
-------------------------------------------------------------------------------------------------------------
Intercept                                                           36.32 ***       43.82 ***       44.27 ***
                                                                    (0.92)          (1.25)          (1.31)   
Age (grand mean centered)                                                            0.38 ***        0.38 ***
                                                                                    (0.01)          (0.01)   
Probability [0-1] of female name (discipline mean centered)                         -2.47 ***       -2.66 ***
                                                                                    (0.22)          (0.21)   
Discipline mean percent female name                                                  0.08            0.08    
                                                                                    (0.05)          (0.05)   
Percent sole authored publications (discipline mean centered)                       -0.47 ***       -0.46 ***
                                                                                    (0.01)          (0.01)   
Discipline mean percent sole authored publications                                  -0.48 ***       -0.49 ***
                                                                                    (0.04)          (0.04)   
Specialization                                                                                      -2.81 ***
                                                                                                    (0.49)   
Unicount                                                                                             0.01 ***
                                                                                                    (0.00)   
-------------------------------------------------------------------------------------------------------------
AIC                                                             386205.33       380695.32       380195.16    
BIC                                                             386231.48       380765.03       380282.30    
Log Likelihood                                                 -193099.67      -190339.66      -190087.58    
Num. obs.                                                        44964           44964           44964       
Num. groups: discipline                                            174             174             174       
Var: discipline (Intercept)                                        142.49           62.53           63.06    
Var: Residual                                                      309.55          274.37          271.22    
=============================================================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

They both seem to have significant effects, but they do not meaningfully change the other results.

Do we have better labels to describe these variables? I don't know what they are measuring. We probably also want to rescale unicount, since its coefficient rounds to 0.01 with a standard error of 0.00 at the current scale.
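For anyone reading the table cold: in the W-B (within-between) model, each individual-level predictor is split into a discipline mean (the "between" term) and the individual's deviation from that mean (the "within", discipline-mean-centered term). The sketch below is a minimal illustration in standard-library Python, assuming a hypothetical list of `(discipline, value)` records; it is not the project's analysis script, which is not shown in this thread.

```python
from collections import defaultdict
from statistics import mean


def within_between(records):
    """Split each value into (discipline mean, deviation from that mean).

    records: list of (discipline, value) pairs (hypothetical input format).
    Returns a list of (between, within) tuples in the input order.
    """
    # Collect values by discipline so each group mean can be computed.
    groups = defaultdict(list)
    for discipline, value in records:
        groups[discipline].append(value)
    disc_means = {d: mean(vals) for d, vals in groups.items()}
    # Between term = group mean; within term = value minus its group mean.
    return [(disc_means[d], v - disc_means[d]) for d, v in records]


# Hypothetical example data.
records = [("sociology", 0.2), ("sociology", 0.6), ("physics", 0.1), ("physics", 0.3)]
for between, within in within_between(records):
    print(round(between, 2), round(within, 2))
```

Within each discipline the within terms sum to zero, so the within coefficient reflects only differences among people in the same discipline.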

lightsociologist commented 1 year ago

Unicount is the number of authors in the dataset from the author's university, included to control for highly productive workplaces. The other variable (`sm_field_frac`) is essentially percent disciplinarity, with lower scores indicating more work across fields.
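A minimal sketch of how a count like unicount could be built, assuming a hypothetical input of one university name per author. This is an illustration only; the actual construction in the project's code may differ, for example in how authors with multiple affiliations are handled.

```python
from collections import Counter


def unicount(universities):
    """For each author, count the authors in the dataset at the same university.

    universities: one university name per author (hypothetical input).
    Each author is included in their own university's count.
    """
    counts = Counter(universities)
    return [counts[u] for u in universities]


# Hypothetical example: three authors at Univ A, two at Univ B, one at Univ C.
authors = ["Univ A", "Univ A", "Univ B", "Univ A", "Univ B", "Univ C"]
print(unicount(authors))  # [3, 3, 2, 3, 2, 1]
```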

lightsociologist commented 1 year ago

We should center those variables too, right?

AaronGullickson commented 1 year ago

Yes, we should probably grand mean center them like age.
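Grand-mean centering subtracts the full-sample mean from every observation, as was done for age, rather than subtracting a discipline mean as in the W-B terms. A minimal sketch that also divides by a constant, as suggested for unicount; the divisor of 100 and the values below are hypothetical, illustrative choices, not decisions made in this thread.

```python
from statistics import mean


def grand_mean_center(values, scale=1.0):
    """Subtract the overall sample mean, then divide by `scale`.

    Centering shifts only the intercept. Dividing by a constant multiplies
    the predictor's coefficient and standard error by `scale`; it does not
    change model fit or the test statistic for that coefficient.
    """
    grand = mean(values)
    return [(v - grand) / scale for v in values]


unicount_raw = [120, 450, 300, 930]  # hypothetical unicount values
print(grand_mean_center(unicount_raw, scale=100))  # [-3.3, 0.0, -1.5, 4.8]
```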

Additional control variables #8