Closed by AaronGullickson 1 year ago
@AaronGullickson So the two variables are sm_field_frac and unicount. The two questions are whether we should include them and, if we do, how to center them. The prior round of review did critique us for not including more controls, and we did include these two, but parsimony is also good. I could go either way.
I would say we should at least do it as a sensitivity check.
What are they measuring exactly?
Started PR #9 for this. Here is what I get at the moment:
=============================================================================================================
                                                              ANOVA Model       W-B Model      w/controls
-------------------------------------------------------------------------------------------------------------
Intercept                                                           36.32 ***       43.82 ***       44.27 ***
                                                                   (0.92)          (1.25)          (1.31)
Age (grand mean centered)                                                            0.38 ***        0.38 ***
                                                                                   (0.01)          (0.01)
Probability [0-1] of female name (discipline mean centered)                         -2.47 ***       -2.66 ***
                                                                                   (0.22)          (0.21)
Discipline mean percent female name                                                  0.08            0.08
                                                                                   (0.05)          (0.05)
Percent sole authored publications (discipline mean centered)                       -0.47 ***       -0.46 ***
                                                                                   (0.01)          (0.01)
Discipline mean percent sole authored publications                                  -0.48 ***       -0.49 ***
                                                                                   (0.04)          (0.04)
Specialization                                                                                      -2.81 ***
                                                                                                   (0.49)
Unicount                                                                                             0.01 ***
                                                                                                   (0.00)
-------------------------------------------------------------------------------------------------------------
AIC                                                             386205.33       380695.32       380195.16
BIC                                                             386231.48       380765.03       380282.30
Log Likelihood                                                 -193099.67      -190339.66      -190087.58
Num. obs.                                                           44964           44964           44964
Num. groups: discipline                                               174             174             174
Var: discipline (Intercept)                                        142.49           62.53           63.06
Var: Residual                                                      309.55          274.37          271.22
=============================================================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
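Both have statistically significant effects, it seems, but they barely change the other results: the within-discipline female-name coefficient moves from -2.47 to -2.66, and the other slopes shift by 0.01 at most.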
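For anyone reading the table cold, here is a sketch of what the W-B rows appear to encode, assuming "W-B" means a within-between specification. The notation is an assumption for illustration and does not come from the PR: $y_{ij}$ is the outcome for author $i$ in discipline $j$, $x_{ij}$ stands for either the female-name probability or the percent sole authored, $\bar{x}_j$ is its discipline mean, and $\overline{\text{age}}$ is the grand mean of age.

```latex
\begin{align*}
y_{ij} &= \beta_0
        + \beta_1 \,(\text{age}_{ij} - \overline{\text{age}})
        + \beta_W \,(x_{ij} - \bar{x}_j)
        + \beta_B \,\bar{x}_j
        + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma^2_u), \qquad e_{ij} \sim N(0, \sigma^2_e)
\end{align*}
```

Under that reading, the "discipline mean centered" rows are $\beta_W$ and the "Discipline mean" rows are $\beta_B$, while the two variance lines at the bottom of the table correspond to $\sigma^2_u$ and $\sigma^2_e$. The two controls enter the third column as ordinary author-level terms.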
Do we have better labels that can describe these variables? I don't know what they are measuring. We also probably want to rescale unicount, since at two decimals its coefficient (0.01) and standard error (0.00) are hard to read.
Unicount is the number of authors in the dataset from the author's university, included to control for highly productive workplaces. The other variable (sm_field_frac, labeled Specialization in the table) is basically % disciplinarity, with lower scores indicating more work across fields.
We should center those variables too, right?
Yes, we should probably grand mean center them like age.
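A minimal sketch of what grand-mean centering (plus a rescale of unicount) could look like, in plain Python. The toy values, the helper name `grand_mean_center`, and the divisor of 10 are all assumptions for illustration; none of this is taken from the PR or the project data.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall (grand) mean, so 0 means 'at the sample average'."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical toy data, not taken from the project dataset.
unicount = [10, 40, 40, 250, 60]
sm_field_frac = [0.9, 0.5, 0.7, 0.6, 0.8]

# Rescale unicount first (here: per 10 authors, an arbitrary choice) so its
# coefficient is not squeezed down to 0.01, then center both variables.
unicount_c = grand_mean_center([u / 10 for u in unicount])
sm_field_frac_c = grand_mean_center(sm_field_frac)

print(unicount_c)
print(sm_field_frac_c)
```

In a model without interactions, centering like this should leave the slopes alone and only move the intercept, which then refers to an author at the overall mean of both controls. Rescaling multiplies the unicount slope by the same factor without changing its significance.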
Discussed in https://github.com/lightsociologist/hirsch_test/discussions/4