Mechanism linking higher semantic diversity to lower article quality?

Thank you for presenting!

Regarding the structural model estimates, your explanation that polarization decreases semantic diversity by focusing the debate on a narrow range of the most hotly contested topics seems plausible to me and agrees with my intuition, so I am not surprised to see a negative coefficient estimate there.

However, I was surprised to see a negative coefficient for the effect of semantic diversity on overall quality. My (perhaps naive) expectation would be that an article with a larger number of issues being discussed on its talk page would likely be of higher quality, since it examines more aspects of the topic. (Although perhaps there is a less charitable explanation: the semantic diversity measure may also pick up multiple copyediting mistakes or similar non-ideological issues?)

As one possible resolution, perhaps Wikipedia’s learning model is “punishing” the quality score of articles with high semantic diversity because it thinks those articles should be broken up into smaller sub-articles? (If that is the reason, it seems conceivable, though unlikely, that the same estimation using a different measure of quality could yield a positive estimated sign instead. That would create a channel through which polarization decreases quality, though the overall effect of polarization on quality would still be positive after considering all channels.)

To state my question more concisely: is it surprising to you that higher semantic diversity is associated with lower article quality? Why or why not?

uchicago-computation-workshop / james_evans
