numpy / user-survey-2020-details

Analysis and publication of NumPy community survey results
https://numpy.org/user-survey-2020-details/

Discrepancy in the analysis of Q4.9 #22

Closed InessaPawson closed 2 years ago

InessaPawson commented 3 years ago

Of the respondents who expressed interest in contributing to NumPy, most (75%) were interested in contributing to the source code and 47% expressed interest in developing education content or technical documentation. Source: https://rossbar.github.io/numpy-survey-results/content/contributions.html

This is my analysis for Q4.9 (see below): 29% expressed interest in contributing to the source code, and 36% would like to help with educational content or technical documentation.

[Image: NumPySurvey_Q4 9]
rossbar commented 3 years ago

I think the problem with the above analysis is that it doesn't account for multiple responses. Note that the sum of the counts (1589) is greater than the total number of survey respondents (1236). Instead, I divided by the number of people who responded "Yes" to the "would you be interested in contributing to NumPy" question, which was 608. That's where the numbers in the contributions page come from, e.g. 457 / 608 ~ 75%.
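To make the difference between the two denominators concrete, here is a minimal sketch that recomputes the percentages from the counts quoted in this thread. It assumes Inessa's figures were obtained by dividing each count by the 1589 total selections, which reproduces her 29% and 36%. The variable names and the `share` helper are illustrative only and do not come from the survey repository.

```python
# Counts quoted in this thread for Q4.9 (a "select all that apply" question).
SOURCE_CODE = 457          # interested in contributing to the source code
EDUCATIONAL = 297          # developing educational content & narrative documentation
TECHNICAL_DOCS = 277       # writing technical documentation
SUM_OF_SELECTIONS = 1589   # sum of all option counts for Q4.9
INTERESTED = 608           # respondents who answered "Yes" to wanting to contribute


def share(count: float, denominator: int) -> int:
    """Return count / denominator as a whole-number percentage."""
    return round(100 * count / denominator)


# Dividing by the total number of selections: a respondent who ticked several
# options is counted several times in the denominator, so these are shares of
# answers, not shares of people.
by_selections = {
    "source code": share(SOURCE_CODE, SUM_OF_SELECTIONS),
    "docs (either kind)": share(EDUCATIONAL + TECHNICAL_DOCS, SUM_OF_SELECTIONS),
}

# Dividing by the number of interested respondents: the share of people
# who selected a given option.
by_respondents = {
    "source code": share(SOURCE_CODE, INTERESTED),
}

print(by_selections)   # {'source code': 29, 'docs (either kind)': 36}
print(by_respondents)  # {'source code': 75}
```

Both computations use the same numerators. Only the denominator changes, and that change alone accounts for the gap between 29% and 75%.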

InessaPawson commented 3 years ago

Thank you for the explanation, @rossbar! I’ve reviewed the raw data as well. Your analysis is correct.

InessaPawson commented 3 years ago

I’m back at it. :)

... 47% expressed interest in developing education content or technical documentation.

Shouldn’t it be 94%? Here are my calculations:

Sample size - 608
Developing educational content - 297
Writing tech documentation - 277
297 + 277 = 574 ~ 94% of 608

rossbar commented 3 years ago

This one is tricky too. Many of the people who expressed an interest in one type of docs also expressed interest in the other, so they shouldn't be treated independently. I settled on an average of the two, but I agree that "developing educational content or technical documentation" might be a little misleading. Maybe the wording should be changed to:

47% expressed interest in developing educational content and/or technical documentation.
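Because the two documentation options overlap, the aggregate counts alone cannot give the share of interested respondents who picked at least one of them. By inclusion-exclusion, |A ∪ B| = |A| + |B| − |A ∩ B|, and the unknown overlap |A ∩ B| is what is missing. The sketch below uses only the counts quoted in this thread to compute the bounds that the aggregates do allow, alongside the average used for the 47% summary. The `pct` helper is illustrative.

```python
EDUCATIONAL = 297     # Q4.9: developing educational content & narrative documentation
TECHNICAL_DOCS = 277  # Q4.9: writing technical documentation
INTERESTED = 608      # respondents interested in contributing


def pct(count: float) -> float:
    """Percentage of interested respondents, rounded to one decimal place."""
    return round(100 * count / INTERESTED, 1)


# No overlap (nobody picked both options): simply summing gives an upper bound.
upper = pct(EDUCATIONAL + TECHNICAL_DOCS)
# Maximal overlap (everyone who picked the less popular option also picked
# the other one): the union is then just the larger count, a lower bound.
lower = pct(max(EDUCATIONAL, TECHNICAL_DOCS))
# Average of the two counts, the summary figure discussed in this thread.
average = pct((EDUCATIONAL + TECHNICAL_DOCS) / 2)

print(f"at least one docs option: between {lower}% and {upper}%")
print(f"average of the two options: {average}%")
```

With these counts the "and/or" share lies between about 48.8% and 94.4%. Even the lower bound is slightly above the 47.2% average, so an exact "and/or" figure requires counting individual respondents rather than aggregated answers.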

InessaPawson commented 3 years ago

I’m somewhat surprised that you decided to no longer differentiate educational content and technical documentation. Throughout the entire questionnaire, in the design of which you actively participated, these were two distinct categories, and the survey participants responded accordingly. For instance:

Q3.4 In what way(s) have you contributed to these projects? Please select all that apply.
Developing educational content & narrative documentation - 117
Writing technical documentation (e.g. docstrings, user guide, reference guide) - 175

Q4.2 In what capacity have you contributed to NumPy? Please select all that apply.
Writing documentation - 42
Educational materials development - 13

Q4.9 In what ways would you be interested in contributing to NumPy? Please select all that apply.
Developing educational content & narrative documentation (e.g. tutorials) - 297
Writing technical documentation (e.g. docstrings, user guide, reference guide) - 277

If you believe that for the purpose of the survey educational content and technical documentation should be put in the same category, please submit your comment in the draft for the 2021 NumPy survey questionnaire: https://docs.google.com/document/d/1wCyuxll55ZjTR8RLbuGCB1VJkk5yjMMBdaP7glLitQ8. (There are quite a few suggestions already.) For the analysis of the 2020 survey, I’d be more comfortable treating them as two separate categories.

rossbar commented 3 years ago

I’m somewhat surprised that you decided to no longer differentiate educational content and technical documentation. Throughout the entire questionnaire, in the design of which you actively participated, these were two distinct categories, and the survey participants responded accordingly.

Oh no, I totally agree that they are meaningful distinctions! I chose this wording for the summary because I had not taken individual responses into account; I was only counting the number of times a given response appeared. Analyzing individual responses is more involved, but also more informative. I will add it!
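One way to take individual responses into account while keeping educational content and technical documentation as separate categories is to classify each respondent by the combination of documentation options they selected. The sketch below does this. The `responses` list is hypothetical toy data, not survey results, and the "Source code" label is a made-up shorthand. This illustrates the general approach, not necessarily the analysis that was added in #26.

```python
from collections import Counter

EDU = "Developing educational content & narrative documentation"
TECH = "Writing technical documentation"

# Hypothetical toy data: each entry is the set of Q4.9 options selected by one
# respondent. These are NOT survey results; they only illustrate the method.
responses = [
    {EDU, TECH},
    {EDU},
    {TECH},
    {EDU, TECH},
    {"Source code"},
]


def docs_breakdown(responses: list[set[str]]) -> Counter:
    """Classify each respondent by which documentation options they selected."""
    counts = Counter()
    for selected in responses:
        has_edu, has_tech = EDU in selected, TECH in selected
        if has_edu and has_tech:
            counts["both"] += 1
        elif has_edu:
            counts["educational only"] += 1
        elif has_tech:
            counts["technical only"] += 1
        else:
            counts["neither"] += 1
    return counts


breakdown = docs_breakdown(responses)
# Respondents who picked at least one documentation option, each counted once.
either = breakdown["both"] + breakdown["educational only"] + breakdown["technical only"]

print(dict(breakdown))  # {'both': 2, 'educational only': 1, 'technical only': 1, 'neither': 1}
print(f"{either} of {len(responses)} respondents chose educational and/or technical docs")
```

The three exclusive groups keep the two categories distinct, and their sum gives the "and/or" count without double-counting anyone who selected both.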

rossbar commented 3 years ago

Okay, I've added the analysis in #26. The new, more detailed results can be previewed here

rossbar commented 2 years ago

This was addressed in #26