Open oldoc63 opened 1 year ago
These proportions can then be converted to frequencies by multiplying each one by the sample size (11097 for this data):
| | leader = no | leader = yes |
|---|---|---|
| influence = no | 0.188*11097 ≈ 2087 | 0.200*11097 ≈ 2221 |
| influence = yes | 0.296*11097 ≈ 3288 | 0.315*11097 ≈ 3501 |
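This table tells us that if there were no association between the leader and influence questions, we would expect about 2087 people to answer no to both. (The frequencies are computed from the unrounded proportions, so they differ slightly from the products of the rounded values shown.)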
In Python, we can calculate this table using the chi2_contingency() function from SciPy, by passing in the observed frequency table. The function returns four values (the chi-square statistic, the p-value, the degrees of freedom, and the expected frequencies); for now, we'll only look at the fourth one:
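A minimal sketch of that call is below. The observed table is not shown in this thread, so the counts here are an assumption: they are reconstructed from the one observed value quoted (3015), the sample size (11097), and the row and column totals implied by the expected frequencies. The variable name influence_leader_freq is also illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts (rows: influence = no/yes, columns: leader = no/yes).
# ASSUMPTION: reconstructed from figures quoted in this thread,
# not copied from the original dataset.
influence_leader_freq = np.array([[3015, 1293],
                                  [2360, 4429]])

# chi2_contingency returns four values; the fourth is the table of
# frequencies expected if the two variables were not associated.
chi2, pval, dof, expected = chi2_contingency(influence_leader_freq)

print(np.round(expected))
```

With these counts, the rounded expected table matches the frequencies calculated by hand: 2087, 2221, 3288 and 3501.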
Note that the SciPy function returned the same expected frequencies as we calculated "by hand" above! Now that we have the expected contingency table if there's no association, we can compare it to our observed contingency table.
The more the expected and observed tables differ, the more confident we can be that the variables are associated. In this example, we see some large differences (e.g., 3015 people answered no to both questions in the observed table, compared to 2087 in the expected table). This provides additional evidence that these variables are associated.
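To make the comparison concrete, the cell-by-cell differences can be computed directly. This is only a sketch: the expected values are the rounded frequencies from the table in this thread, and the observed counts are an assumption reconstructed from the quoted 3015 and the sample size of 11097.

```python
import numpy as np

# ASSUMPTION: observed counts reconstructed from figures quoted in this thread.
observed = np.array([[3015, 1293],
                     [2360, 4429]])

# Expected frequencies (rounded) if leader and influence were not associated.
expected = np.array([[2087, 2221],
                     [3288, 3501]])

# Positive values: more people than expected; negative: fewer than expected.
difference = observed - expected
print(difference)
```

In a 2x2 table every cell differs by the same magnitude (here 928), because the row and column totals are the same in both tables.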
The contingency table of frequencies for the special and authority questions is saved in the special_authority_freq variable. Use the chi2_contingency() function to calculate the expected frequency table for these two questions if there were no association. Save the result as expected.
Use np.round() to print out the expected contingency table, with values rounded to the nearest whole number. Compare this to the observed frequency table. How much do the numbers in these tables differ?
Previously, we calculated the marginal proportions for the leader and influence questions. To understand whether these questions are associated, we can use the marginal proportions to create a contingency table of expected proportions if there were no association between these variables. To calculate these expected proportions, we multiply the marginal proportions for each combination of categories:
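The multiplication can be sketched with NumPy's outer product, which multiplies every row marginal by every column marginal. The observed counts below are an assumption, reconstructed from the figures quoted in this thread rather than taken from the original dataset, and the variable names are illustrative.

```python
import numpy as np

# ASSUMPTION: observed counts reconstructed from figures quoted in this thread
# (rows: influence = no/yes, columns: leader = no/yes).
observed = np.array([[3015, 1293],
                     [2360, 4429]])
n = observed.sum()  # sample size

# Marginal proportions: row totals and column totals divided by n.
influence_marginals = observed.sum(axis=1) / n
leader_marginals = observed.sum(axis=0) / n

# Expected proportion for each cell = row marginal * column marginal.
expected_props = np.outer(influence_marginals, leader_marginals)

# Multiplying by the sample size converts proportions to frequencies.
expected_freq = expected_props * n

print(np.round(expected_props, 3))
print(np.round(expected_freq))
```

Because the frequencies come from the unrounded proportions, they round to 2087, 2221, 3288 and 3501.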