humlab / the_culture_of_international_relations

Repository for NLP scripts related to The Culture of International Relations project

Check percentage calculation (in jupyter, Treaty quantities by topic) #29

Closed benjamingmartin closed 4 years ago

benjamingmartin commented 4 years ago

In jupyter, 1_quantitative_analysis_of_WTI, Treaty quantities by topic:

If you click on “add OTHER topics” and on “show percentage”, it seems to calculate the percentage share of the total. But I see that OTHER includes everything except the selected treaty topic. So it looks as if the percentage is calculated not as X/total, but as X/(total-X). And that would give a wrong value, no?

You can see my saved example under 4_publications, 1_analysis_Rise...2020, Figure 3.

Please explain (and, if necessary, adjust)!

roger-mahler commented 4 years ago

I'm not entirely certain I understand which result is not as expected. The "Show percentage" button should only normalize the values for each year, i.e. the sum of all topic categories should be 1.0 (that is, 100%) for each year. The calculation for topic category T is: number of treaties in category T / total number of treaties.
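
To make the difference between the two candidate formulas concrete, here is a small sketch using the 1935 counts reported further down in this thread (7 treaties in 7CORR, 258 in OTHER). The function names are illustrative only and are not part of the project's code.

```python
def share_of_total(x: float, other: float) -> float:
    """Percentage of x relative to the whole sample (x + other)."""
    return 100.0 * x / (x + other)


def ratio_to_other(x: float, other: float) -> float:
    """Percentage of x relative to everything except x, i.e. X / (total - X)."""
    return 100.0 * x / other


selected, other = 7, 258  # 1935 counts quoted in this thread

share = share_of_total(selected, other)
ratio = ratio_to_other(selected, other)

print(f"X / total       = {share:.6f}")
print(f"X / (total - X) = {ratio:.6f}")
```

The first value matches the 2.641509 shown for 1935 in the percentage table, which is consistent with the normalization dividing by the full yearly total rather than by OTHER alone.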

The following code fetches the data for your setup:

Given this setup code:

```python
import os
import pandas as pd
from IPython.display import display, HTML

os.sys.path = os.sys.path if '..' in os.sys.path else os.sys.path + ['..']

import common.config as config
import common.treaty_state as treaty_repository
import analysis_data
```

The pivot tables used in the plots, with and without normalization, can be computed as follows:

```python
# Load the treaty index and compute the per-period topic counts
wti_index = treaty_repository.load_wti_index(config.DATA_FOLDER)
data = analysis_data.QuantityByTopic.get_treaty_topic_quantity_stat(
    wti_index                 = wti_index,
    period_group              = config.DEFAULT_PERIOD_GROUPS[4],
    topic_category            = config.TOPIC_GROUP_MAPS['7CORR'],
    party_group               = { 'label': 'ALL', 'parties': None },
    recode_is_cultural        = True,
    extra_other_category      = True,
    target_quantity           = 'topic'
)

# One row per period, one column per category (absolute counts)
pivot = pd.pivot_table(data, index=['Period'], values=["Count"], columns=['Category'], fill_value=0)
pivot.columns = [ x[-1] for x in pivot.columns ]

# Divide each row by its row total and scale to percent
normalized_pivot = pivot.div(0.01 * pivot.sum(axis=1), axis=0)
```

Result in absolute numbers:

| Period | 7CORR | OTHER |
|--------|-------|-------|
| 1935 | 7 | 258 |
| 1936 | 7 | 241 |
| 1937 | 8 | 293 |
| 1938 | 7 | 223 |
| 1939 | 5 | 196 |
| 1940 | 1 | 118 |
| 1941 | 7 | 117 |
| 1942 | 6 | 163 |
| 1943 | 4 | 117 |
| 1944 | 5 | 94 |
| 1945 | 4 | 238 |
| 1946 | 7 | 491 |
| 1947 | 15 | 545 |
| 1948 | 14 | 588 |
| 1949 | 8 | 563 |
| 1950 | 12 | 589 |
| 1951 | 15 | 665 |
| 1952 | 13 | 669 |
| 1953 | 20 | 630 |
| 1954 | 19 | 709 |
| 1955 | 20 | 843 |
| 1956 | 51 | 882 |
| 1957 | 56 | 848 |
| 1958 | 40 | 897 |
| 1959 | 50 | 916 |
| 1960 | 54 | 1111 |
| 1961 | 66 | 1128 |
| 1962 | 28 | 1012 |
| 1963 | 38 | 1015 |
| 1964 | 57 | 1065 |
| 1965 | 53 | 1003 |
| 1966 | 80 | 1152 |
| 1967 | 60 | 1167 |
| 1968 | 50 | 1227 |
| 1969 | 51 | 1166 |
| 1970 | 53 | 1287 |
| 1971 | 51 | 1386 |
| 1972 | 46 | 1430 |

Result in percentages:

| Period | 7CORR | OTHER |
|--------|----------|-----------|
| 1935 | 2.641509 | 97.358491 |
| 1936 | 2.822581 | 97.177419 |
| 1937 | 2.657807 | 97.342193 |
| 1938 | 3.043478 | 96.956522 |
| 1939 | 2.487562 | 97.512438 |
| 1940 | 0.840336 | 99.159664 |
| 1941 | 5.645161 | 94.354839 |
| 1942 | 3.550296 | 96.449704 |
| 1943 | 3.305785 | 96.694215 |
| 1944 | 5.050505 | 94.949495 |
| 1945 | 1.652893 | 98.347107 |
| 1946 | 1.405622 | 98.594378 |
| 1947 | 2.678571 | 97.321429 |
| 1948 | 2.325581 | 97.674419 |
| 1949 | 1.401051 | 98.598949 |
| 1950 | 1.996672 | 98.003328 |
| 1951 | 2.205882 | 97.794118 |
| 1952 | 1.906158 | 98.093842 |
| 1953 | 3.076923 | 96.923077 |
| 1954 | 2.609890 | 97.390110 |
| 1955 | 2.317497 | 97.682503 |
| 1956 | 5.466238 | 94.533762 |
| 1957 | 6.194690 | 93.805310 |
| 1958 | 4.268943 | 95.731057 |
| 1959 | 5.175983 | 94.824017 |
| 1960 | 4.635193 | 95.364807 |
| 1961 | 5.527638 | 94.472362 |
| 1962 | 2.692308 | 97.307692 |
| 1963 | 3.608737 | 96.391263 |
| 1964 | 5.080214 | 94.919786 |
| 1965 | 5.018939 | 94.981061 |
| 1966 | 6.493506 | 93.506494 |
| 1967 | 4.889976 | 95.110024 |
| 1968 | 3.915427 | 96.084573 |
| 1969 | 4.190633 | 95.809367 |
| 1970 | 3.955224 | 96.044776 |
| 1971 | 3.549061 | 96.450939 |
| 1972 | 3.116531 | 96.883469 |

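For the year 1935, in absolute numbers:

| Period | 7CORR | OTHER |
|--------|-------|-------|
| 1935 | 7 | 258 |

And in percentages:

| Period | 7CORR | OTHER |
|--------|-------|-------|
| 1935 | 2.641509 = 100.0 * 7.0 / (7.0 + 258.0) | 97.358491 = 100.0 * 258.0 / (7.0 + 258.0) |
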
benjamingmartin commented 4 years ago

OK, thanks, @roger-mahler. I had not seen that the percentage calculation does divide the topic quantity in question by the entire sample (like 7+258, above).

So, can you make it so that I can plug this simple pivot-creating code into my jupyter page? I have tried, with the setup code and then the instructions, but it just tells me that it has loaded the WTI (again) without producing any output.
I'd like to be able to run it there (and get tables like the ones you included above), also because the numbers above are from "WTI old", not from 7CULT+, which is the sample I want to use. Thanks!
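
One possible explanation for the missing output, offered as an assumption rather than a diagnosis: the snippet above only assigns `pivot` and `normalized_pivot`, and a notebook cell shows a value only when it is the last expression in the cell or is passed explicitly to `display` or `print`. A minimal self-contained sketch, in which a small hand-built frame (three rows copied from the tables above) stands in for the real WTI data:

```python
import pandas as pd

# Hypothetical stand-in for the real pivot: counts copied from the
# 1935-1937 rows of the tables in this thread.
pivot = pd.DataFrame(
    {"7CORR": [7, 7, 8], "OTHER": [258, 241, 293]},
    index=pd.Index([1935, 1936, 1937], name="Period"),
)

# Same normalization as in the thread: each row divided by its row total,
# scaled to percent.
normalized_pivot = pivot.div(0.01 * pivot.sum(axis=1), axis=0)

# Assignments alone produce no output, so print the frames explicitly.
print(pivot.to_string())
print(normalized_pivot.round(6).to_string())
```

In a notebook, calling `display(pivot)` and `display(normalized_pivot)` (with `display` imported from `IPython.display`, as in the setup code) should render the same frames as HTML tables.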

roger-mahler commented 4 years ago

@benjamingmartin: I have added the code to create the pivot table below the line "Below, we see the percentage data in table format. UNDER CONSTRUCTION!".

I used your notebook to review the code and make sure it worked. Sorry if I didn't remove all of my test code. The cells above the inserted code seem to be duplicated. I might also have changed the display to table. If so, you can revert by changing 'chart_type_name' to 'plot_stacked_bar'.

Note that you select which 'is_cultural' definition to use in the top-most cell.

benjamingmartin commented 4 years ago

Hi @roger-mahler, great, thanks! I have now written what I wanted to about this. I think that perhaps no graph or table is needed in the article itself, but it is nice to have it in jupyter, as documentation. So I'll go ahead and close this. Have a nice weekend!