New calculation tools - Githubissues

humlab / the_culture_of_international_relations

Repository for NLP scripts related to The Culture of International Relations project


New calculation tools #33

Closed benjamingmartin closed 4 years ago

benjamingmartin commented 4 years ago

a) Which countries signed the most cultural treaties (in the category 7CULT+, as either party 1 or party 2) over the whole period 1935-1972? It would be nice to be able to produce a table, just raw data, showing the top 5 or top 10 countries. (Not year by year, that is, but the whole period as a single block.)

b) The same thing (a simple table) showing how many treaties certain selected countries signed.

roger-mahler commented 4 years ago

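import pandas as pd

# treaty_repository is assumed to be provided by the project's notebook environment
treaties = treaty_repository.current_wti_index().stacked_treaties
# Derive the signing year from the 'signed' date column
treaties['signed_year'] = pd.DatetimeIndex(treaties['signed']).year

# Keep cultural treaties from the selected sources, signed 1935-1972
treaties = treaties[treaties.is_cultural_yesno_plus == 'yes']
treaties = treaties[treaties.source.isin(['LTS', 'UNTS', 'UNXX'])]
treaties = treaties[(1935 <= treaties.signed_year) & (treaties.signed_year <= 1972)]
# Keep rows where the 'english' column equals 'en'
treaties = treaties[treaties.english == 'en']

# Per party: number of rows (Count) and number of distinct other parties (Unique)
treaties = treaties\
    .groupby(['party_short_name'])['party_other_name']\
    .agg(['count', 'nunique'])\
    .reset_index()\
    .rename(columns={ 'party_short_name': 'Party', 'count': 'Count', 'nunique': 'Unique' })\
    .sort_values(by='Count', ascending=False)

# Write the whole-period table to an Excel file
treaties.to_excel('number_of_treaties_per_party.xlsx')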

number_of_treaties_per_party.xlsx
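
For the top 5 / top 10 view in (a) and the selected-countries table in (b), a minimal sketch along these lines might work on the aggregated table. It assumes the columns `Party`, `Count` and `Unique` produced by the script above. The helper names `top_parties` and `selected_parties` are hypothetical, and the small table is made-up placeholder data, not real results.

```python
import pandas as pd

# Hypothetical stand-in for the aggregated table built above
# (columns 'Party', 'Count', 'Unique'); the values are placeholders.
per_party = pd.DataFrame({
    'Party': ['A', 'B', 'C', 'D'],
    'Count': [12, 7, 5, 3],
    'Unique': [9, 5, 4, 2],
})

def top_parties(table, n=10):
    """Return the n parties with the highest treaty count."""
    return table.nlargest(n, 'Count').reset_index(drop=True)

def selected_parties(table, names):
    """Return only the rows for the given party names, highest count first."""
    subset = table[table['Party'].isin(names)]
    return subset.sort_values(by='Count', ascending=False).reset_index(drop=True)

top2 = top_parties(per_party, n=2)
chosen = selected_parties(per_party, ['D', 'B'])
```

With the real table, `top_parties(treaties, n=10)` would give the top 10, and `selected_parties` would take the list of party short names of interest.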

benjamingmartin commented 4 years ago

Great, thanks! One more question (@roger-mahler): where is the list of "party_short_name" that the tool now uses? I need to change some of these (especially "Germany", which now pulls together Nazi Germany, East Germany, West Germany, and post-1990 Germany!).

benjamingmartin commented 4 years ago

Hi @roger-mahler and @aibakeneko! Some follow-up questions on this issue:

Where can I find the list of "party_short_name" that the tool now uses? I need to change some of these (especially "Germany", which now pulls together Nazi Germany, East Germany, West Germany, and post-1990 Germany!).
Where does the Jupyter page get the WTI index from? I have made a small update (added one new agreement to 7CULT+), but this does not show up in the count when I load the WTI index.

roger-mahler commented 4 years ago

The WTI index and the country metadata are read from the data folder in the project root.

To change the country metadata, upload a new version of the Excel file parties_curated.xlsx. If you change the corresponding sheets in the Excel file, you also need to remove parties_curated_parties.csv, parties_curated_continent.csv, and parties_curated_group.csv. You can safely remove all three CSV files; the system creates new ones as long as the Excel file is found.
To update the WTI index, replace Treaties_Master_List.xlsx in the same folder and remove Treaties_Master_List_Treaties.csv, so that the system regenerates it from the new Excel file.
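
The manual cleanup described above could be scripted roughly as follows. This is only a sketch: the four CSV file names are the ones mentioned in this comment, while the function name `remove_cached_csvs` and the way the data folder is passed in are assumptions, not part of the project.

```python
from pathlib import Path

# CSV exports named in the comment above; they are recreated from the Excel files.
CACHED_CSV_FILES = [
    'parties_curated_parties.csv',
    'parties_curated_continent.csv',
    'parties_curated_group.csv',
    'Treaties_Master_List_Treaties.csv',
]

def remove_cached_csvs(data_folder):
    """Delete the cached CSV exports found in data_folder; return the names removed."""
    removed = []
    for name in CACHED_CSV_FILES:
        path = Path(data_folder) / name
        if path.is_file():
            path.unlink()  # only the CSV is removed, the Excel file is left untouched
            removed.append(name)
    return removed
```

Calling `remove_cached_csvs('data')` from the project root (assuming that is where the data folder sits) after uploading new Excel files would then force fresh CSV exports on the next load.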

Reading Excel files is somewhat slow in Python, so the system exports each Excel sheet to a CSV file, which is much faster to load, but only if the CSV file doesn't already exist. I haven't implemented a timestamp check, so unless you remove the CSV file, the system will continue to load the old index.
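
For illustration, a timestamp check of the kind mentioned above could look something like this. It is not what the system currently does; the function name `csv_cache_is_stale` is hypothetical, and the sketch simply compares file modification times.

```python
import os

def csv_cache_is_stale(excel_path, csv_path):
    """Return True if the CSV export is missing or older than the Excel file."""
    if not os.path.exists(csv_path):
        return True
    # A CSV written before the last change to the Excel file is out of date.
    return os.path.getmtime(csv_path) < os.path.getmtime(excel_path)
```

A loader could then re-export the sheet whenever this returns True and read the existing CSV otherwise, which would make the manual removal of CSV files unnecessary.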

benjamingmartin commented 4 years ago

Excellent. Closing this issue.