welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Add gender as metadata to MP speeches #27

Closed fredrik1984 closed 3 years ago

fredrik1984 commented 3 years ago

It should be possible to sort MP speeches by gender (as well as by year, party affiliation, and chamber). I guess this will also require that we have the MPs' names, so we can derive gender from some kind of name index?

ninpnin commented 3 years ago

Yes, this can be done by simply adding gender to the MP database. You would indeed need a list of Swedish names by gender; this is what I found: https://barnnamn.net/flicknamn-a-o/ https://barnnamn.net/pojknamn-a-o/ ... Better sources are appreciated!

fredrik1984 commented 3 years ago

Excellent source haha! But maybe something to begin with

MansMeg commented 3 years ago

I agree. I think we can use those lists:

  1. Remove names that occur in both lists (ambiguous).
  2. Map the remaining names to the MPs' first names.
  3. For those that are not mapped, just send us a list of the MPs and we will fix it manually.
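
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `build_gender_index` and `infer_gender`, the label strings "woman" and "man", and the toy name lists are all assumptions, and treating the first whitespace-separated token as the first name is a simplification.

```python
def build_gender_index(female_names, male_names):
    """Map lowercase first name -> gender, dropping names found in both lists."""
    female = {n.strip().lower() for n in female_names}
    male = {n.strip().lower() for n in male_names}
    ambiguous = female & male  # step 1: names that occur in both lists
    index = {name: "woman" for name in female - ambiguous}
    index.update({name: "man" for name in male - ambiguous})
    return index


def infer_gender(mp_names, index):
    """Step 2: look up each MP's first name. Step 3: collect misses for manual review."""
    matched, unmatched = {}, []
    for full_name in mp_names:
        tokens = full_name.split()
        first = tokens[0].lower() if tokens else ""
        if first in index:
            matched[full_name] = index[first]
        else:
            unmatched.append(full_name)
    return matched, unmatched


# Hypothetical toy data, not taken from the real name lists or the MP database.
index = build_gender_index(["Anna", "Maria", "Kim"], ["Sven", "Karl", "Kim"])
matched, unmatched = infer_gender(
    ["Anna Berg", "Karl Holm", "Kim Andersson", "C. O. Larsson i Sjötorp"], index
)
```

Here `unmatched` is the list that would be sent for manual fixing; it contains both the ambiguous name and the initials-only entry.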

fredrik1984 commented 3 years ago

Sounds like a good plan to me

ninpnin commented 3 years ago

https://github.com/welfare-state-analytics/riksdagen-corpus/blob/dev/db/mp/metadata/names.csv

MansMeg commented 3 years ago

Cool! BTW we should rename the MP csv to members_of_parliament.csv instead since in the long run, we want to expand this corpus to other time frames.

ninpnin commented 3 years ago

I was able to infer the gender of 8612/10556 MPs (about 82%) with this method.

Some of the misses were due to rare, old-fashioned, or foreign names, but there were also some MPs listed only by initials (C. O. Larsson i Sjötorp etc.) in the original Wikipedia lists.

fredrik1984 commented 3 years ago

Good job! I also guess that the method missed "double names", like Sven-Bertil, Anna-Maria, etc.? And surely MPs with names that don't have a typical Swedish origin.
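
One way to check this guess would be to bucket the unmatched names by likely cause. A rough sketch, assuming the misses are available as plain strings; `classify_miss`, `double_name_parts`, and the category labels are hypothetical, and the heuristics (a one-letter first token ending in a period is an initial, a hyphen in the first token marks a double name) are assumptions rather than anything the project has confirmed.

```python
def classify_miss(full_name):
    """Guess why a name could not be matched against the name lists."""
    tokens = full_name.split()
    first = tokens[0] if tokens else ""
    if len(first) == 2 and first.endswith("."):
        return "initials"      # e.g. "C." in "C. O. Larsson i Sjötorp"
    if "-" in first:
        return "double name"   # e.g. "Sven-Bertil"
    return "other"             # rare, old-fashioned, or foreign names


def double_name_parts(first_name):
    """Split a hyphenated first name so each part can be looked up separately."""
    return [part.lower() for part in first_name.split("-") if part]
```

For a double name, each part returned by `double_name_parts` could then be looked up in the name index, accepting the result only when all parts agree.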

ninpnin commented 3 years ago

Here's the MP database with currently available gender information: https://github.com/welfare-state-analytics/riksdagen-corpus/blob/dev/db/mp/members_of_parliament.csv

MansMeg commented 3 years ago

Great! Could you send me and Fredrik a list of the names without a gender, somewhere we can fix the last ones?

MansMeg commented 3 years ago

Has this been done by Felix now?

ninpnin commented 3 years ago

Yes, but not merged yet. Status in welfare-state-analytics/riksdagen-corpus-old#66

MansMeg commented 3 years ago

Ah! Great!