jsoma / data-studio-projects


Languages in Russia used by 10 people or fewer #136

Open castorsia opened 6 years ago

castorsia commented 6 years ago

Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1

Pitch

The linguistic diversity of Russia has always fascinated me, as hundreds of languages are used in this vast country. In particular, I want to research small and endangered languages. For that, I used data from the [2010 Census of the Russian Federation](http://www.gks.ru/free_doc/new_site/perepis2010/croc/perepis_itogi1612.htm).

Summary

My original file was in xlsx format, so I first saved it as csv. As it was quite dirty, I dropped some initial rows and a useless column, and renamed my columns from Russian to English. I ended up with two columns: the languages and the number of people speaking them. After cleaning out some empty rows, I converted the second column to integers, and finally I could navigate all these languages.

By filtering my dataframe, I found languages that are spoken, according to the census data, by 10 people or fewer, which I personally find amazing. Yeniseian languages formerly spoken by the Yugh people, one of the southern groups along the Yenisei River in central Siberia, are revealed as still existing despite being described as extinct. Central Siberian Yupik, an endangered Yupik language spoken by five people among the indigenous Siberian Yupik along the coast of the Chukchi Peninsula in the Russian Far East and in the villages of Savoonga and Gambell on St. Lawrence Island, still lives. Sireniki Yupik is labelled by Wikipedia as 'an extinct Eskimo–Aleut language' spoken 'in and around the village of Sireniki in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia', yet the data reveal 5 remaining speakers. The same is true of the Kerek language of Kamchatka.

I made some graphs. This would be an excellent starting point for a fact-finding mission about the people using these languages, profiling both them and their communities, in the vein of the latest FT features on how pipeline construction affects Siberian communities and of Werner Herzog's depiction of Siberian life.
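The cleaning and filtering steps described in the summary can be sketched roughly as follows. This is a dependency-free illustration, not the code from the project repository: the sample CSV text, the number of junk rows, the column layout, and the helper names `load_languages` and `spoken_by_at_most` are all assumptions made for the example, and the "Example language" rows are invented placeholders.

```python
import csv

# Hypothetical stand-in for the exported CSV: two junk rows on top, a
# useless first column, Russian column names, and an empty row in the data.
RAW_CSV = """\
Table title,,
,,
№,Язык,Численность
1,Example language A,1500
2,Sirenik,5
,,
3,Example language B,8
4,Example language C,10
"""


def load_languages(text, skip_rows=2):
    """Return a list of {'language', 'speakers'} dicts from the raw export."""
    lines = text.splitlines()[skip_rows:]   # drop the initial junk rows
    reader = csv.reader(lines)
    next(reader)                            # discard the Russian header row
    cleaned = []
    for row in reader:
        _, language, speakers = row         # ignore the useless first column
        if not language.strip() or not speakers.strip():
            continue                        # skip empty rows
        cleaned.append({
            "language": language.strip(),   # English key replaces "Язык"
            "speakers": int(speakers),      # convert the count to an integer
        })
    return cleaned


def spoken_by_at_most(rows, limit=10):
    """Keep languages with `limit` speakers or fewer, smallest first."""
    small = [r for r in rows if r["speakers"] <= limit]
    return sorted(small, key=lambda r: r["speakers"])


languages = load_languages(RAW_CSV)
rare = spoken_by_at_most(languages, limit=10)
for row in rare:
    print(row["language"], row["speakers"])
```

The same sequence (skip rows, drop a column, rename, drop empties, cast to integer, filter) maps directly onto dataframe operations if a dataframe library is used instead.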

Details

Possible headline(s):

Data set(s): http://www.gks.ru/free_doc/new_site/perepis2010/croc/Documents/Vol4/pub-04-05.xlsx

Code repository:

Possible problems/fears/questions:

Work so far

Played with my data

[Images: two screenshots from the project draft]

Checklist

This checklist must be completed before you submit your draft.

malbasi commented 6 years ago

I love the general idea of looking at dying languages and using that as a jumping off point for deeper journalism.

I'd like to see some context to the data. How many languages are spoken in Russia total and how many of those are dying? How has this changed over time?

vpenney commented 6 years ago

I like this project--it's an interesting question, and something that I've never thought about before. I also like that you look at languages spoken by ten or fewer people, then step back to look at languages spoken by 200 or fewer people.

If possible, I would really like to see a map of where these languages are spoken. Are these languages mostly spoken in remote areas? Overall, nice work!

nickospi commented 6 years ago
castorsia commented 6 years ago

Update_0

Your project content: images/words/etc

Thank you all for your valuable feedback. I tried to incorporate some of your suggestions regarding the overall context of the endangered languages and their geographical distribution. To do that, I used additional data from http://www.endangeredlanguages.com/, an initiative of the University of Hawaiʻi at Mānoa and Eastern Michigan University, with funding provided by the U.S. National Science Foundation and the Luce Foundation. This database is very dirty BUT has the incomparable feature of actually including geographic coordinates (latitude and longitude) for where the languages are spoken.

I cleaned my database a bit, as the column names were all wrong, and filtered by country. Russia has 79 endangered languages (which answers, at a very elementary level, the request for overall context).

Then, I manually searched for the 5 least spoken languages and made a new dataframe out of them, featuring additional info and their latitude and longitude.
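As an alternative to searching by hand, the country filter and the selection of the five least spoken languages could be done programmatically. The sketch below is only an illustration under stated assumptions: the field names (`name`, `country`, `speakers`, `lat`, `lon`) are hypothetical and do not come from the Endangered Languages Project export, and every record, count, and coordinate is an invented placeholder.

```python
# Hypothetical records; field names and all values are invented placeholders.
records = [
    {"name": "Language A", "country": "Russia", "speakers": 1, "lat": 60.0, "lon": 90.0},
    {"name": "Language B", "country": "Russia", "speakers": 5, "lat": 64.0, "lon": 173.0},
    {"name": "Language C", "country": "Russia", "speakers": 8, "lat": 49.0, "lon": 140.0},
    {"name": "Language D", "country": "Russia", "speakers": 10, "lat": 62.0, "lon": 174.0},
    {"name": "Language E", "country": "Russia", "speakers": 500, "lat": 63.0, "lon": 87.0},
    {"name": "Language F", "country": "Russia", "speakers": 40, "lat": 55.0, "lon": 100.0},
    {"name": "Language G", "country": "Russia", "speakers": None, "lat": 50.0, "lon": 110.0},
    {"name": "Language H", "country": "Elsewhere", "speakers": 2, "lat": 10.0, "lon": 20.0},
]


def least_spoken(rows, country, n=5):
    """Return the n records for `country` with the fewest speakers.

    Records without a speaker count are left out, since they cannot be ranked.
    """
    in_country = [
        r for r in rows
        if r["country"] == country and r["speakers"] is not None
    ]
    return sorted(in_country, key=lambda r: r["speakers"])[:n]


rarest = least_spoken(records, "Russia", n=5)
for r in rarest:
    # Each record keeps its coordinates, ready to be placed on a map.
    print(r["name"], r["speakers"], r["lat"], r["lon"])
```

Dropping records with a missing speaker count is a design choice for the sketch; in a dirty database it may be worth inspecting those rows rather than discarding them silently.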

Any changes in direction or topic?

No.

Problems/Questions

Checklist

kidaemon commented 6 years ago

If you use maps, it would be interesting to plot the percentage of population speaking each language in each area.

SiruiZhu commented 6 years ago

This topic is interesting! It makes me want to see a map of where those languages are spoken. Maybe also combine it with the population data for that city or area.

castorsia commented 6 years ago

-------//-------CLASS FEEDBACK------//--------

-NUMBERS INSIDE THE BARS

castorsia commented 6 years ago

Update

-Cleaner code

Your project content: images/words/etc

[Images: update_2, update_2_1]

The linguistic diversity of Russia has always fascinated me, as hundreds of languages are used in this vast country. In particular, I want to research small and endangered languages. For that, I used data from the 2010 Census of the Russian Federation (http://www.gks.ru/free_doc/new_site/perepis2010/croc/perepis_itogi1612.htm). My original file was in xlsx format, so I first saved it as csv. As it was quite dirty, I dropped some initial rows and a useless column, and renamed my columns from Russian to English. I ended up with two columns: the languages and the number of people speaking them. After cleaning out some empty rows, I converted the second column to integers, and finally I could navigate all these languages.

By filtering my dataframe, I found languages that are spoken, according to the census data, by 10 people or fewer, which I personally find amazing. Yeniseian languages formerly spoken by the Yugh people, one of the southern groups along the Yenisei River in central Siberia, are revealed as still existing despite being described as extinct. Central Siberian Yupik, an endangered Yupik language spoken by five people among the indigenous Siberian Yupik along the coast of the Chukchi Peninsula in the Russian Far East and in the villages of Savoonga and Gambell on St. Lawrence Island, still lives. Sireniki Yupik is labelled by Wikipedia as 'an extinct Eskimo–Aleut language' spoken 'in and around the village of Sireniki in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia', yet the data reveal 5 remaining speakers. The same is true of the Kerek language of Kamchatka. I made some graphs. This would be an excellent starting point for a fact-finding mission about the people using these languages, profiling both them and their communities, in the vein of the latest FT features on how pipeline construction affects Siberian communities and of Werner Herzog's depiction of Siberian life.

Fun things to have:

-A comparison of how these languages evolved since the 2002 census. I tried getting the data and joining them, but they are only available in an online format, so I could not manage it. For now!

-Language graphs of Dagestan, a Russian region in the Caucasus with crazy language diversity. But the csv was all messed up! (And I'm not sure how to graph these, anyway.)
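If the 2002 figures ever become available in a usable form, the join mentioned above could look roughly like this. It is a minimal sketch that assumes both censuses can be reduced to a mapping from language name to speaker count and that the names match exactly between the two; the function name `compare_censuses` and all names and counts below are invented placeholders, not census data.

```python
# Invented placeholder counts; not real census figures.
census_2002 = {"Language A": 12, "Language B": 40, "Language C": 3}
census_2010 = {"Language A": 8, "Language B": 45, "Language D": 5}


def compare_censuses(old, new):
    """Outer-join two {language: speakers} mappings on the language name.

    Returns (language, old_count, new_count, change) tuples sorted by name.
    A count is None when the language is missing from that census, and the
    change is None unless both counts are present.
    """
    rows = []
    for language in sorted(set(old) | set(new)):
        before = old.get(language)
        after = new.get(language)
        if before is not None and after is not None:
            change = after - before
        else:
            change = None
        rows.append((language, before, after, change))
    return rows


comparison = compare_censuses(census_2002, census_2010)
for language, before, after, change in comparison:
    print(language, before, after, change)
```

An outer join is used here so that languages appearing in only one census stay visible, since a language vanishing from (or newly appearing in) the count is itself part of the story. In practice the names would probably need normalising before they can be matched.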

Any changes in direction or topic?

Problems/Questions

It may be useful to use the coordinates of where the five rarest languages are used.

github/code: (https://github.com/castorsia/data-studio/commit/f45f1670a9ec58f9ab52d9794737afaefc97fdb0)

Checklist

playfairbot commented 6 years ago

Hello! I'm a little robot, let's see what's been going on here.

You need some feedback, let me summon @mattrehbein, @zle2105, @linleysanders for you

castorsia commented 6 years ago

Final

Project visuals/text

Where fewer than 10 people speak your language: Welcome to Siberia

The linguistic diversity of Russia has always fascinated me, as hundreds of languages are used in this vast country. My intention was to research small and endangered languages. For that, I used data from the 2010 Census of the Russian Federation.

By filtering my dataframe, I found languages that are spoken, according to the census data, by 10 people or fewer, and sometimes by only one or two families.

Yeniseian languages formerly spoken by the Yugh people, one of the southern groups along the Yenisei River in central Siberia, are revealed as still existing despite being described as extinct. Central Siberian Yupik, an endangered Yupik language spoken by five people among the indigenous Siberian Yupik along the coast of the Chukchi Peninsula in the Russian Far East and in the villages of Savoonga and Gambell on St. Lawrence Island, still lives. Sireniki Yupik is labelled by Wikipedia as 'an extinct Eskimo–Aleut language' spoken 'in and around the village of Sireniki in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia', yet the data reveal 5 remaining speakers. The same is true of the Kerek language.

Let's look at the graphs.

[Image: screenshot from the project draft]

-Kerek language, used by 10 people of the Chukchi tribe in the Kamchatka peninsula: (https://en.wikipedia.org/wiki/Kerek_language)

-Yug language, part of the Paleo-Siberian languages, used by 1 person: The Yeniseian group is spoken in the Turukhansk region along the Yenisey River. Its only living members are Ket (formerly called Yenisey-Ostyak), which is spoken by about 500 persons, and Yug, with no more than 5 speakers. Kott (Kot; also called Assan or Asan), Arin, and Pumpokol, now extinct members of this group, were spoken chiefly to the south of the present-day locus of Ket and Yug. source: (https://www.britannica.com/topic/Paleo-Siberian-languages#ref604572)

-Old Sirenik, used by 5 people: Sirenik Yupik, Sireniki Yupik (also Old Sirenik or Vuteen), Sirenik, or Sirenikskiy is an extinct Eskimo–Aleut language. It was spoken in and around the village of Sireniki (Сиреники) in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia. source: (https://en.wikipedia.org/wiki/Sirenik_Eskimo_language)

-Oroch language, used by 8 people: The Oroch language is spoken by the Oroch people in Siberia. It is a member of the northern group of the Tungusic languages and is closely related to the Nanai language and Udege language. It is spoken in the Khabarovsk Krai. source: (https://en.wikipedia.org/wiki/Oroch_language)

### Let's see where these are used, and by how many people!

[Image: map of the languages with their number of speakers]

The big picture

[Image: screenshot from the project draft]

Overall, Russia features 79 endangered languages...

Details

Headline: Languages used by fewer than 10 people: Welcome to Siberia.

Published website version:

Code repository: (https://github.com/castorsia/data-studio/commit/c1545f75b66f1a50d87482d0c2c09309c71ce4f7)

Final data set(s): 2010 census of the Russian Federation (http://www.gks.ru/free_doc/new_site/perepis2010/croc/Documents/Vol4/pub-04-05.xlsx) and The Endangered Languages Project (http://www.endangeredlanguages.com/userquery/download/).

What did you find to be the most difficult part of this project?

-Cleaning the databases and merging them.

Are you satisfied with what you produced? Is there anything you would like to change or improve?

So-so. I really liked my topic, and I actually found data that would make an interesting story, or series. But that was exactly the problem: the story. A story like this belongs to the realm of traditional journalism, profiling people and communities. It is not really suited to graphing. The graphs serve as the gateway to the longread... But here, there's no longread. Just some graphs that need many improvements.

Checklist

mattrehbein commented 6 years ago

Nice job with a really cool topic. For future projects with similar graphs, the only thing I'd say is that you could give the numbers inside your bars a little more breathing room by moving them to the left a bit, and because you have the exact values in the bars, I think you could do without the x axis label numbers. For the bigger chart at the end I think the opposite makes more sense: there are too many languages listed for the exact values to make sense, so I'd get rid of them and just rely on the number scale on the x axis.

Great work!

zle2105 commented 6 years ago

Great job! This topic and these charts are great. I like the map a lot and your use of symbols to represent the numbers. Next time, I would maybe move the white boxes off the map to reduce clutter, so you don't have those big white boxes breaking up the image. I think the bar graph is perfect except that it is kind of difficult to see the numbers in white.

sarahslo commented 6 years ago

you could have made these charts out of little people figures. this is really about places where language is disappearing.

i think-if you are writing this much to explain what you are doing you have not found a truly compelling visual. it's hard to make charts out of things that are 'very few'. most of the map, for instance, is map.

i am curious about a place where there is only 1 person who speaks a language. i mean, who do they speak to? this language is almost gone. but there is not enough data here in my mind to make interesting charts. i can get the point without seeing data visualized.

tsp2123 commented 6 years ago

Hey, I really liked your project and thought the topic was very interesting. The most difficult thing for me is your choice to use Cyrillic in your bar graphs, though I guess this is a matter of who you are aiming this project at. The explanations are worth reading, as the subject matter is interesting... but yeah, the graphs are hard to follow alongside the explanation because I can't exactly tell which language is which.