Open castorsia opened 6 years ago
I love the general idea of looking at dying languages and using that as a jumping off point for deeper journalism.
I'd like to see some context to the data. How many languages are spoken in Russia total and how many of those are dying? How has this changed over time?
I like this project--it's an interesting question, and something that I've never thought about before. I also like that you look at languages spoken by ten or fewer people, then step back to look at languages spoken by 200 or fewer people.
If possible, I would really like to see a map of where these languages are spoken. Are these languages mostly spoken in remote areas? Overall, nice work!
Thank you all for your valuable feedback. I tried to incorporate some of your suggestions about the overall context of the endangered languages and their geographical distribution. To do that, I used additional data from http://www.endangeredlanguages.com/, an initiative of the University of Hawaiʻi at Mānoa and Eastern Michigan University, with funding provided by the U.S. National Science Foundation and the Luce Foundation. This database is very dirty BUT has the incomparable feature of actually including geographic coordinates for where the languages are spoken.
I cleaned my database a bit, as the column names were all wrong, and filtered by country. Russia has 79 endangered languages (which answers, at a very elementary level, the request for overall context).
Then, I manually searched for the 5 least spoken languages and made a new dataframe out of them, featuring additional info and their latitude and longitude.
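The filtering and selection steps above could also be done in pandas rather than by hand. This is only a minimal sketch: the column names (`name`, `country`, `speakers`, `latitude`, `longitude`) and every value in the table are invented placeholders, not the real layout or contents of the Endangered Languages Project export.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned Endangered Languages Project table.
# Column names and all values are placeholders invented for illustration.
elp = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "country": ["Russia", "Russia; China", "Canada", "Russia",
                "Russia", "Russia", "Russia"],
    "speakers": [5, 300, 2, 1, 8, 10, 50],
    "latitude": [60.0, 50.0, 55.0, 61.0, 49.0, 62.0, 43.0],
    "longitude": [90.0, 130.0, -100.0, 89.0, 140.0, 174.0, 47.0],
})

# Keep every row whose country field mentions Russia (assumes the field
# may list several countries in one string).
russia = elp[elp["country"].str.contains("Russia", na=False)]

# Take the five rows with the smallest speaker counts, keeping coordinates.
rarest = (
    russia.nsmallest(5, "speakers")[["name", "speakers", "latitude", "longitude"]]
    .reset_index(drop=True)
)

print(len(russia), "languages listed for Russia")
print(rarest)
```

Using `nsmallest` keeps the coordinates attached to each language, so the resulting dataframe can go straight to a map.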
No.
If you use maps, it would be interesting to plot the percentage of population speaking each language in each area.
This topic is interesting! It makes me want to see a map of where those languages are spoken. Maybe also combine it with the population data for that city or area.
-------//-------CLASS FEEDBACK------//--------
-NUMBERS INSIDE THE BARS
-Cleaner code
The linguistic diversity of Russia has always fascinated me, as there are hundreds of languages used in this vast country. In particular, my intention is to research small and endangered languages. For that, I used data from the 2010 Census of the Russian Federation (http://www.gks.ru/free_doc/new_site/perepis2010/croc/perepis_itogi1612.htm). My original file was in xlsx format, so I first saved it as csv. As it was quite dirty, I dropped some initial rows and a useless column, and renamed my columns from Russian to English. I ended up with two columns, containing the languages and the number of people speaking them. I converted the second column to integers after cleaning out some empty rows, and finally I could navigate all these languages...
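The cleaning steps described above might look roughly like this in pandas. It is a sketch under stated assumptions, not the code from the repository: the inline CSV, its Russian header names, the number of title rows, and all counts are invented stand-ins for the real census export.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV saved from the census xlsx file.
# Header text, layout and speaker counts are invented for illustration.
RAW_CSV = """Таблица (заголовок),,
,,
Язык,Код,Численность
Язык A,1,12
,,
Язык B,2,3
Язык C,3,1 250
"""

def clean_census(raw: str) -> pd.DataFrame:
    # Skip the two title rows so the third line becomes the header;
    # read everything as text so the counts can be cleaned before conversion.
    df = pd.read_csv(io.StringIO(raw), skiprows=2, dtype=str)
    # Drop the column that is not needed.
    df = df.drop(columns=["Код"])
    # Rename the remaining columns from Russian to English.
    df = df.rename(columns={"Язык": "language", "Численность": "speakers"})
    # Remove rows that are empty in either column.
    df = df.dropna(subset=["language", "speakers"])
    # Strip spaces used as thousands separators, then convert to integers.
    df["speakers"] = df["speakers"].str.replace(" ", "", regex=False).astype(int)
    return df.reset_index(drop=True)

clean = clean_census(RAW_CSV)
print(clean)
```

Reading the counts as text first avoids pandas guessing a float type for a column that still contains empty rows.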
By filtering my dataframe, I found languages that are spoken, according to the census data, by 10 people or fewer, which I personally find amazing. Yeniseian languages, formerly spoken by the Yugh people, one of the southern groups along the Yenisei River in central Siberia, appear as extinct but are revealed as still existing. Central Siberian Yupik, an endangered Yupik language spoken by five people of the indigenous Siberian Yupik people along the coast of the Chukchi Peninsula in the Russian Far East and in the villages of Savoonga and Gambell in St. Lawrence Island, still lives. Also, although Wikipedia labels Sireniki Yupik 'an extinct Eskimo–Aleut language' spoken 'in and around the village of Sireniki in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia', the data reveal 5 remaining speakers... The same goes for the Kerek language of Kamchatka. I made some graphs. This would be an excellent place to embark on a fact-finding mission about the people using these languages, profiling both them and their communities, in the vein of the latest FT features on how pipeline construction affects Siberian communities and of Werner Herzog's depiction of Siberian life.
Fun things to have:
-A comparison of how these languages have changed since the 2002 census. I tried getting the data and joining them, but they are only available in an online format, so I could not make it. For now!
-Language graphs of Dagestan, a Russian region in the Caucasus with crazy language diversity. But the csv was all messed up! (and I'm not sure how to graph these, anyway)
It may be useful to use the coordinates of where the five rarest languages are spoken.
github/code: (https://github.com/castorsia/data-studio/commit/f45f1670a9ec58f9ab52d9794737afaefc97fdb0)
Hello! I'm a little robot, let's see what's been going on here.
You need some feedback, let me summon @mattrehbein, @zle2105, @linleysanders for you
The linguistic diversity of Russia has always fascinated me, as there are hundreds of languages used in this vast country. My intention was to research small and endangered languages. For that, I used data from the 2010 Census of the Russian Federation.
By filtering my dataframe, I found languages that are spoken, according to the census data, by 10 people or fewer, and sometimes by only one or two families.
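The filter described above could be sketched as follows, assuming a cleaned table with `language` and `speakers` columns. The table here is a placeholder with invented names and counts, and `spoken_by_at_most` is a hypothetical helper, not a function from the project code.

```python
import pandas as pd

# Hypothetical cleaned table; names and counts are placeholders,
# not census figures.
languages = pd.DataFrame({
    "language": ["Language A", "Language B", "Language C", "Language D"],
    "speakers": [1, 8, 150, 40000],
})

def spoken_by_at_most(df: pd.DataFrame, limit: int) -> pd.DataFrame:
    """Return rows with 0 < speakers <= limit, smallest first."""
    mask = (df["speakers"] > 0) & (df["speakers"] <= limit)
    return df[mask].sort_values("speakers").reset_index(drop=True)

# The two thresholds discussed in this thread: 10 or fewer, then 200 or fewer.
ten_or_fewer = spoken_by_at_most(languages, 10)
two_hundred_or_fewer = spoken_by_at_most(languages, 200)

print(ten_or_fewer)
print(two_hundred_or_fewer)
```

Excluding zero keeps languages that the census lists with no speakers from being counted as "still spoken".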
Yeniseian languages, formerly spoken by the Yugh people, one of the southern groups along the Yenisei River in central Siberia, appear as extinct but are revealed as still existing. Central Siberian Yupik, an endangered Yupik language spoken by five people of the indigenous Siberian Yupik people along the coast of the Chukchi Peninsula in the Russian Far East and in the villages of Savoonga and Gambell in St. Lawrence Island, still lives. Also, although Wikipedia labels Sireniki Yupik 'an extinct Eskimo–Aleut language' spoken 'in and around the village of Sireniki in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia', the data reveal 5 remaining speakers... The same goes for the Kerek language.
-Kerek language, used by 10 people of the Chukchi tribe in the Kamchatka peninsula: (https://en.wikipedia.org/wiki/Kerek_language)
-Yug language, part of the Paleo-Siberian languages, used by 1 person: The Yeniseian group is spoken in the Turukhansk region along the Yenisey River. Its only living members are Ket (formerly called Yenisey-Ostyak), which is spoken by about 500 persons, and Yug, with no more than 5 speakers. Kott (Kot; also called Assan or Asan), Arin, and Pumpokol, now extinct members of this group, were spoken chiefly to the south of the present-day locus of Ket and Yug. source: (https://www.britannica.com/topic/Paleo-Siberian-languages#ref604572)
-Old Sirenik, used by 5 people: Sirenik Yupik, Sireniki Yupik (also Old Sirenik or Vuteen), Sirenik, or Sirenikskiy is an extinct Eskimo–Aleut language. It was spoken in and around the village of Sireniki (Сиреники) in Chukotka Peninsula, Chukotka Autonomous Okrug, Russia. source: (https://en.wikipedia.org/wiki/Sirenik_Eskimo_language)
-Oroch language, used by 8 people: The Oroch language is spoken by the Oroch people in Siberia. It is a member of the northern group of the Tungusic languages and is closely related to the Nanai language and Udege language. It is spoken in the Khabarovsk Krai. source: (https://en.wikipedia.org/wiki/Oroch_language)
### Let's see where these are used, and by how many people!
Overall, Russia features 79 endangered languages...
Headline: Languages used by fewer than 10 people: Welcome to Siberia.
Published website version:
Code repository: (https://github.com/castorsia/data-studio/commit/c1545f75b66f1a50d87482d0c2c09309c71ce4f7)
Final data set(s): 2010 census of the Russian Federation (http://www.gks.ru/free_doc/new_site/perepis2010/croc/Documents/Vol4/pub-04-05.xlsx) and The Endangered Languages Project (http://www.endangeredlanguages.com/userquery/download/) .
-Cleaning the databases and merging them.
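One way the merge could be approached is to join the census counts to the Endangered Languages Project coordinates on a normalised language name, and to flag the rows that fail to match. This is a sketch only: both tables, their column names, and the matching rule (trim and lowercase) are assumptions made for illustration, and real language names will likely need manual matching as well.

```python
import pandas as pd

# Hypothetical extracts of the two cleaned tables; all values are invented.
census = pd.DataFrame({
    "language": ["Language A ", "language b", "Language C"],
    "speakers": [1, 5, 8],
})
elp = pd.DataFrame({
    "name": ["Language A", "Language B"],
    "latitude": [61.0, 64.5],
    "longitude": [90.0, 173.0],
})

def normalise(names: pd.Series) -> pd.Series:
    """Trim whitespace and lowercase names so near-identical spellings match."""
    return names.str.strip().str.lower()

census["key"] = normalise(census["language"])
elp["key"] = normalise(elp["name"])

# A left join keeps every census language; indicator=True adds a "_merge"
# column showing which rows found coordinates.
merged = census.merge(
    elp[["key", "latitude", "longitude"]],
    on="key",
    how="left",
    indicator=True,
)

# Languages present in the census but missing from the coordinates table.
unmatched = merged.loc[merged["_merge"] == "left_only", "language"].tolist()
print(merged)
print("No coordinates found for:", unmatched)
```

Listing the unmatched names makes it clear which languages still have to be looked up by hand.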
So-so. I really liked my topic and the fact that I actually found data that would make an interesting story, or series. But that was exactly the problem. The story. A story like this belongs to the milieu of traditional journalism, profiling people and communities. Not really up for graphing. The graphs serve as the gate to the longread... But here, there's no longread. Just some graphs that are in need of many improvements.
Nice job with a really cool topic. For future projects with similar graphs, the only thing I'd say is that you could give the numbers inside your bars a little more breathing room by moving them to the left a bit. And because you have the exact values in the bars, I think you could do without the x axis label numbers. For the bigger chart at the end I think the opposite makes more sense: there are too many languages listed for the exact values to make sense, so I'd get rid of them and just rely on the number scale on the x axis.
Great work!
Great job! This topic and these charts are great. I like the map a lot and your use of symbols to represent the numbers. Next time, I would maybe move the white boxes off the map to reduce clutter, so you don't have those big white boxes breaking up the image. I think the bar graph is perfect except that it is kind of difficult to see the numbers in white.
you could have made these charts out of little people figures. this is really about places where language is disappearing.
i think-if you are writing this much to explain what you are doing you have not found a truly compelling visual. it's hard to make charts out of things that are 'very few'. most of the map, for instance, is map.
i am curious about a place where there is only 1 person who speaks a language. i mean, who do they speak to? this language is almost gone. but there is not enough data here in my mind to make interesting charts. i can get the point without seeing data visualized.
Hey, I really liked your project and thought the topic was very interesting. The most difficult thing for me is your choice to use Cyrillic in your bar graphs, but I guess this is a matter of who you are aiming this project at. The explanations are worth reading as the subject matter is interesting... but yeah, the graphs are hard to follow alongside the explanation because I can't exactly tell which language is which.