-----Work in progress-----
Original Article by The Hindu
Facts
- Only three Peace laureates were from outside western Europe and North America until 1950
- Gandhi was nominated on 5 occasions, but never awarded
- Most Peace Nobels in the first part of the 20th century (until 1960) were secured by those from Western Europe and the USA
- The first non-American and non-western European to win the prize did so in 1936, 35 years after the prize was instituted
- The first African to win the prize did so in 1960
- The first Asian won the Peace Prize in 1973, 72 years after the first award was given
Claims
- Fact 3 may have led to this omission
- Nominees outside Europe and USA weren’t given awards
Problems with the visualization:
Thoughts:
Thoughts:
Thoughts:
Thoughts:
Good effort Abhijit. Annotation: The original viz uses annotations to highlight some salient points (first winner outside western Europe/Americas, first African, first Asian etc.) which you could retain in your viz too. Also it would be useful to add the numbers next to the bars -- while we know the relative quantities, we do not know how large these quantities are. Visual detection: You can add the y-axis scale on the right side too, and add some delineating space between, say, every 5 years or every decade so it is easy to track. Other than line space, you may also use other visual delineations such as lines, background colour etc.
-WIP-
How many GI's Does your state have? Article GI Documentation Data
The article is educational, with the intent of communicating all GI's in the country. The visualization in the article tries to normalize the GI's for each state against its area. It explains the concept of a GI - a sign used on products that have a specific geographical origin and possess qualities or a reputation that are due to that origin. These products are split into 5 categories -
Original Visualization by The Hindu
Issues with existing visualization - Visualization 1
Visualization 2 & 3
Scope for redesign All GI's need to be represented geographically to retain the link of geographic indication. Representing it in charts reduces the geographical significance.
Attempt 1
I started by trying to plot all GI's on a map of India in a symbol chart to see if any natural clusters emerged. I also added color and number of GI's as dimensions to help users identify clusters. However, after collecting the data, GI's seemed to be spread fairly evenly across the country, with the exception of some states. The context of the types of GI's was also lost.
Additionally, the color code on the donut chart implies an order where there is none. The sorting by population is not evident to readers at first glance.
Attempt 2
To Do -
click here to access article What is the story the author is trying to tell? In this article, the author talks about India's internet speed and where it stands globally. The author has used the following table (Figure 1) to show where India stands among the BRICS nations and the countries with the fastest and slowest internet speeds. In the second part of the article, the author talks about the average download speed across circles by major operators over the last six months, which is shown in the following bar graph (Figure 2) and table (Figure 3).
Class Discussion:
Redesign:
Instead of providing a list of all the countries with their respective internet speeds, which might overload viewers with extra information, a color-coded world map that labels only the important countries could be more efficient, giving the viewer a basic understanding of internet speeds across the world.
The author has placed more emphasis on the 11 countries mentioned in the table (top 3, BRICS nations and last 3). Representing them instead on a bar graph, arranged by rank, would make it easier for the viewer to compare internet speeds between countries just by looking at the heights of the bars.
I thought the percentage of the population subscribed to mobile internet in a country could have an impact on the internet speed in that country. The relationship between internet speed and the number of mobile internet subscribers could be an interesting visualization which might give more insights. But I couldn’t find any significant relation between these two attributes. As we measure internet speed in bandwidth, I wanted to make the visualization look like a band. That is the reason why I chose stream graphs to visualize how the number of internet users changed over time. Since it started from 0 and gradually increased, the visualization may not look as much like a band as I expected.
The bar graph represents the internet speeds of 4 major operators in India along with their position globally.
In the article, the performance of various operators in various states of India was represented in a tabular format. But small multiples can better visualise how the operators are performing across India. However, if the values are populated on the map, it might overload the viewer, which made me consider a Sunburst graph to represent the quantitative data.
In the Sunburst graph, I have color coded the 4 different operators, the state gets the same color of the operator who provides the fastest internet. The speed of the internet is also color coded, darkest color being the fastest and the lightest color being the slowest.
The article that I picked up was Hunting in pairs: a look at the best bowling partnerships in Test cricket.
The article was written on the occasion of English pace bowler Stuart Broad becoming the seventh bowler and second Englishman, after pacer James Anderson, to pick up 500 Test wickets.
The article discusses the best performers in 3 areas.
According to the article:
From the data that I had, and the calculated fields, it was possible to extract the following columns / attributes.
In my first iteration, I decided to pick the bowling partnerships that were mentioned in the article. For me, they represented the "Best performers" in various categories. I also included the best performing partnership of Muttiah Muralitharan, for the reason that he had been the highest wicket taker in test matches itself. I decided to keep them all in one chart and see where they lie with their stats.
I started with placing their information in a table.
Next, I highlighted the boxes that displayed the main reason for the pair being in the table itself. (The reasons / categories of their best performance.)
In my second iteration, I took the players mentioned above and tried visualising them on the basis of the wickets taken and their combined strike rate.
In the third iteration, I added the player country to the labels, and also included the wickets taken per game as another attribute.
From the previous iterations, I understood it is certainly difficult to represent the data for sports persons, in this case, cricket. The graphic that I created does display the information, but does it essentially allow the reader to understand it clearly?
From the feedback given to me, I realised that instead of showing the information about the pairs highlighted in the article, I could just take the data available for the pairs that excelled in one particular way. For example, the pairs that took 500 wickets or more. Also, instead of visualising the information via a graphic, I could highlight a few things in the table itself (as I had attempted to do in iteration 1).
The data I had for the pairs with 500 wickets and more:
The table after selecting important attributes and rounding off values:
I highlighted the highest and lowest values from the table.
After a few more adjustments, I looked for insights from table and came up with this infographic.
-WIP- Five states including Tamil Nadu recorded over 100 custodial deaths but zero police convictions between 2001-18 Link to the article
The Story The article was written shortly after the death of a father-son duo from Tamil Nadu, allegedly due to custodial violence. The incident sparked anger across the country, with even celebrities and politicians demanding justice.
The article takes a look at the data between 2001 and 2018, on the custodial deaths and the number of policemen convicted in those cases. While calls for a fair probe are growing, differences in these numbers are alarming. Most of these deaths were attributed to reasons other than custodial torture, such as suicide and death in hospitals during treatment. The article puts its focus on five states including Tamil Nadu, where the father and son died while in custody. The data shows that there were no police convictions between 2001 and 2018 in these states.
The following are the visualizations that came with the article.
The idea of creating an impact by showing the alarming difference between the numbers was somehow lost in these visuals. The harmless circular form could have been replaced with a sharp angular graph.
Data Data from 2001 to 2018, from all the states in India, includes the following:
Approach I was more interested in showing the difference between the number of custodial deaths, cases registered, policemen charge-sheeted and policemen convicted, without showing the actual numbers on the visualization. The numbers can be highlighted in the article, and the reader would be able to make the connection.
Rough sketches
The important feedback I received was that having different colors makes it look like a stacked graph, while using the same color with some transparency would make it look like the areas are overlapping.
Another idea was to show the state-wise number of custodial deaths and the number of policemen convicted over the years, on a scatter plot. The size of the circle represents the numbers.
I was only able to find the data between 2001 and 2012 for this and decided to focus on the first concept.
Refined Concepts
I made an attempt to create a visualization based on the second concept, with the available data. It turns out, this concept works better when it is interactive than being static since there are so many overlapping layers. It gives the impact of the story, however, doesn't convey the data clearly.
The Narrative On May 31, 2019, the center released a draft of the National Education Policy which included a controversial clause. The clause mandated the teaching of Hindi in schools across non-Hindi speaking states. The draft drew sharp criticism from different political circles in many non-Hindi speaking states, especially Tamil Nadu. Soon after, the government issued a modified draft which left out the controversial clause. In this article, the author uses visualizations to explore the usage of the Hindi language across all the states in India.
The Data used The author has used 2011 language census data pertaining to:
*all Hindi speakers - the assumption in the census is that 'All Hindi speakers' can be calculated by summing up the people who use Hindi as their 1st, 2nd and 3rd language. 4th language and beyond are not included.
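The census assumption described above can be sketched in a few lines. This is only an illustration of the arithmetic: the function name and the counts are made up, and the real figures would come from the census tables.

```python
def all_speakers_share(first: int, second: int, third: int, population: int) -> float:
    """Percentage of a state's population that reports the language
    as a 1st, 2nd or 3rd language (4th and beyond are not counted)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * (first + second + third) / population


# Made-up counts for a hypothetical state, not census figures
share = all_speakers_share(first=400_000, second=250_000, third=150_000,
                           population=2_000_000)
print(share)
```

Because native (1st language) speakers are part of this sum, "all speakers" can never be lower than "native speakers" for any state.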
The data for the first three points can be extracted from here The dataset that I prepared can be found here
Comments on the data used Initially, it seemed that the narrative was about how non-Hindi speaking states felt outraged at being forced to teach Hindi. For those reasons, the data I wanted to look at was each state's highest performing language against the performance of Hindi in that state. But later, I realized that the narrative was about analyzing Hindi as a common nationwide language, and considering other alternatives for the same. In that case, the data used in the article seemed fitting.
Comments on the Dataset
Visualizations
1. Map titled 'Statewise Split' The 2011 census found that 43% of India’s population speaks Hindi. It is the highest spoken language in India. This map shows that the number, though large, is concentrated in a few of the central states.
The map below is a Choropleth that shows the percentage of the population in each state that speaks Hindi. Encoding: A gradient using 10 bins of color to represent the population. (Each subsequent bin stands for a 10% increase)
Encoding problem 1: Initially, I thought that 10 bins were unnecessary because people would not care for that amount of detail in the data. I thought it better to just use 'low, medium, and high' as the bins of 33% each as shown below.
However, there were 2 issues with reducing the bins:
I tried 4 and 5 bins respectively, but they each led to similarly misleading clubbings. So I decided that 10 bins were best.
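One way to see the kind of clubbing that fewer bins can cause is to compute the bin for the same values under different bin counts. The sketch below assumes equal-width bins over 0 to 100%; the two percentages are hypothetical, not values from the census data.

```python
def bin_index(percent: float, n_bins: int) -> int:
    """Return the 0-based equal-width bin that a percentage (0 to 100) falls into."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    width = 100 / n_bins
    # 100% is placed in the last bin instead of getting a bin of its own
    return min(int(percent // width), n_bins - 1)


# Two hypothetical states with quite different shares of Hindi speakers
for n_bins in (3, 10):
    print(n_bins, bin_index(34, n_bins), bin_index(65, n_bins))
```

With 3 bins both values land in the same 'medium' bin and get the same color, while with 10 bins they are three bins apart.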
Encoding problem 2: The Hindu Choropleth map mainly shows the 'area' of the state. However, in this visualization, it wasn't the area that mattered but the population of each state. I also, tangentially, tried to make a map that disregards both area and population and gives all states an 'equal' status. Trial 1 shown below:
However, this did not really look like the shape of India, and the class feedback was that it did not make sense to even show this information geographically. So I proceeded with making a tiled Dorling Cartogram, using hexagons. Each hexagon = 1 million people. The point was to show the state population by number of hexagons, and the percentage of Hindi speakers in each state using 10 bins. (The Gujarat miscalculation has been corrected)
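The "one hexagon = 1 million people" rule can be sketched as a small helper. The rounding rule and the minimum of one tile per state are assumptions for illustration; the Tilegrams tool may allocate tiles differently, and the populations below are made up.

```python
def hexagon_count(population: int, people_per_hexagon: int = 1_000_000) -> int:
    """Number of hexagon tiles for a state.

    Assumption: round to the nearest tile, but keep at least one tile
    so that very small states do not disappear from the map.
    """
    return max(1, round(population / people_per_hexagon))


# Hypothetical populations, not census figures
print(hexagon_count(61_100_000))
print(hexagon_count(300_000))
```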
(I'm aware that the labels are not well placed or legible, I am working on fixing that)
2. Scatter Plot titled 'Native vs Non-native speakers' The chart below plots the percentage of native Hindi speakers against All Hindi speakers. Since 'All Hindi speakers' includes native speakers, the title used is misleading. Note: The labeling of both axes is wrongly switched.
Encoding: The position of each circle represents the percentage of 'native' and 'all' Hindi speakers respectively.
Encoding Problem 1 The visualization should highlight the percentage of Native speakers in each state, but it should also allow a comparison between states. But due to the 'position' encoding, the focus is more on the clustering of the states. Because the state names are not shown upfront, the comparison is difficult.
Encoding Problem 2 Because of the labeling of both Axes, i.e., both deal with 'Hindi speakers' with '1st choice' and '3rd choice', and there is no visual logic to remember which is which, the viewer takes on a large cognitive load to remember how to read the chart. Eg: A. Downward = lower total speakers B. Leftward = lower native speakers C. Up and left = Higher total speakers and among them lower native speakers D. Down and right = Lower total speakers but high native speakers among them This encoding, though passable, puts a lot of cognitive load on the user and volunteers very little information.
My attempt to improve it:
3. Scatter plot titled 'An alternative means of communication' The chart plots the percentage of total Hindi speakers vs total English speakers in each state. (Point to Remember: Total speakers = speakers with the language in their 'top 3 choices') Note: The labeling of both axes has been wrongly switched.
Encoding is the same as the previous scatter plot.
Problem 1 There is a difference in the range and granularity of the scales of the X-axis and Y-axis. This is fine, however, there is no visual difference/markers between them and at first glance, this difference is not noted. Since 'position' is the main encoding here, the viewer will subconsciously forget to take into account that the Y-axis only goes up to 45, and read both distances equally.
Problem 2 The point of the data is to show both, the difference in the ratio of Hindi to English within a state and also between states. The visualization only shows the distance between states, and due to the somewhat even distribution of the states, not many insights can be gained.
My attempt at improving this:
Tools used: Tableau, Figma and Tilegrams - a good open-source tool for tiled maps. For details on how to make a tilegrams map compatible with Tableau, read this.
FINAL OUTPUT
For the redesign, I used the same title and mostly the same text from the Hindu Article. I may have edited the text slightly. Please click and zoom to see the text clearly.
The article that I chose was: Where does India stand in the Global Gender Gap Index?
In the graphs that they used to represent India in the context of the world, I saw some major flaws.
My first few ideas:
Implementation: Alright... This didn't go as well as I thought!
Next iteration: Created a dot chart and showed how the country's score has changed since 2018.
Also created a choropleth to show differences in countries visually.
Representation:
For the final outcome, I would be choosing the following layout:
I've created a wordplay on "Dilli Door Hai", a common phrase used to signify that there is still a long way to go; here it signifies that India still has a long way to go. I have also kept a mostly 'generic' feminine colour palette. Apparently, one of the Mughal emperors used the phrase while traveling to Delhi.
Final Outcome: What all have I included?
Please zoom in to read through the details :-)
The Hindu article chosen is available here.
What is the story the author is trying to tell? Gender disparity in early education
Students in private schools performed better in various tasks than those enrolled in government schools and anganwadis, according to the Annual Status of Education Report (Rural) 2019 - This should be backed by the statement in the ASER referring to the method of instruction and not the data of the final results (as by that method of reasoning one could also claim the results are biased because of the gender ratio and the differing capabilities of the genders)
The grouping of age groups and usage of disorganized bar charts to represent part of a whole.
Initial Ideation to effectively combine these:
Possibility of using a Spider chart to visualise better -
Explored Spider charts using different scales and parameters -
Decided to go ahead with a scale from 70 to 100 percentage for the completion of a particular activity. Chose six activities, two from each cognitive, language and numerical skills
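Starting the scale at 70 instead of 0 changes how far apart two values appear on each spoke of the chart. A minimal sketch of that rescaling follows; the function name is hypothetical and the percentages are examples, not ASER figures.

```python
def radial_position(percent: float, lower: float = 70.0, upper: float = 100.0) -> float:
    """Map a completion percentage onto a 0-to-1 radius for a spoke
    whose scale runs from `lower` to `upper`. Values outside the
    range are clamped to the ends of the spoke."""
    clamped = min(max(percent, lower), upper)
    return (clamped - lower) / (upper - lower)


# Example values only: a 5-point gap between two groups
print(radial_position(85) - radial_position(80))
```

On a 0 to 100 scale a 5-point gap covers 5% of the spoke; on the 70 to 100 scale it covers about 17%. This makes small differences easier to see, but the gap also looks larger than it would on a full scale, so the axis range should be labeled clearly.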
Final Visualization
Spider Chart for the difference in performance ie. boys performing better than girls, pie chart for the reason of that difference ie. differing percentage of private and govt school education for girls and boys and line graph for the reason for differing admission to private and govt schools ie. education of mother
Comments -
Could create 3 spider charts to show in greater detail the performance in all 3 subjects of cognitive, language and numerical skills. Data is available in ASER
The original article can be found here
What story is the article trying to tell? The story talks about the successes and failures of those who have attempted to summit Mount Everest since 1953 (figure 1). The problem with this visualization is that it uses a double y-axis, which makes the graph hard to read. You can see the dip in the number of attempts in 2014-2015 because of the avalanche; this is incidentally also the period with the highest number of deaths. figure 1 (Source: Hindu)
The next part also shows a visualization of the main causes of death using a tree diagram(figure 2.) but does not give any valuable information. figure 2(Source:Hindu)
The article also shows the data in terms of countries(figure 3) and also between genders (figure 4).
In figure 3 (Source: Hindu) the rate of success is plotted along with the number of failures and successes. Nepal is obviously leading because of its proximity to Everest. Russia, with very few attempts, has a very high success rate. This chart was one of the better ones out of all of them.
First Attempt: I plotted the attempts of males and females in one graph to show the difference in the numbers. Feedback: The feedback was that using different encodings for male and female, and different colors for the people who summited, was adding extra cognitive load on the reader.
2nd Attempt
I made another visualization of the causes of death on the mountain. The y-axis is the height at which death occurred, the x-axis is time, and the size of the shapes shows whether they summited or not. Similar articles have mentioned that people have been more successful through the years but the death rate has not changed much, hovering under 1%. We have better equipment now to climb the mountain and also to predict the weather. So why is the death rate not decreasing even more? As I analyzed the causes of death I decided to bin a few categories and divided them into Internal and External: "Internal" being caused by sickness and illness, and "External" being caused by harsh weather conditions, falls, and avalanches. There were some categories like unknown or disappearance for which I used pink and black respectively. The visualization shows that more recently people have been dying due to internal factors rather than external factors. This could be due to the fact that climbing Mt. Everest has become more of a tourist attraction where anyone can pay to climb the mountain without much proper training. The big circles show the people who died in the avalanche in 2014. Viz 2
Feedback: I had to do the binning within the legend also. There was some difficulty in recognizing whether shapes overlapped or not. Maybe use two different charts for summit and non-summit.
Visualization II 2nd Attempt
Single Visualization
I tried using different charts for those who summited and those who did not, but the focus of the story changed. I wanted to highlight the type of death on the mountain. I used 4 bins, 2 of which are similar: falls (darker blue) could be caused by faulty technique and equipment, but avalanches (lighter blue) were unpredictable. I combined disappearance with unknown and other as black.
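The binning described above can be sketched as a simple lookup. The raw cause labels here are hypothetical and the real dataset's wording may differ; the fourth bin is not named in the text, so an illness-related bin is assumed.

```python
# Hypothetical raw labels mapped to the bins used in the chart.
# "Illness" is an assumed name for the fourth bin.
CAUSE_BINS = {
    "fall": "Fall",
    "avalanche": "Avalanche",
    "acute mountain sickness": "Illness",
    "exhaustion": "Illness",
    "disappearance": "Other / unknown",
    "unknown": "Other / unknown",
}


def bin_cause(raw_cause: str) -> str:
    """Return the chart bin for a raw cause of death.
    Anything not listed falls back to the combined 'Other / unknown' bin."""
    return CAUSE_BINS.get(raw_cause.strip().lower(), "Other / unknown")


print(bin_cause("Avalanche"), "|", bin_cause("Disappearance"))
```

Keeping the mapping in one table also makes it easy to reuse the same bin names in the legend.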
Final Design
Tools Used: Tableau & Adobe Illustrator Export your Tableau worksheet as a PDF and then you can edit it in Illustrator as an SVG or EPS.
Original Article by Varun Krishnan, here.
Indian Innovation Index examines innovation capabilities and performance of the Indian States and UT's. It is measured as an average of Enablers (innovation inputs) and Performance (innovation outputs).
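As a rough sketch of how the score and the resulting ranks fit together, assuming an unweighted mean of the two pillar scores (which is how the description above reads). The state names and numbers are placeholders, not values from the index.

```python
def innovation_score(enablers: float, performance: float) -> float:
    """Overall score as the simple average of the Enablers and Performance scores."""
    return (enablers + performance) / 2


# Placeholder states and scores, for illustration only
scores = {
    "State A": innovation_score(enablers=30.0, performance=20.0),
    "State B": innovation_score(enablers=25.0, performance=35.0),
}

# Rank states from highest to lowest overall score
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A state with strong enablers but weak performance can therefore rank below a state with the opposite profile, which is why showing both pillars alongside the rank is useful.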
The writer first presents the bigger picture. He talks about the global innovation index and India's comparison with other emerging nations. A more focused comparison is then made among the states of India based on enablers (innovation inputs) and performance (innovation outputs), providing grounds for the state ranks in the Indian innovation index.
As per the article, points 1 and 2 above form the secondary information and 3 and 4 together form the primary information.
Source for Global Innovation Index is here and the source for Indian Innovation Index is here. The required data were extracted and cleaned for the purpose of use in this project.
Types of data-
The data visualizations of the article are shown in Figures 1, 2, and 3. The identified problems are listed below.
Figure 1
Figure 2
Figure 3
Ideations:
Original article: 77 cases filed in the 1950s still pending in courts across India. Link
Story of article: The Hindu article begins by mentioning a conviction in a case filed 35 years ago. The article also lists, in a table, the cases pending since the 1950s, grouped by decade.
The article also highlights how significantly pending cases have increased since 2010. "Out of the nearly 3 crore cases pending, 2.6 crore were filed after 2010," the article mentions.
Lastly, it points to the pending cases in Uttar Pradesh, which are significantly higher than in other states. "Nearly one in every four pending cases across the country are from Uttar Pradesh (73.1 lakh)," the article mentions.
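The two quoted figures can be cross-checked with simple arithmetic, converting lakh and crore to plain numbers. Since the article says "nearly 3 crore", the total used here is an approximation and the resulting percentages are only indicative.

```python
LAKH = 100_000
CRORE = 10_000_000

total_pending = 3 * CRORE        # "nearly 3 crore", so an approximation
filed_after_2010 = 2.6 * CRORE
uttar_pradesh = 73.1 * LAKH

share_after_2010 = 100 * filed_after_2010 / total_pending
share_up = 100 * uttar_pradesh / total_pending

print(f"Filed after 2010: {share_after_2010:.1f}% of pending cases")
print(f"Uttar Pradesh:    {share_up:.1f}% of pending cases")
```

This gives roughly 86.7% of pending cases filed after 2010 and about 24.4% from Uttar Pradesh, consistent with the "nearly one in every four" claim.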
*Only available data in the visualized form was the following table
Focus of my visualization was
10 states with the largest number of pending cases
Traveling back to the pending cases - View link for Interactive prototype https://public.flourish.studio/visualisation/3912141/
Highlighting the oldest pending cases till date
The size of the bubble here represents the total number of pending cases in that particular state till date.
Increase in pending cases since 2010
Tool used : Flourish and Tableau Desktop Data Source:National Judicial Data Grid
Link to the article here.
The data is presented in the Hindu article in tabular form and as percentages, with colour (saturation) used to indicate the value of the percentages.
Initial redesign idea, as a stacked bar graph
The narrative: The authors of this article discuss the statistics on domestic violence in India. They talk about how, during the first phases of the COVID-19 related lockdown, Indian women filed more domestic violence-related complaints than were recorded in a similar period in the last 10 years. They focus on the alarming rise in the number of complaints and the state-wise numbers. They also bring to attention that even this spike might just be the tip of the iceberg, since 86% of the women who experience domestic violence in India don’t seek help. Finally, they bring out a haunting statistic: even among the women who sought help, only 7% actually reached the relevant authorities, while the majority talked to their families.
Since the data for the lockdown domestic violence is not available, my focus was on the last two sections:
Buried in silence About 86% of women who experienced violence never sought help, and 77% of the victims did not even mention the incident(s) to anyone. The table shows that women who were subjected to both physical and sexual violence seek help relatively more than those who suffer from only one form of abuse.
Under-reporting Among the 14.3% of victims who sought help, only 7% reached out to relevant authorities — the police, doctors, lawyers, or social service organizations. But more than 90% of the victims sought help only from their immediate family.
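Combining the two sections puts the numbers into perspective: the share of all victims who reached the relevant authorities is the product of the two percentages. This sketch assumes the 7% is a share of those who sought help, which is how the article's text reads.

```python
sought_help_pct = 14.3              # % of all victims who sought help
reached_authorities_pct = 7.0       # % of those who sought help

# Share of ALL victims who reached the police, doctors, lawyers
# or social service organizations
share_of_all_victims = sought_help_pct * reached_authorities_pct / 100

print(f"{share_of_all_victims:.1f}% of all victims reached relevant authorities")
```

In other words, about 1 in 100 women who experienced violence reached a relevant authority.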
Problems:
Interventions: I have combined the two sections of the story mentioned above since they are attributes of the same dataset. Moreover, to give a better impact and idea to the users, combining these two would put things into perspective.
Ideation:
The Final Visualization:
----WIP----
Article
This story analyzes which districts have exceeded the limit on C-Section deliveries set by the WHO. Though C-Section deliveries reduce delivery-related mortality, the WHO insists that C-Section deliveries in a particular region should not exceed 15%. But the analysis in 2016 shows that the southern states of India have exceeded the limit by a large margin. In the central, northern, and northeastern regions the WHO limit is not exceeded by much, but the percentage of C-Sections in private hospitals is higher than in public ones.
Original Story: Data | How has the state of democracy in India changed since 2008?
Background: The Economist Intelligence Unit has been publishing an annual Democracy Index (with the exception of 2009) in which it assigns 167 nations of the world a Democracy Index score which measures the state of their democracy. The index is an aggregation of 5 parameters: Electoral Process and Plurality, Functioning of Government, Political Culture, Political Participation and Civil Liberties. The Hindu Data Point article above uses this data as the source to weave a narrative about the state of India's democracy since 2008.
The article attempts to illustrate the change in India's democracy since 2008, particularly highlighting India's decline on the index, especially in certain parameters such as Civil Liberties. The data story is told exclusively with the help of three tables and accompanying text, using color hues and saturation in the table to encode improvements or declines and countries better than or worse than India. For example, the fewer the countries doing worse than India, the deeper the saturation of red, implying that this does not bode well for us.
Table 1: India's Scores In 2019
Table 2: Change in Scores since 2014
Table 3: Comparison of Changes in 2008-2014 and 2014-2019
Comments Current Data Visualization and Story:
Comments on the Dataset
My initial ideas addressed the lack of representation of the ordinal data of the countries' ranks and the absence of anchor countries in the spectrum that would give readers an idea of what the Democracy Index scores mean compared to good democracies and authoritarian regimes. The second visualization, for example, would allow people to compare India's progress over the years against the US, giving them an idea of how well or poorly we have performed against a democracy similar to ours.
I did not go with the third visualization of rank changes since accommodating such a large visualization of changes in ranks of so many different countries would draw too much focus towards itself and take away from the primary focus on India's performance through the years.
Instead, I chose to focus on the first two ideas and also include India's trends along different parameters over the years (and not just a simple trend line visualization of their overall Index score.)
I compiled data firstly of the ranks of India and countries such as Norway, Germany, China, etc. in 2019. Then I also scoured the available indices through the years for the scores of India and the US across the different parameters for the second part of the visualization.
Reasons for comparison against the US:
Rank Data
Performance of India and USA through the Years
Categorization Data for the Visualization
I then generated graphs of India and USA's performance through the years with Datawrapper. These were exported as PNG and used as an underlay to trace over in Illustrator. Illustrator provided more control over highlights, annotations and font sizing for better readability in the final design of the graphs. One such example of a simple graph from Datawrapper is below.
In the first iteration, I represented India on a continuum of dots that represented the ordinal rank data of the countries and arranged the graphs of India and US's trends vertically below that. The problem with this visualization was that using circular dots to represent all 167 countries made legibility an issue and the vertical arrangement of graphs made the second component of the visualization difficult to read.
The final visualization fixes the issues pertaining to the exclusion of other countries that would have given readers anchors for the scoring out of 10 and contextualized the ranks within the different categories of regimes. Another major change was the inclusion of annotations, especially in changes of power between the INC and NDA.
The visualization is divided into two components:
Design Changes in First Component (Rank Data)
Design Changes in Second Component (Trend Comparison)
The original visualization was an insightful one, though I have tried to be a little more direct about the INC/NDA split, which can also be gleaned from the visualization: it shows that we have done significantly worse since 2014. While the original author may have chosen to remain apolitical by not highlighting this contrast, I have chosen not to.
Lastly, this visualization may be made even better as an interactive visualization in which individual points in the trend graphs could have popups of key events instead of annotations as they currently do. Similarly, the users could hover over the ranks and see which countries lie at the selected rank.
A higher resolution version of the visualization may be accessed here.
For this assignment, we'll use data stories from The Hindu Data Point.
Select a story that you like, study it carefully and redesign it. Specifically I want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:
- What is the story the author is trying to tell?
- What data is he/she using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.
- How is it encoded, what are the problems with it, and how did you attempt to improve it?
You may choose to expand or curtail the scope of the data used in the story, or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.
While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce a coherent documentation.
For reference, take a look at what the previous batch did with this assignment.