Last Edit: 15/09/19 - Incorporating responses to Venkat Sir's comments
For this re-design, I picked an article titled 'Roger Federer's career shows he's ageing like fine wine', written on the 8th of March, 2019 (right after he won his 100th ATP title). It was a visualization of Roger Federer's career performance in terms of titles won, when compared to the 15 men's tennis players with the most titles. The original article can be found here.
The article claimed that Federer has been performing better as his career has progressed, relative to the other players in the top 15. It did so by comparing a single metric - Titles won/Tournaments entered - for each player across three stages of their career. The information was presented through simple dot plots made in Flourish. The plot for Stage I (Years 1 to 7) looked like this -
The percentage of wins was encoded in triplicate - On the Y Axis, in the size of the dot, and the shade of the dot as well.
In terms of interactivity, readers could view the position of one player at a time using a drop down menu in the top left. By hovering on a dot, readers could see information about the player, number of wins, and win percentage.
Having followed tennis fairly religiously since I was 7, I felt that the approach towards player performance was overly simplistic at best, and at times very misleading.
Just to clear things, my mood in life is directly proportional to the extent of Federer's successes at any given point of time. That being said, I don't think he has gotten better over time. What's amazing about his career is how he has bounced back from multiple almost-career-ending spells, and is still performing consistently at the highest level of the game.
The biggest problem I had with the visualisation was how the longevity of a player, and consistency over time, was not being adequately expressed. In fact, in the third graph, I felt it went ahead and displayed the opposite insight, by making it seem that Djokovic and Nadal had reached the level of Federer, but with much greater pace (Image below). This is most clearly not the case.
And so I started off by collecting and compiling more detailed statistics for each of the 15 players in this visualisation. I managed to find the number of tournaments entered and titles won in each year of each of their careers, and put it all together in a spreadsheet that looked something like this -
Of the 15 players included in the original article, I excluded three - Rod Laver, Ilie Nastase and Guillermo Vilas. All three played some portion of their careers before the Open Era, and a number of tournaments that they played in at the start of the Open Era are no longer recognised by the ATP, so I wasn't able to find accurate statistics for them.
Even a cursory glance at the image of the spreadsheet above depicts how it doesn't make sense to compare all players across the three stages of their career. Federer has been active in the tennis circuit for 22 years now. Only four players of the remaining 12 are active, and considering Andy Murray's recent debilitating history with injuries, it seems unlikely that his third act will produce many more titles. Among the players who have retired, only Jimmy Connors, John McEnroe and Andre Agassi have had career lengths comparable to Federer. Initial explorations with graphing all 12 players together led to some very noisy and unreadable graphs.
For the purpose of this redesign, I decided to pick -
I started by plotting the same metric used by the original article (titles won/tournaments entered) for each of these six players across the three stages of their careers. This is depicted below -
These graphs did make the distinction a little clearer. One can easily see how Federer's efficiency metric has been much higher than any other player consistently over stage 2 and stage 3. But this still felt inaccessible to people without a clear understanding about how the tennis season is structured. I then tried just representing wins and losses at tournaments across Federer's career through a stacked bar graph -
This to me seemed like a nicer way of depicting things. It gives readers a visual sense of the total number of tournaments played at each step of a player's career, and how many of those tournaments were converted into titles.
I then made a continuous graph (instead of bars, for purely aesthetic reasons) for each of the six players in consideration.
I also figured it would be nice to have player graphs correspond to certain known characteristics about them.
The initial iteration of the article that was discussed in class looked like this -
The career stage graphs on the right are all aligned to the start of each player's career, and divided by guide lines into the three stages considered by the original article. Textual information was included for the number of tournaments entered and titles won for each career stage. Another graph, depicting the cumulative titles won by each player over the course of their career was included as well.
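As a rough illustration of the numbers behind these graphs, here is a minimal sketch that turns a per-year spreadsheet into per-stage totals, the titles won/tournaments entered ratio, and the cumulative title count. The yearly figures are made-up placeholders and not any player's real record. Stage I is years 1 to 7 as in the original article; the boundaries I use for the later stages (years 8 to 14, then year 15 onwards) are an assumption for this sketch only.

```python
from itertools import accumulate

# Placeholder yearly records: (tournaments entered, titles won) per career year.
# These are illustrative numbers, NOT real statistics for any player.
yearly = [(10, 0), (12, 1), (15, 1), (16, 2), (18, 3), (18, 4), (17, 5),
          (16, 6), (16, 5), (15, 4), (15, 4), (14, 3), (14, 3), (13, 2),
          (12, 2), (10, 1)]

# Stage I = years 1-7 (from the original article).
# Stage II = years 8-14 and Stage III = year 15 onwards are assumed here.
STAGES = {"I": (0, 7), "II": (7, 14), "III": (14, None)}


def stage_summary(records):
    """Return {stage: (entered, titles, titles / entered)}."""
    summary = {}
    for name, (start, end) in STAGES.items():
        chunk = records[start:end]
        entered = sum(e for e, _ in chunk)
        titles = sum(t for _, t in chunk)
        rate = titles / entered if entered else 0.0
        summary[name] = (entered, titles, rate)
    return summary


def cumulative_titles(records):
    """Running total of titles won, one value per career year."""
    return list(accumulate(t for _, t in records))


summary = stage_summary(yearly)
running = cumulative_titles(yearly)
```

The same two functions could be run once per player, which keeps the stage text labels and the cumulative graph tied to one source of numbers.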
There were a few issues with how the information was presented in this. It was unclear whether players were retired or currently active, as all graphs were brought to zero in the end. A number of the labels were not very readable either. These have been fixed, and the newer version of the article is included below -
Thanks to Venkat Sir for pointing these mistakes out, and for the overall feedback as well.
On the whole, the order of the 6 players in the main article was determined as follows -
My interpretation of consistency was informed by the broader analysis of 12 Open Era players. The expanded version of the visualization is included below. The original 6 are ordered as is, and the next 6 are in the order of decreasing career lengths.
Now the drop-off in the case of the next 6 players is very clear when presented visually. They have all played far fewer tournaments in the third stage of their careers, and have won even fewer of them than the players in the initial visualization.
Consistency in this case is a subjective, visually judged measure. I think it would be possible to quantify the same in a better way, will have to think about it. But if I had to split the measure, the three parts to it would be (obviously) -
For example, Thomas Muster played a staggering number of tournaments in his stage II (182), but really didn't win that high a percentage of them. Soon after, in stage III, his output and efficiency quickly decreased even further. Hence between stages II and III, Muster was neither consistent nor efficient.
Now going back to the initial 6, Federer's 'consistency' comes from the fact that he has played many more tournaments in the third stage of his career than anyone with the exception of Jimmy Connors. However, Connors' win percentage was terribly low throughout his stage III, while Federer at this point in time has the highest win percentage of any player, even in the longlist. This combination of longevity and high win percentage makes his trajectory look more consistent over the 3 stages.
The same can be said for his contemporaries, Nadal and Djokovic, who have maintained similar win percentages in their third stages, and show no signs of dipping in the near future. The dominance of Federer, Nadal and Djokovic over the last two decades has been unprecedented, and on an individual level, their output has far exceeded that of any tennis player before them. While they haven't played for as long as Connors, they have managed to keep winning the biggest tournaments of the sport at a point in their careers where Connors was merely participating. So in a way, the article could just as easily have been about the great performances of Nadal and Djokovic. Maybe some other author will write it for them. I respect what they have achieved in the sport, but this is me doing my bit and preaching the gospel of Federer.
That's all for now, Rishi
What is the story the author is trying to tell? India's Minister of Home Affairs, Amit Shah, claimed that section 370 of the Indian constitution "...ensures the healthcare in Jammu and Kashmir suffers, no doctor wants to go there. 370 ensures there is no right to education for the children of Kashmir." and "the entire country is developing, but when we look at Kashmir, we get tears in our eyes that even after 70 years (of independence), they are still living in poverty."
The article takes a look at how the state of Jammu and Kashmir fares on various indicators of growth and development compared to the other states of India, to see if there is merit to these claims. By looking at factors like people served per govt. doctor, life expectancy, poverty rate, etc., we can see that J&K was mostly doing better than average, and on some metrics, like life expectancy, it is towards the top.
What data are they using to tell the story? The data used in the story is quantitative data about the performance of various states on different metrics. The data is sourced from different sources, and is for different years or year ranges, ranging from 2011-2018. Some data sources did not provide information for every state, and hence the number of states also varies from 22 to 30 - which is a little confusing as there are 29 states in India. The information presented in the article is in the form of charts which discard the names of all states except J&K, but the accompanying text also specifies which state is performing the best and the worst on the metric.
How is the data encoded? The data was originally encoded as 7 different charts. Each chart used points on a 1 dimensional scale, with J&K's dot colored red, and all other dots in orange. The charts mark how many states are better off and how many are worse for each metric.
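The "better off / worse off" counts on each chart can be sketched as below. This is only an illustration: the state names and values are placeholders, not figures from the article, and the `higher_is_better` flag is my own addition to handle metrics where a lower value is the good direction (for example, people served per govt. doctor).

```python
def better_and_worse(values, focus, higher_is_better=True):
    """Count how many states do better / worse than `focus` on one metric.

    Returns a (better, worse) tuple. Ties are counted in neither group.
    """
    ref = values[focus]
    others = [v for state, v in values.items() if state != focus]
    above = sum(v > ref for v in others)
    below = sum(v < ref for v in others)
    return (above, below) if higher_is_better else (below, above)


# Placeholder figures for illustration only; not taken from the article.
life_expectancy = {"J&K": 73.0, "State A": 75.0, "State B": 70.0, "State C": 68.5}
people_per_doctor = {"J&K": 4000, "State A": 2500, "State B": 9000, "State C": 12000}

le = better_and_worse(life_expectancy, "J&K")  # higher is better
ppd = better_and_worse(people_per_doctor, "J&K", higher_is_better=False)
```

Making the direction explicit per metric is one way to keep the "better" side of every chart consistent.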
What are the problems with the encoding?
How did I attempt to improve it?
Feedback from class discussion
Roger Federer - Aged Like Wine?
Good document! Some quick comments:
Original Article link click here
The story they are telling through article
In this article, the author tries to give a status report on the Smart Cities Mission. In June 2015, the Indian government launched the 100 Smart Cities Mission to provide better infrastructure, expand housing for all, develop open spaces, and promote good governance in cities. The article was written five years after the launch of this mission. It analyses the gap between the approved budget and the money spent on smart city projects by various cities.
Major insights of this article
The data they are using to tell story
The data used to make this data story is probably drawn mainly from the smart city websites. I tried looking for the data but could not find any related reports on the smart city website. I did, however, find another report from MoHUA with figures and data related to the smart cities (link, page no. 189).
How is the data encoded and problems with encoding
They used a 3D doughnut (pie) chart to visualize the data, but mentioned neither the source of the data nor a legend for the chart.
There is a long paragraph of text giving insight into some important statistical figures about funds and budget, which the author also visualized in bar graphs without titling them.
Finally, the author concluded the article with XY-axis graphs showing a comparison between the average number of completed projects per city and the average cost of the completed projects per city. The author used 3 parameters here:
Complexity in this visualization: Cost of the projects and number of completed projects are two independent variables; comparing both in a single graph with an additional third parameter (number of smart cities in each state) is difficult for common readers to understand.
Redesigning
I started by reading the article and identifying the important information in it. Some of the important statistical figures about the smart cities were only written as text and needed to be presented and highlighted in a better way.
-Identified information-
The above information can give insight into the number of projects initiated under the smart city mission.
The budgetary information needs to be visualized in a manner to give a comparison between the budget allocated and the budget spent on projects.
I separated information of "number of smart cities in the state" from the "average number completed projects per city" vs "average cost of completed project per city" for simplifying the visualization.
What is the status of Smart City projects in India?
Here is the original Hindu Article. Story The story aims to highlight that Moon lander missions are more difficult than Moon orbiter missions.
Data As evidence to corroborate the story, the author gives the success rates of the countries that have attempted such missions. Along with the missions, a timeline is also given, and specific countries are highlighted for their spectacular success (China), as well as their huge struggle (USSR).
Details Type of data: Bi-nominal (Successful/ Unsuccessful) with the rate of success in ratio scale which makes it a mix of qualitative and quantitative. The visualization falls between the categories Declarative and Data driven and hence qualifies as Visual confirmation. General knowledge about several countries' missions is given which makes the data set more interesting. Below is the visualization used to depict the story, followed by 3 charts that comprise the available data:
Gaps in the data: The rate of success for the different kinds of moon missions (Lander/ Rover/ Orbiter/ Sample return) is given in percentages and in the same chart, even though the samples are small. This complicates calculating the exact number of successful and unsuccessful missions for each purpose, which is why the story does not sell. The reader gets lost in the tables and percentage values when all s/he is looking for is proof that Moon lander missions are more difficult. The message is delivered not through the data but only through the text. The charts have confusing titles, which increase cognitive load as the reader has to reverse calculate the figures.

Visual Encoding: Red and green circles have been used to show unsuccessful and successful missions on a timeline.

Problems: The story is fragmented and the viewer has to go back and forth to understand the message. The 2 charts with percentages are exhaustive, yet even after reading those percentages, there is no grasp of the number of successful/ unsuccessful missions for lander and orbiter missions. The timeline just indicates two time spans - 1965-1975 and then 2010-2019 - with missions mapped on it. However, there is no clear mark of which particular year each mission belonged to.

The Info-graphic Objective was to make the story comprehensible in a quick glance and visually appealing. I removed the charts and used two concentric Donut diagrams around the moon, each representing the success rates of Lander and Orbiter missions. The diagrams showed the success rate of individual countries (for lander missions) and the overall rate. Colour coding is used for the donut diagrams. The idea is that the viewer will be able to immediately see the difference in red-green portions and the message will be delivered. Successful missions are depicted in shades of green and unsuccessful missions in red. I discarded the timeline and showed the country specific missions using variation of size (for the flag icons), and the border of the icons depicted whether a mission was successful or unsuccessful.
The feedback session helped me realize that the concentric circle representation is not the best way to compare lengths and hence values. I should have removed the percentage values and consistently used the number of missions in each category as the numbers were quite small in my data set.
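Reverse calculating mission counts from a published percentage is simple arithmetic once the total number of missions is known. A minimal sketch, with hypothetical inputs rather than the article's figures:

```python
def counts_from_rate(success_pct, total_missions):
    """Recover (successful, unsuccessful) counts from a success percentage.

    Assumes the percentage was computed as successes / total_missions and
    then rounded, so the nearest whole number of successes is taken.
    """
    successes = round(success_pct * total_missions / 100)
    return successes, total_missions - successes


# Hypothetical inputs for illustration; not the article's numbers.
lander = counts_from_rate(45, 20)
orbiter = counts_from_rate(60, 10)
```

With samples this small, showing the recovered counts directly avoids asking the reader to do this step in their head.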
Feedback Incorporation: I reverse calculated the Hindu Point data, which gave me the total number of successful and unsuccessful missions. I brought the two rings closer and added marks on them, which made it easier to compare the rate of success and failure. Final Infographic:
Here is the original article.
The article's aim is to show how big the Indian electorate is and compare it to some of the democracies of the world.
It starts off by showing the growth in the electorate since 1952.
They also mention the percentage of voter turn out over the years starting from 1962.
These visualisations needn't have been separate and could have been combined into one, letting the user explore correlations between the no. of voters and turn out. Also, we don't get a sense of how many voters are actually voting with these two separate visualisations.
There's also a section about the number of candidates contesting, with a line graph to represent the percentage of women candidates. While this was an interesting graph, with a lot of areas that could be improved, I felt it did not add to the story of showing how big India's electorate is, so I decided to omit it from the final visualisation.
The final visualisation of the article shows how the electorates of the various Indian states compare with other democracies of the World.
The visualisation gives an approximation of which countries could best replace the Indian states if only the electorates were considered. So, for example, Uttar Pradesh has an electorate that is roughly the same size as that of Brazil. The problem with this is that it doesn't tell the reader anything. To start with, I have no idea about the size of Uttar Pradesh's electorate, and I am being told that this is the same size as the electorate of Brazil. Also, the comparisons are being made with countries that don't have any relation to the state in question. Jammu & Kashmir is compared with Madagascar, which doesn't make much sense. I tried to remove this notion by getting rid of the choropleth and using a Sankey instead. This ensured that the only comparison would be between the electorates of the states and countries, without adding confusion.
In order to be able to make the Sankey, I needed to find the numbers that make up the electorate. For the Indian states, thankfully, Wikipedia had a neat documentation under their article on the Indian Elections. For finding the electorates of other countries, I had to look a bit before I found this wonderful website.
The ordering of the Sankey is arbitrary, as are the colour choices. This was a limitation of the software I was using at the time. I have also combined the growth in the size of the electorate and voter turn out into one visualisation to allow for comparison.
There were quite a few areas to improve with this visualisation. To start with, the percentage of voters line could have been more exaggerated, as done in the original visualisation, to highlight the ups and downs in the voter turn out. The Sankey was too small, and for the given space there were other, better ways of visualising while retaining the choropleth characteristic used in the original visualisation. The Sankey was too arbitrary and could have done with some amount of logic with respect to the ordering and use of colours. Also, the smaller states, such as Nagaland and Tripura, are hardly visible.
So on to fixing this...
Now I really wanted to fix everything, really. But some things require greater attention than initially thought. I decided to focus on fixing the Sankey. I wanted to make the representation more relevant. While the Sankey showed how the electorates translated across to different countries, it was still flawed in terms of the arbitrariness of the flows. Also, there was no inherent order in which either the states or the countries appeared. And the other pressing point was the lack of relatability to some of the lesser known countries visualised.
Just as the Indian electorate's choices are represented in Lok Sabha, I decided to represent the relation of the states to the countries through their representation in the Lok Sabha. At the same time, I decided to fix the issue of having to look up countries such as Timor-Leste.
I changed the data set to 10 well known democracies that the average Indian reader would've come across. I took the size of the electorates of each of these countries, and normalised them to get the magical number of 543 seats.
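For anyone wanting to reproduce the normalisation, here is a minimal sketch of one way to scale electorates so they add up to exactly 543 seats. It uses the largest remainder method, which is my assumption for the sketch and not necessarily how the numbers in the visualisation were rounded. The country names and electorate sizes are placeholders, not real figures.

```python
def apportion(electorates, seats=543):
    """Split `seats` across countries in proportion to electorate size.

    Largest remainder method: give each country the whole part of its
    proportional share, then hand the leftover seats to the countries
    with the largest fractional remainders.
    """
    total = sum(electorates.values())
    alloc, remainders = {}, {}
    for country, size in electorates.items():
        # Integer arithmetic avoids floating point rounding surprises.
        alloc[country], remainders[country] = divmod(size * seats, total)
    leftover = seats - sum(alloc.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for country in by_remainder[:leftover]:
        alloc[country] += 1
    return alloc


# Placeholder electorates (in millions); NOT real figures.
electorates = {"Country A": 150, "Country B": 90, "Country C": 45, "Country D": 15}
seats = apportion(electorates)
```

Plain rounding of each share can leave the total at 542 or 544, which is why the leftover seats are handed out explicitly here.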
I then visualised the Indian states' allotted constituencies on one side. Arranged them alphabetically, and split the legend either side of the chart depending on its proximity to the location of the state's dot. This might be slightly contentious, but 31 different colours would've been quite a task. Right below this, I placed the representation that the countries would have if they had to make up the Indian Lok Sabha.
So that I think would be my final visualisation of translating the Data Point article.
Cheers, Avyay.
The original article can be read on this link.
The article tells the story of the trend in mobile data usage in India over the past four years, using data mined from the TRAI report titled Wireless Data Services in India, published recently. The author emphasises how much mobile data an individual in India uses in a month and how these numbers have changed from 2014 to 2018.
The author tries to articulate the story using three graphical visualizations. First, to show the change in the number of mobile data subscribers according to different connectivity technologies(CDMA, 2G, 3G, 4G) throughout the last five years. Second, to show the effect of change in cost on mobile data consumption per month per subscriber. And third, to show the number of data subscribers and average data consumption per month in different service areas.
For the first visualization, an area chart is used to show the number of mobile data subscribers for four types of connectivity technologies, i.e., CDMA, 2G, 3G & 4G. The area chart is inherently difficult to comprehend when the intent is to show the change in numbers and not just to show something is changing over time.
To show how the number of subscribers changed over time, I redesigned the visualization using a line chart, with the same dataset. The lines are encoded with different colors for the different types of connectivity, and a legend is provided to explain the encoding. With these changes, it is now easy to understand the trend in the use of mobile data for each of these connectivity technologies for each year from 2014 to 2018.
The second graph, which shows the trend in the amount of mobile data used with respect to the decrease in the cost of mobile data per GB, used the line graph on top of the Bar graph. Although the trend shown in this graph is easy to understand, the graph itself is not necessary here to understand this trend. It is obvious that decreasing rates result in an increase in mobile data consumption.
For the third visualization, the scatter plot is used to show the amount of data used per month vs. no. of subscribers in each service area. From the scatter plot, it is difficult to understand the two attributes (amount of data used per month and the total number of subscribers) of each service area.
I transformed the same scatter plot into a bar graph, with two bars (red and green) plotted on the x-axis for each service area: red for the number of data subscribers in millions and green for the amount of mobile data consumed by individuals per month in GB. The two Y-axes show the values of the two bars, encoded with the respective colors.
The First Submission
Feedback
Iteration
Final Submission
The original Hindu article can be found here. As is the case with technical problems, they crop up at the most crucial moments.
Laptop crashed. Lost initial version, will update as soon as I reconstruct it.
What I do have, though, is a PDF version I'd used for printing, and the valuable insights gained from discussion in the class. The PDF is right here: The_Hindu_RedesignArtboard 1 copy 2.pdf The major feedback I received on this was that the 'violin' chart wasn't adding as much meaning to the overall information to be conveyed, and the larger graphic tried to cram a lot of information but didn't quite tell a story. I was also missing data points and there was no available reference to the source data, which was a limiting factor. There was poor contrast in some of the gridlines, especially apparent in print. After the course, though, there were even more factors I realized I had initially not taken into consideration - dealing mainly with detection, assembly and estimation as a part of pre-processing rather than as an afterthought.
Trying the assignment afresh might be a good idea
In any case, I would need to make the infographic again.
Will need to reconstruct from scratch to be able to submit a workfile.
Since I'm working from scratch here anyways, thought I could try out another article from Data Point which seemed pretty interesting from the data perspective, but alas for readers of The Hindu, not visualized particularly well. Here's the 'new' article, which looks at crude oil, the recent price spike (all time high in the last 19 years), and India's oil imports. The particular graph which had source data available deals with India's oil import and looks somewhat like this: My task would be to treat it as a data story for the readers of The Hindu, where they could glean valuable insights from the data and be able to explore it/dig deeper if they so chose.
And so, a need for a new heading
Country | Share of India's Crude Import (July 2019) |
---|---|
Angola | 2.73% |
USA | 4.16% |
Iran | 4.7% |
Kuwait | 5.2% |
Mexico | 5.5% |
Venezuela | 6.95% |
UAE | 7.19% |
Nigeria | 8.19% |
Saudi Arabia | 18.78% |
Iraq | 21.78% |
This totals to 85.18%, leaving us with an assumption that the remaining 14.82% could be attributed to 'others'
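The total and the 'others' remainder can be checked with a few lines. This sketch uses the shares from the table above; `Decimal` is used so the percentages add up exactly rather than picking up floating point noise.

```python
from decimal import Decimal

# Shares of India's crude import (July 2019), in percent, from the table above.
shares = {
    "Angola": "2.73", "USA": "4.16", "Iran": "4.7", "Kuwait": "5.2",
    "Mexico": "5.5", "Venezuela": "6.95", "UAE": "7.19", "Nigeria": "8.19",
    "Saudi Arabia": "18.78", "Iraq": "21.78",
}

listed = sum(Decimal(v) for v in shares.values())  # share covered by the table
others = Decimal("100") - listed                   # assumed to be 'others'

print(f"listed: {listed}%  others: {others}%")
```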
The fact that it is components of a whole pointed towards using a pie chart or tree chart. As pie charts are more widely understood, I chose to represent the data in this form.
I had originally intended to make the pie chart colourful, with the colours of the flags of different countries from where India sources its crude oil. But on reflection, I decided not to go ahead with it as the important content here is the chunk Saudi Arabia contributes, and so I decided to give greater importance to this key point of the content rather than equal importance to all source countries and having a colourful graphic which adds clutter to the intended datastory.
Using semantic colours to relate to a barrel of 'black gold' i.e. crude oil, and newsprint-like body copy as well as typography, the overall news piece looked somewhat like this:
I tried to add some more meaning by giving a brief summary of the story the body copy told, and added this as a 'tldr' (too long, didn't read :stuck_out_tongue: ) type of encapsulation so as to cater to readers who would like to know the story in short without having to delve into the verbose part of content. This was done as a series of steps which looked at events one after the other. Post these additions, I corrected for better visual clarity, legibility and contrast while maintaining the original semantic colour scheme.
The final news piece looks like this:
Although I call it the final news piece, I'm looking forward to actionable feedback and constructive suggestions to make it better and iterate further, given time.
Made with love and data, Maulashree.
You can find the original Hindu article here. Story: The story highlights the changing trends seen in the composition of Indian families over almost 2 decades.
Data: The data provided in the original article is about the percentage of families with different compositions, for example:
It also gives the data about the trends in percentages of the 'couple only', 'single mother' and 'single father' families between the years 1983 and 2015.
Problems with the original visualization:
Submission 01:
Feedback on the above visualization: 3-D visualizations should be avoided wherever possible, as they give a distorted sense of scale; avoiding them also improves the data-ink ratio.
Submission 02:
Submission 03: Improving the assembly and coherence. This layout seems to provide a better coherence than a vertical layout
Submission 04: Refining the assembly:
I chose to redesign the visualization of the article titled 'More Indians have access to drinking water, basic sanitation'. The existing article aims at showing the 'change', or the progress (to be precise), from the year 2000 to 2017. The progress is mapped on the following parameters against the percentage of the population, and India's progress is further compared with the global one -
The current data visualization is as follows-
The Demographic Divide - another set of visualizations analyses the global and Indian data with respect to urban and rural scenarios. A further layer of information is added when analysing economic status. This set of data does not clearly show how India is affected in terms of regional differences.

Analysis of the Data -

### Major thing to be conveyed

The change is visualized with respect to % of the population from the year 2000 to 2017.

Set of information - Progress in:
Economic status

The gaps while analysing the visualization: if the data had been consolidated in one place, it would have been easier to analyse the change with respect to the other attributes. Keeping the type of visualization the same, i.e. a scatter plot with links, I proposed a consolidated format. The final one is below, along with other iterations.

The issue with this data visualization: it is challenging to decipher the different attributes because the encoding of the graphical annotations is not very effective. The image below depicts the final layout, which still needs to be formatted with better graphics.
The task was to redesign the story 'Is Kashmir underdeveloped as stated by Amit Shah?'. The story can be found here.
For the data in its current form Pros: 1) Simple and easily comprehensible representation
Cons:
1) The vertical separation of the dots does not convey additional information.
2) The positive and negative sides are not consistent for each graph.
3) The main focus of the story gets lost in the textual information.
(The images in the iteration 1 and 3 are not complete, the decision to discard the direction was made after a quick wireframe to get an idea of what the entire article could look like)
Direction 1 Pros: Colour coding catalyses pre-attentive processing. Cons: Only 3 of the 22 - 28 dots in each graph actually contain information; interval information is getting lost.
Direction 2 Pros: The rank of Kashmir is brought into focus. Cons: Too much dependence on textual information.
Post this I realised that I had to bring in the interval information i.e. the average (which was getting hidden in the text). I retained the use of an X axis to show the relative position of each state which is a simple and easily comprehensible representation.
Direction 3 Pros: Dependence on textual information is reduced. Cons: The vertical separation suggests more information than there actually is.
Direction 4: Final
Thanks!
The selected data point article was How fast does traffic move in your city?
The article cites information from the research paper published by researchers from various universities in the US. The papers evaluate 154 cities in India based on two indices, Mobility Index and Congestion Factor. The mobility index incorporates the element speed of the motorised vehicle in a given city. Whereas, Congestion factor checks for Traffic Density of the city by Number of registered vehicles in the city, Population of the city, avg time delay to cover some distance.
With the provided chart, most of the important information is concealed, Moreover, the chart itself is difficult to interpret.
The data set of all 154 cities is nowhere found in the article or in the research paper. Another issue with the research is the comparison of cities like Dhanbad and Mumbai, where the population of Mumbai is about 15x of that of Dhanbad. Hence, it was decided to work on major metro cities. Other data sets like kilometers of road, traffic density, average speed in the city was sourced from different sources, though published in different years, but consistent across cities. The graphic treatment was chosen to emote a feeling of congestion, hence elements were tightly packed.
The encoding was as follows. From the top: the heights of the buildings bearing city names represent the total population of each city. The cars on the road show the traffic density of the city. The speedometer at the end represents the average speed in the city. The length of the road shows the distance covered in the city: where the average speed is higher, the distance covered in a given time interval is higher, hence the longer road.
Bottom: The Mobility Index versus Congestion Factor chart. This provides a better comparison between the cities. According to the article, a low Mobility Index with a higher Congestion Factor means the worst traffic, while a higher Mobility Index with a lower Congestion Factor means better-performing traffic. Apart from the 7 metro cities, other cities were handpicked, along with the best-performing city, Srinagar.
Issues: The correlation between road size, number of cars and length of roads seems to be off. Without any text, the graphic is very difficult to comprehend. Weber's law is violated: it is hard to find the difference between speedometers. The position of the speedometer is also difficult to comprehend. Does the bar end at the tip of the circle or on the mid-line? The chart representation of the indices for comparison is tough to comprehend and does not reveal information.
Based on the feedback, I tried to incorporate the changes in the data point story.
The double bar charts were again difficult to comprehend. Upon discussing this with classmates, I realised that there were multiple areas to comprehend, and no clear distinction emerged when stitching the information together. Also, the infographic had drifted far from the core element of the data story and had been drawing information from rather inconsistent sources. One distinct example is the length of roads. Some cities have fairly large lengths of arterial roads that may not contribute to the traffic, hence reducing the road density and portraying an image that contrasts with the Data Point story.
The final iteration came about by stacking the two values of Mobility Index and Congestion Factor, which together convey how congested the roads could be.
Upon validating this against the previous iteration, I found that it gave a much better idea of the congestion, and clear distinctions in the conditions of the roads were perceived. It was easier to distinguish the better-performing cities from the poorly performing ones. I am, however, unsure about the appropriateness of the encoding for the given data. Still, it works.
Thanks.
Datapoint link: How many students in rural districts can perform division?
In as many as 443 of 586 rural districts, less than 50% of students in Grades VI-VIII knew how to carry out basic division, the Annual Status of Education Report, 2018 revealed.
The article uses two visualizations to make its point: a scatter plot showing the percentage of students who could read as well as do the math, and a table giving the number of districts with poor reading skills and poor maths skills in each state of India.
I believe that the data provided in the Data Point did not adequately explain how only 41% of students living in rural districts could perform division. The data in the table also suggested that there could be some relationship between the level of maths skills and the reading skills. So, in order to understand and visualize this relationship, I plotted a bar graph of the values for each state in decreasing order. (Graph 2)
Interpretations:
Problems faced: The data is inadequate to ground the claim. Even after looking for data related to this particular Data Point, I could not find anything that could help create a better data visualization.
Note: In retrospect, as per the feedback given to me, I strongly believe that I did not do a good job of showing the intended relationship between the poor reading skills and the maths skills of the districts of India, and I should rework this assignment to find a better story and relationship in the data provided.
Here is the original article.
The idea was to simplify what seems to be a very confusing scatter plot of the percentage of women MLAs before 2000 and after 2000. The article referred to the increase and decrease of women MLAs in some states. However, inferring and comparing information was difficult in this plot. So I decided to represent the data as a bar graph, which made it easier to find a state and compare it with other states.
During the feedback session I realized that I had no idea why the percentages were chunked as pre and post 2000. I tried to find out whether any major changes had happened around that time (such as election seat quotas, or something related to constituencies) but realized that none had happened in India. This is just a representation of a table without a strong story. Will try to work on another story.
https://www.thehindu.com/data/india-migration-patterns-2011-census/article28620772.ece https://www.thehindu.com/news/national/reasons-for-migration-india/article28772050.ece
The Datapoint article on migration used visualizations to indicate three things:
States which host the highest number of migrants
For the assignment I chose to work on the third set of visualisations, as I felt that the data could tell more about the paths taken by people. I first had to retrieve the original Census 2011 data from its website: http://censusindia.gov.in/2011census/d-series/D-2/DS-2800-D02SC-MDDS.XLSX This included granular responses from each state on gender, reason for migration, time of migration and state of origin. I chose to add year-of-migration data to the data used by the Datapoint visualisation. As all the data was in counts, it had to be converted to percentages. Migrant populations did not make sense as raw numbers, as larger states would logically hold a higher number of migrants. Data on the population of states from the 2011 census was used to find what percentage of each state's population had migrated in and how recently they had done so. I was unable to find data on NCR that reflected the Datapoint article's indication that a significant population migrates to Delhi.
The data visualisation I made represents three sets of data:
As population sizes of states vary a lot, I chose to work with the data of the 15 largest migrant populations by state. The states are arranged by decreasing size of migrant population. Blue represents the local population while red from above represents the influx of migrant population. A dashed line represents the national average of migrant population.
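The count-to-percentage conversion and the selection of the largest migrant populations can be sketched roughly as below. This is only an illustration in Python: the state names and numbers are made-up placeholders rather than census figures, the function names are hypothetical, and computing the national average as total migrants over total population is an assumption about how the dashed line was derived.

```python
def migrant_share(migrants, population):
    """Percentage of each state's population that migrated in."""
    return {s: 100.0 * migrants[s] / population[s] for s in migrants}


def largest_migrant_states(migrants, n):
    """Names of the n states with the largest migrant counts, in decreasing order."""
    return sorted(migrants, key=migrants.get, reverse=True)[:n]


def national_share(migrants, population):
    """Assumed definition of the national average: total migrants / total population."""
    return 100.0 * sum(migrants.values()) / sum(population.values())


# Placeholder numbers, NOT census data.
migrants = {"State A": 200, "State B": 50, "State C": 30}
population = {"State A": 1000, "State B": 500, "State C": 100}

shares = migrant_share(migrants, population)
top_two = largest_migrant_states(migrants, 2)
average = national_share(migrants, population)
```

In this toy example, State C has the smallest migrant count but the highest share, which is the kind of difference that raw counts hide.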
Version 1: From this visualisation it is easy to tell that states like Maharashtra and Kerala host more migrants per capita. It also became apparent that some states like UP and Bihar have a lower than average number of migrants. These happen to correspond with the states that have a large population migrating out. States were not arranged according to geographic location and as a result there is no hint where the migrants to a state are coming from. Feedback suggested that the colours were making it difficult to tell the composition of the population. The placement of the migrant population from the top was counter-intuitive to some as it is expected that columns are read as heights starting from the bottom. Some people were able to visually relate the heading of the article with the visualisation of migrant population placed in proximity.
The second iteration of the visualisation was made to quickly incorporate feedback on the intuitiveness of the graph. The layout of the article was changed to portrait to maintain proximity between the heading and the visuals in the graph. Colours were changed to highlight only the migrant population in shades of red.
Version 2:
A final attempt was made to include data on all states. States were split into two sets at the 50th percentile of migrants, with the states arranged in ascending order. The first visualisation keeps column widths true to state population sizes. The second introduces a minimum width to make the columns of smaller states readable. The third removes population size as a dataset, with uniform width across columns. This makes it easier to compare state compositions.
Version 3:
This graph makes it clearer that some of the smaller states like Goa and Chandigarh host more migrants per capita than larger states like Maharashtra.
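The split at the 50th percentile and the minimum column width used in this final attempt could be computed along these lines. This is a rough Python sketch with hypothetical function names and placeholder values; the actual layout was not necessarily produced this way, and the way ties at the median are assigned to the lower set is an assumption.

```python
from statistics import median


def split_at_median(shares):
    """Split states into two sets around the median (50th percentile) value."""
    cutoff = median(shares.values())
    lower = {s: v for s, v in shares.items() if v <= cutoff}
    upper = {s: v for s, v in shares.items() if v > cutoff}
    return lower, upper


def column_widths(population, total_width, min_width):
    """Widths proportional to population, but never narrower than min_width."""
    total = sum(population.values())
    return {s: max(min_width, total_width * p / total)
            for s, p in population.items()}


# Placeholder values, NOT census data.
lower, upper = split_at_median({"A": 1, "B": 2, "C": 3, "D": 4})
widths = column_widths({"A": 900, "B": 100}, total_width=100, min_width=20)
```

Note that once the minimum width is applied, the widths no longer add up to `total_width`, so the columns are no longer strictly true to population size. That is the trade-off between the first and second visualisations.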
Why do Indians migrate?
The Data Point article I picked for the redesign looks at the reasons why men and women migrate in India. The data is presented as a table which lists the percentage of men and women migrating for work, marriage and education.
Critique of the original visualisation
The parameter that immediately gets your attention is the percentage of women migrating for education, which is <1 for all the states. The rest of the data needs closer attention to comprehend.
The heading is generic, while the data specifically points to gender disparity in migration patterns, which could have been made clearer in the title.
The chart tries to draw some insights through a frame and blocks of colour, which has not been very effective.
Data viz direction
The first attempt was to 'make sense' of the data using a plotting technique. A connected dot plot seemed appropriate for three reasons: (1) it shows the connection between the genders, (2) it shows the distance they travelled, and (3) it makes pattern recognition easier.
Migration for work
Migration for marriage
Migration for education
Final layout
Some advantages of this visualisation
As part of the assignment I picked up this article. The article is a progress report on a government-commissioned independent study of 97 towns along the Ganga, which shows that 39% of these towns in five States need overall improvement in cleanliness, solid waste management and a change in how nullahs (drains) are handled.
The article talks about the division of towns along the ghats of the river Ganga. It contains 2 parts which talk about the grading of these towns in the five states: Uttarakhand, Uttar Pradesh, Bihar, Jharkhand and West Bengal. These states share the largest part of the river. Among these, Uttar Pradesh is the most populated state in India. The article reveals interesting data about the number of towns which need cleanliness management. Jharkhand is a state which shares less area of the river.
The first part talks about how many cities are graded A, B or C. These grades indicate how much cleaning is required in the given percentage of towns.
Following is the bar graph used to visualise this data:
The second part of the article talks about dumping into the river by visualising the number of nullahs (drains) that flow into the river in the same states.
It is common for nullahs to drain into the Ganga across towns in all the States. In Bihar, the towns had dumpsites along the river as well.
The below visualisation shows the percentage of towns in each State that:
In the article, the first part, on the percentage of towns spread along the river banks, is presented as bar charts with percentages. I thought it would be better to give readers an idea of the size of each state. Both parts of the article are visualised linearly, although they have a connected explanation. The state of Jharkhand shares less area of the river water and had 100% Grade B (partial cleanliness around the ghats) towns. The same state had no towns with solid waste floating on the surface. So my first approach was a visualisation with a geographical illustration of the river and the states, and the percentages in line graphs near the states themselves.
After the discussion with the class I noticed that I was using too little of the canvas area for the visualisation. So I rethought how I could show the sheer area of the states and the effect of population on the pollution.
For this assignment, we'll use data stories from The Hindu Data Point.
Select a story that you like, study it carefully and redesign it. Specifically I want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:
You may choose to expand or curtail the scope of the data used in the story, or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.
While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce a coherent documentation.