info-design-lab / DE705-Interactive-Data-Visualization

Documentation of the IDC M.Des course Interactive Data Visualization, 3-20 Sep 2019


Redesigning The Hindu Data Point Stories (2019) #1

Closed venkatrajam closed 4 years ago

venkatrajam commented 5 years ago

For this assignment, we'll use data stories from The Hindu Data Point.

Select a story that you like, study it carefully and redesign it. Specifically I want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:

What is the story the author is trying to tell?
What data is he/she using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.
How is it encoded, what are the problems with it, and how did you attempt to improve it?

You may choose to expand or curtail the scope of the data used in the story, or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.

While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce a coherent documentation.

rishivanukuru commented 5 years ago

Roger Federer - Aged Like Fine Wine?

Last Edit: 15/09/19 - Incorporating responses to Venkat Sir's comments

For this re-design, I picked an article titled 'Roger Federer's career shows he's ageing like fine wine', written on the 8th of March, 2019 (right after he won his 100th ATP title). It was a visualization of Roger Federer's career performance in terms of titles won, when compared to the 15 men's tennis players with the most titles. The original article can be found here.

The article claimed that Federer has been performing better as his career has progressed, relative to the other players in the top 15. It did so by comparing a single metric - Titles won/Tournaments entered - for each player across three stages of their career. The information was presented through simple dot plots made in Flourish. The plot for Stage I (Years 1 to 7) looked like this -

DP_1

The percentage of wins was encoded in triplicate - On the Y Axis, in the size of the dot, and the shade of the dot as well.

In terms of interactivity, readers could view the position of one player at a time using a drop down menu in the top left. By hovering on a dot, readers could see information about the player, number of wins, and win percentage.

Having followed tennis fairly religiously since I was 7, I felt that the approach towards player performance was overly simplistic at best, and at times very misleading.

Just to be clear, my mood in life is directly proportional to the extent of Federer's successes at any given point of time. That being said, I don't think he has gotten better over time. What's amazing about his career is how he has bounced back from multiple almost-career-ending spells, and is still performing consistently at the highest level of the game.

The biggest problem I had with the visualisation was how the longevity of a player, and consistency over time, was not being adequately expressed. In fact, in the third graph, I felt it went ahead and displayed the opposite insight, by making it seem that Djokovic and Nadal had reached the level of Federer, but with much greater pace (Image below). This is most clearly not the case.

DP_2

And so I started off by collecting and compiling more detailed statistics for each of the 15 players in this visualisation. I managed to find the number of tournaments entered and titles won in each year of each of their careers, and put it all together in a spreadsheet that looked something like this -

sheet

Of the 15 players included in the original article, I excluded three - Rod Laver, Ilie Nastase and Guillermo Vilas. All three played some portion of their careers before the Open Era, and a number of the tournaments they played at the start of the Open Era are no longer recognised by the ATP, so I wasn't able to find accurate statistics for them.

Even a cursory glance at the image of the spreadsheet above shows why it doesn't make sense to compare all players across the three stages of their career. Federer has been active in the tennis circuit for 22 years now. Only four players of the remaining 12 are active, and considering Andy Murray's recent debilitating history with injuries, it seems unlikely that his third act will produce many more titles. Among the players who have retired, only Jimmy Connors, John McEnroe and Andre Agassi have had career lengths comparable to Federer's. Initial explorations with graphing all 12 players together led to some very noisy and unreadable graphs.

For the purpose of this redesign, I decided to pick -

Roger Federer (for obvious reasons)
Jimmy Connors (All time leader in titles won, longest career span)
John McEnroe (Significant number of wins in the third stage)
Andre Agassi (Again, significant number of wins in the third stage)
Rafael Nadal (Currently Active, looks like he will have a much longer career, high output in stage 3)
Novak Djokovic (Also currently Active, looks like he will have a much longer career, high output in stage 3)

I started by plotting the same metric used by the original article (titles won/tournaments entered) for each of these six players across the three stages of their careers. This is depicted below -

EfficiencyMetricTest

These graphs did make the distinction a little clearer. One can easily see how Federer's efficiency metric has been consistently much higher than any other player's over stage 2 and stage 3. But this still felt inaccessible to people without a clear understanding of how the tennis season is structured. I then tried just representing wins and losses at tournaments across Federer's career through a stacked bar graph -

FedererCareerBar

This to me seemed like a nicer way of depicting things. It gives readers a visual sense of the total number of tournaments played at each step of a player's career, and how many of those tournaments were converted into titles.

I then made a continuous graph (instead of bars, for purely aesthetic reasons) for each of the six players in consideration.

I also figured it would be nice to have player graphs correspond to certain known characteristics about them.

Federer is the most prolific grass-court player in the Open Era, and so he's depicted in green.
Nadal has been unreasonably good at clay court tournaments, and so his graph is orange/brown
Djokovic went ahead and conquered the third court type - Hard Courts. He is depicted in blue
Jimmy Connors has the longest career and most number of titles, and is coded in purple - the colour of royalty
John McEnroe is old and often angry, and hence gray
Andre Agassi once wore a blonde wig on court, and so yellow

The initial iteration of the article that was discussed in class looked like this -

Tennis Data Viz Submission I

The career stage graphs on the right are all aligned to the start of each player's career, and divided by guide lines into the three stages considered by the original article. Textual information was included for the number of tournaments entered and titles won for each career stage. Another graph, depicting the cumulative titles won by each player over the course of their career was included as well.

There were a few issues with how the information was presented in this. It was unclear whether players were retired or currently active, as all graphs were brought to zero in the end. A number of the labels were not very readable too. These have been fixed, and the current version of the article looks like -

Tennis Dat Viz - Final

Addition 1

Changes to the article:

The grey value of the tournaments played graph has been made darker
The sources are explicitly mentioned as data sources
The 'H' key for the height has been removed. Instead, the body text has an explanation for the same, and the highest point among all players (28, Jimmy Connors) is explicitly marked. I didn't mark the same for each player as it looked a little cluttered. Will have to think of a better way of depicting it
The labeling of the stages has been corrected
Additional encoding in the form of the win percentage for each career stage for each player has been included. Their ATP rankings have fluctuated and cannot be compared across the time periods, especially in the case of Djokovic, Nadal and Federer, who have traded the top rank so many times in the past decade that their average would come to be less than say Connors or Agassi, when in fact they were all playing at an equally high level for most of the time. I'll try and think of a more visual way of representing the information that a win percentage conveys in further iterations of this visualization.

The newer version of the article is included below -

Tennis Data Viz Update 1

Thanks to Venkat Sir for pointing these mistakes out, and for the overall feedback as well.

Further Responses

Order of players

On the whole, the order of the 6 players in the main article was determined as follows -

The article is about Federer, so he should come on top
Jimmy Connors has the most number of titles, and longest career, so he should come second
Andre Agassi and Rafael Nadal have the next longest career spans, and they follow in that order
While McEnroe has more titles and has played for longer than Djokovic has, Djokovic's time is more recent and it would be interesting to compare his performance with a direct competitor in Nadal, and so Djokovic was placed above McEnroe

The meaning of 'Consistency', in the context of this analysis

My interpretation of consistency was informed by the broader analysis of 12 Open Era players. The expanded version of the visualization is included below. The original 6 are ordered as is, and the next 6 are in the order of decreasing career lengths.

Tennis Data Viz Long Graph

Now the drop-off in the case of the next 6 players is very clear when presented visually. They have all played far fewer tournaments in the third stage of their careers, and have won even fewer tournaments than those in the initial visualization.

Consistency in this case is a subjective, visually judged measure. I think it would be possible to quantify the same in a better way, will have to think about it. But if I had to split the measure, the three parts to it would be (obviously) -

Number of tournaments played in general
Number of titles won
Win Percentage
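A minimal sketch of how the three parts above could be tallied per career stage from a year-by-year spreadsheet. The `Season` record, the stage boundaries beyond Stage I (years 1 to 7), and the sample numbers are all assumptions made for illustration; they are not taken from the actual spreadsheet.

```python
from dataclasses import dataclass

# Assumed stage boundaries by career year. Only Stage I (years 1 to 7) is
# stated in the write-up; the other two ranges are placeholders.
STAGES = {"I": range(1, 8), "II": range(8, 15), "III": range(15, 100)}


@dataclass
class Season:
    career_year: int  # 1 = first year on tour
    entered: int      # tournaments entered that year
    titles: int       # titles won that year


def stage_summary(seasons):
    """Total tournaments, titles and win percentage for each career stage."""
    summary = {}
    for name, years in STAGES.items():
        in_stage = [s for s in seasons if s.career_year in years]
        entered = sum(s.entered for s in in_stage)
        titles = sum(s.titles for s in in_stage)
        win_pct = round(100 * titles / entered, 1) if entered else 0.0
        summary[name] = {"entered": entered, "titles": titles, "win_pct": win_pct}
    return summary


# Made-up seasons for a hypothetical player, not real statistics.
example = [Season(1, 10, 0), Season(7, 20, 5), Season(8, 15, 6), Season(16, 12, 3)]
print(stage_summary(example))
```

Keeping all three numbers side by side makes it harder for a high win percentage on very few tournaments to pass for consistency.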

For example, Thomas Muster played a staggering number of tournaments in his stage II (182), but really didn't win that high a percentage of them. Soon after, in stage III, his output and efficiency quickly decreased even further. Hence between stage 2 and 3, Muster was not quite consistent, nor efficient.

Now going back to the initial 6, Federer's 'consistency' comes from the fact that he has played many more tournaments in the third stage of his career than anyone with the exception of Jimmy Connors. However, Connors' win percentage was terribly low throughout his stage III, while Federer at this point in time has the highest win percentage of any player, even in the longlist. This combination of longevity and high win percentage makes his trajectory look more consistent over the 3 stages.

The same can be said for his contemporaries, Nadal and Djokovic, who have maintained similar win percentages in their third stages, and show no signs of dipping in the near future. The dominance of Federer, Nadal and Djokovic over the last two decades has been unprecedented, and on an individual level, their output has far exceeded that of any tennis player before them. While they haven't played for as long as Connors, they have kept winning the biggest tournaments of the sport at a point in their careers where Connors was merely participating. So in a way, the article could just as easily have been about the great performances of Nadal and Djokovic. Maybe some other author will write it for them. I respect what they have achieved in the sport, but this is me doing my bit and preaching the gospel of Federer.

That's all for now, Rishi

gyanl commented 5 years ago

Is Kashmir under-developed?

Link to original article

What is the story the author is trying to tell?

India's Minister of Home Affairs, Amit Shah, claimed that Article 370 of the Indian constitution "...ensures the healthcare in Jammu and Kashmir suffers, no doctor wants to go there. 370 ensures there is no right to education for the children of Kashmir." and "the entire country is developing, but when we look at Kashmir, we get tears in our eyes that even after 70 years (of independence), they are still living in poverty."

The article takes a look at how the state of Jammu and Kashmir fares on various indicators of growth and development compared to the other states of India to see if there is merit to these claims. By looking at factors like people served per govt. doctor, life expectancy, poverty rate, etc, we can see that J&K was mostly doing better than average, and on some metrics like life expectancy, it is towards the top.

What data are they using to tell the story?

The data used in the story is quantitative data about the performance of various states on different metrics. The data is drawn from different sources, and is for different years or year ranges, ranging from 2011-2018. Some data sources did not provide information for every state, and hence the number of states also varies from 22 to 30 - which is a little confusing as there are 29 states in India. The information presented in the article is in the form of charts which discard the names of all states except J&K, but the accompanying text also specifies which state is performing the best and the worst on the metric.

datapoint

How is the data encoded?

The data was originally encoded as 7 different charts. Each chart used points on a 1 dimensional scale, with J&K's dot colored red, and all other dots in orange. Each chart marks how many states are better off and how many are worse for that metric.

What are the problems with the encoding?

In order to avoid overlapping circles, the article displaces the dots randomly on the y axis, leading to the appearance of a scatter plot even though the y axis has no meaning.
The article discards the names of all the states except for J&K, meaning there is no way to compare states.
On some charts lower is better, while on others higher is better, so there is no consistency between how to read the 7 separate charts.

How did I attempt to improve it?

Combined information into 1 chart to get an overview of the information being presented at a glance.
Changed the information from ratio to ordinal, since exactly how well each state is doing is not very important and is harder to show, while rank is easier to compare across metrics and gives a sense of the status of J&K w.r.t. other states.
Changed the directions of some of the metrics so that right is always better, adding color coding from red to green to make this clearer.
Added the best and worst performing states to the charts.
Center aligned every metric to the average rank (i.e. the median position on the chart).
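The ratio-to-ordinal step, the direction flip and the centring on the median rank described in the list above could be sketched roughly as follows. The function names, the tie handling (ties keep input order) and the sample values are assumptions for illustration, not the data or method behind the actual chart.

```python
from statistics import median


def rank_states(values, higher_is_better):
    """Return {state: rank}, where rank 1 is the worst and the largest rank
    is the best, so that 'right is always better' on every metric."""
    # Sort from worst to best; for 'lower is better' metrics the largest
    # raw value is the worst, so the sort order is reversed.
    worst_to_best = sorted(values, key=values.get, reverse=not higher_is_better)
    return {state: position + 1 for position, state in enumerate(worst_to_best)}


def centre_on_median(ranks):
    """Offset each rank from the median rank, so 0 sits at the chart centre."""
    mid = median(ranks.values())
    return {state: rank - mid for state, rank in ranks.items()}


# Hypothetical values for three placeholder states, not real figures.
life_expectancy = {"A": 70.0, "B": 74.0, "C": 66.0}  # higher is better
poverty_rate = {"A": 10.0, "B": 30.0, "C": 20.0}     # lower is better

print(rank_states(life_expectancy, higher_is_better=True))
print(rank_states(poverty_rate, higher_is_better=False))
print(centre_on_median(rank_states(poverty_rate, higher_is_better=False)))
```

Because metrics cover between 22 and 30 states, centring on the median rank keeps the middle of every row aligned even when the rows have different lengths.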

Feedback from class discussion

Is there a way to avoid the loss of information while going from ratio to ordinal?
Title and explanation of the infographic could be improved.
It's not very obvious that this data is for different years.
'Average rank' is median.
See if aligning to top or bottom makes it easier to read.

Vertical layout

j kvert

venkatrajam commented 5 years ago

Roger Federer - Aged Like Wine?

Good document! Some quick comments:

The grey chart (tournaments played) has poor contrast with the background. Increase the value a bit so we can at least see it (detection).
Not clear if the 'title trajectories' chart itself is from Wikipedia, or just the data for it.
What is the encoding behind the order (top to bottom) in which the players appear?
As there are potentially many ways someone can describe and understand what consistency means, it is useful to explain how & why you choose to define 'consistency' in your story, and how your viz design supports it.
'H' is trivial and takes me out of the graphic for no good reason. Integrate it in the visual, perhaps in a more effective manner. For example, the highest point (the most tournaments played in a year ever in the career) for each player can be marked and labelled directly.
All stages are labelled as ‘Stage I’.
You could consider some additional encoding to directly compare the proportion of wins in each of the 3 stages of player careers, ATP ranking or other measures to reinforce the story.

akshayrpatil commented 5 years ago

What is the status of Smart City projects in India?

Original Article link click here

The story they are telling through article

In this article, the author tries to give a status report on the Smart Cities Mission. In June 2015, the Indian government launched the 100 Smart Cities Mission to provide better infrastructure, expand housing for all, develop open spaces and improve governance in cities. The article was written five years after the mission was launched. It analyses the gap between the approved budget and the money spent on smart city projects by various cities.

Major insights of this article

It reports that 5,151 projects were initially proposed, and 3,629 have been actively pursued.
Budget analysis - 48,000 crore of funds was approved between 2015 and 2019, and half of that has been allocated to cities. Only 1,700 crore of the allocated funds has been spent.
The article also gives some prominent figures, such as the number of smart cities in each state, the number of projects completed by smart cities, the average number of completed projects by state, and the average cost of completed projects per city in each state.

The data they are using to tell story

The data used to make this data story is probably from the smart city websites. I tried looking for the data but could not find any related reports on the smart city website. I did find another report from MoHUA with figures and data related to the smart cities (link, page no. 189).

How is the data encoded and problems with encoding

They used a 3D doughnut pie chart to visualize the data, but provided neither the source of the data nor a legend for the chart.

There is a long paragraph of text giving some important statistical figures about funds and budget, which the author also visualized as bar graphs without titling them.

Finally, the author concludes the article with an XY graph comparing the average number of completed projects per city with the average cost of completed projects per city. It uses 3 parameters:

The number of smart cities in the state - the size of the circle indicates the number of smart cities in that state.
The average number of completed projects per city.
The average cost of completed projects per city.

Complexity in this visualization: the cost of the projects and the number of completed projects are two independent variables. Comparing them in a single graph with an additional third parameter (the number of smart cities in each state) is difficult for common readers to understand.

Redesigning

I started by reading the article and identifying the important information in it. Some of the important statistical figures about the smart cities were only written as text, and needed to be presented and highlighted in a better way.

-Identified information-

Smart city projects insights
Budget figures
Number of smart cities in each state

The above information gives insight into the number of projects initiated under the smart city mission.

The budgetary information needs to be visualized in a way that allows comparison between the budget allocated and the budget spent on projects.

To simplify the visualization, I separated the "number of smart cities in the state" from the comparison of "average number of completed projects per city" vs "average cost of completed projects per city".

venkatrajam commented 5 years ago

What is the status of Smart City projects in India?

"Insides" or "insights"? The document is full of typos, grammar, punctuation errors. I mentioned this several times during the past 1.5 years. There are several tools available to produce reasonably error free documents. If you want to be taken seriously, not just here but anywhere, this would be a minimum requirement.
The writing is not clear. Think about what you are saying, read it back to check if what you've written says it. We discussed and I am familiar with your solution. Someone who is only reading to understand your work won't have that benefit.
Crop the ads out of one of the screen captured images.
Remove the section on 'further steps'. Let's keep it as a documentation of what one did, and not what one could potentially do.

mayura7 commented 5 years ago

Here is the original Hindu Article.

Story

The story aims to highlight that Moon lander missions are more difficult than Moon orbiter missions.

Data

As evidence to corroborate the story, the author gives the success rates of countries that have attempted these missions. Along with the missions a timeline is also given and specific countries are highlighted for their spectacular success (China), as well as their huge struggle (USSR).

Details

Type of data: Bi-nominal (Successful/Unsuccessful) with the rate of success in ratio scale, which makes it a mix of qualitative and quantitative. The visualization falls between the categories Declarative and Data driven and hence qualifies as Visual confirmation. General knowledge about several countries' missions is given, which makes the data set more interesting. Below is the visualization used to depict the story, followed by 3 charts that comprise the available data:

Gaps in the data: The rate of success for the different kinds of moon missions (Lander/Rover/Orbiter/Sample return) is given in percentages and in the same chart even though the samples are small. This complicates working out the exact number of successful and unsuccessful missions of each kind, which is why the story does not sell. The reader gets confused by the tables and percentage values when all s/he is looking for is proof that Moon lander missions are more difficult. The message is delivered not through the data but only through the text. The charts have confusing titles which increase cognitive load, as the reader has to reverse calculate the figures.

Visual Encoding: Red and green circles have been used to show unsuccessful and successful missions on a timeline.

Problems: The story is fragmented and the viewer has to go back and forth to understand the message. The 2 charts with percentages are exhausting, and even after reading those percentages, the reader has no grasp of the number of successful/unsuccessful missions for lander and orbiter missions. The timeline just indicates two time spans - 1965-1975 and then 2010-2019 - with missions mapped on it. However, there is no clear mark of which particular year each mission belonged to.

The Info-graphic

The objective was to make the story comprehensible at a quick glance and visually appealing. I removed the charts and used two concentric Donut diagrams around the moon - each representing the success rates of Lander and Orbiter missions. The diagrams showed the success rate of individual countries (for lander missions) and the overall rate. Colour coding is used for the donut diagrams. The idea is that the viewer will be able to immediately see the difference in red-green portions and the message will be delivered. A successful mission is depicted in shades of green and an unsuccessful mission in red. I discarded the timeline and showed the country-specific missions using variation of size (for the flag icons), and the border of the icons depicted whether the mission was successful or unsuccessful.

Hindu-Moon-Missions-01

The feedback session helped me realize that the concentric circle representation is not the best way to compare lengths and hence values. I should have removed the percentage values and consistently used the number of missions in each category as the numbers were quite small in my data set.

Feedback Incorporation: I reverse calculated the Hindu Data Point figures, which gave me the total number of successful and unsuccessful missions. I brought the two rings closer and added marks on them, which made it easier to compare the rate of success and failure.

Final Infographic:

moon-Mission-iteration
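The reverse calculation mentioned under Feedback Incorporation can be illustrated with a small sketch. It assumes the published figure is a success percentage and that the total number of missions in that category is known; the function name and the sample numbers are hypothetical and are not the Hindu Data Point values.

```python
def counts_from_rate(success_pct, total_missions):
    """Recover whole-number mission counts from a published success rate.

    Published percentages are usually rounded, so the product is rounded
    to the nearest whole mission.
    """
    successes = round(success_pct / 100 * total_missions)
    failures = total_missions - successes
    return successes, failures


# Hypothetical figures for illustration only.
print(counts_from_rate(40, 20))
print(counts_from_rate(33.3, 3))
```

With samples this small, a single mission shifts the percentage by several points, which is one reason plain counts read more honestly than rates here.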

avyayrkashyap commented 5 years ago

A World within a Country

Here is the original article.

The article's aim is to show how big the Indian electorate is and compare it to some of the democracies of the world.

It starts off by showing the growth in the electorate since 1952.

They also mention the percentage of voter turn out over the years starting from 1962.

These visualisations needn't have been separate and could have been combined into one, letting the user explore correlations between the no. of voters and turn out. Also, we don't get a sense of how many voters are actually voting with these two separate visualisations.

There's also a section about the number of candidates contesting, with a line graph to represent the percentage of women candidates. While this was an interesting graph, with a lot of areas that could be improved, I felt it did not add to the story of showing how big India's electorate is, so I decided to omit it from the final visualisation.

The final visualisation of the article shows how the electorates of the various Indian states compare with other democracies of the World.

The visualisation gives an approximation of which countries could best replace the Indian states if only the electorates were considered. So, for example, Uttar Pradesh has an electorate that is roughly the same size as that of Brazil. The problem with this is that it doesn't tell the reader anything. To start with, I have no idea about the size of Uttar Pradesh's electorate, and I am being told that it is the same size as the electorate of Brazil. Also, the comparisons are being made with countries that have no relation to the state in question. Jammu & Kashmir is compared with Madagascar, which doesn't make much sense. I tried to remove this notion by getting rid of the choropleth and using a Sankey diagram instead. This ensured that the only comparison would be between the electorates of the states and countries, without adding confusion.

In order to be able to make the Sankey, I needed to find the numbers that make up the electorate. For the Indian states, thankfully, Wikipedia had a neat documentation under their article on the Indian Elections. For finding the electorates of other countries, I had to look a bit before I found this wonderful website.

The ordering of the Sankey is arbitrary, as are the colour choices. This was a limitation of the software I was using at the time. I have also combined the growth in the size of the electorate and voter turn out into one visualisation to allow for comparison.

ElectorateViz-01

There were quite a few areas to improve with this visualisation. To start with, the percentage of voters line could have been more exaggerated, as done in the original visualisation, to highlight the ups and downs in the voter turn out. The Sankey was too small, and for the given space there were better ways of visualising the data while retaining the choropleth characteristic used in the original visualisation. The Sankey was too arbitrary and could have done with some amount of logic, with respect to the ordering and use of colours. Also, the smaller states, such as Nagaland and Tripura, are hardly visible.

So on to fixing this...

Now I really wanted to fix everything, really. But some things require greater attention than initially thought. I decided to focus on fixing the Sankey. I wanted to make the representation more relevant. While the Sankey showed how the electorates translated across to different countries, it was still flawed in terms of the arbitrariness of the flows. Also, there was no inherent order in which either the states or the countries appeared. And the other pressing point was the lack of relatability to some of the lesser known countries visualised.

Just as the Indian electorate's choices are represented in Lok Sabha, I decided to represent the relation of the states to the countries through their representation in the Lok Sabha. At the same time, I decided to fix the issue of having to look up countries such as Timor-Leste.

I changed the data set to 10 well known democracies that the average Indian reader would've come across. I took the size of the electorates of each of these countries, and normalised them to get the magical number of 543 seats.
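One way to normalise electorates so that the shares add up to exactly 543 seats is the largest-remainder method, sketched below. This is an assumption about how the rounding could be handled, not necessarily the procedure used for the chart, and the country names and electorate sizes are placeholders.

```python
def apportion(electorates, seats=543):
    """Split `seats` in proportion to electorate size (largest-remainder method)."""
    total = sum(electorates.values())
    allocation, remainders = {}, {}
    for name, voters in electorates.items():
        # Whole seats earned, plus the leftover fraction kept as an integer remainder.
        allocation[name], remainders[name] = divmod(voters * seats, total)
    # Hand the seats lost to rounding down to the largest remainders first.
    leftover = seats - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[name] += 1
    return allocation


# Placeholder electorates (in millions), not real figures.
sample = {"Country A": 47, "Country B": 33, "Country C": 20}
print(apportion(sample, seats=10))
print(sum(apportion(sample).values()))
```

Plain rounding of each share can leave the total a seat or two off 543; distributing the leftover seats by remainder keeps the total exact.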

I then visualised the Indian states' allotted constituencies on one side. I arranged them alphabetically, and split the legend on either side of the chart depending on its proximity to the location of the state's dot. This might be slightly contentious, but 31 different colours would've been quite a task. Right below this, I placed the representation that the countries would have if they had to make up the Indian Lok Sabha.

ElectorateVizFinal-02

So that I think would be my final visualisation of translating the Data Point article.

Cheers, Avyay.

dhirajdethe commented 5 years ago

How Much Mobile data do Indians use in a Month?

The original article can be read on this link.

The article tells the story of the trend in mobile data usage in India over the past four years, using data mined from the recently published TRAI report titled Wireless Data Services in India. The author emphasises how much mobile data an individual Indian uses in a month and how these numbers changed from 2014 to 2018.

The author articulates the story using three graphical visualizations. The first shows the change in the number of mobile data subscribers by connectivity technology (CDMA, 2G, 3G, 4G) over the last five years. The second shows the effect of the change in cost on mobile data consumption per month per subscriber. The third shows the number of data subscribers and the average data consumption per month in different service areas.

For the first visualization, an area chart is used to show the number of mobile data subscribers for four types of connectivity technologies, i.e., CDMA, 2G, 3G & 4G. The area chart is inherently difficult to comprehend when the intent is to show the change in numbers and not just to show something is changing over time.

Screenshot (109)

To show how the number of subscribers changed over time, I redesigned the visualization as a line chart, using the same dataset. The lines are encoded with different colors for the different types of connectivity, and a legend is provided to explain the encoding. With these changes, it is now easy to understand the trend in mobile data use for each connectivity technology in each year from 2014 to 2018.

Assignment1 data point-03

The second graph, which shows the trend in the amount of mobile data used with respect to the decrease in the cost of mobile data per GB, uses a line graph on top of a bar graph. Although the trend shown in this graph is easy to understand, the graph itself is not necessary to understand this trend. It is obvious that decreasing rates result in an increase in mobile data consumption.

Screenshot (110)

For the third visualization, the scatter plot is used to show the amount of data used per month vs. no. of subscribers in each service area. From the scatter plot, it is difficult to understand the two attributes (amount of data used per month and the total number of subscribers) of each service area.

Screenshot (111)

I transformed the same scatter plot into a bar graph whose x-axis carries two bars (red and green) for each service area: red for the number of data subscribers in millions, and green for the amount of mobile data consumed per individual per month in GB. The two y-axes show the values of the two bars and are encoded in the respective colors.

Assignment1 data point-04

The First Submission

Assignment1 data point-01

Feedback

The bars used in the graph to show the total number of mobile data users don't really help to convey the intent. It would be better to show it using only a line graph instead of bars.
The bar graph used for service areas does not lead to any conclusion. This graph neither complements the story by showing a trend in data usage nor helps to compare different service areas by their data consumption.
It would be interesting to see how much mobile data Indians use in a month, and of which type, which is actually the intent of this story.

Iteration

Added a line graph for the total number of mobile data users.
Discarded the second bar graph.
Looked at the data given in the TRAI report, used a dataset of technology-wise data usage per month and plotted it as a line graph. The graph shows the amount of mobile data used per individual per month in GB.

Final Submission Assignment1 data point-02-02

maulicule commented 5 years ago

What the Parliament has been Discussing

The original Hindu article can be found here.

As is the case with technical problems, they crop up at the most crucial moments.

Laptop crashed. Lost initial version, will update as soon as I reconstruct it.

What I do have, though, is a PDF version I'd used for printing, and the valuable insights gained from discussion in the class. The PDF is right here: The_Hindu_RedesignArtboard 1 copy 2.pdf

The major feedback I received on this was that the 'violin' chart wasn't adding much meaning to the overall information to be conveyed, and that the larger graphic tried to cram in a lot of information but didn't quite tell a story. I was also missing data points, and there was no available reference to the source data, which was a limiting factor. There was poor contrast in some of the gridlines, especially apparent in print. After the course, though, I realized there were even more factors I had not initially taken into consideration - mainly, dealing with detection, assembly and estimation as a part of pre-processing rather than as an afterthought.

Trying the assignment afresh might be a good idea

In any case, I would need to make the infographic again.

Will need to reconstruct from scratch to be able to submit a workfile.

Another article, instead?

Since I'm working from scratch here anyways, I thought I could try out another article from Data Point which seemed pretty interesting from the data perspective but, alas for readers of The Hindu, was not visualized particularly well. Here's the 'new' article, which looks at crude oil, the recent price spike (all time high in the last 19 years), and India's oil imports. The particular graph which had source data available deals with India's oil imports and looks somewhat like this:

My task would be to treat it as a data story for the readers of The Hindu, where they could glean valuable insights from the data and be able to explore it/dig deeper if they so chose.

And so, a need for a new heading

India's Oil Imports

Country	Share of India's Crude Import (July 2019)
Angola	2.73%
USA	4.16%
Iran	4.7%
Kuwait	5.2%
Mexico	5.5%
Venezuela	6.95%
UAE	7.19%
Nigeria	8.19%
Saudi Arabia	18.78%
Iraq	21.78%

These shares total 85.18%, so I assumed that the remaining 14.82% could be attributed to 'others'.
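The arithmetic behind the 'others' share can be checked with a short script. This is only a minimal sketch: the dictionary restates the table above, and the function name `remainder_share` is made up for illustration.

```python
# Shares of India's crude imports (July 2019), in percent, as listed in the table above.
shares = {
    "Angola": 2.73,
    "USA": 4.16,
    "Iran": 4.7,
    "Kuwait": 5.2,
    "Mexico": 5.5,
    "Venezuela": 6.95,
    "UAE": 7.19,
    "Nigeria": 8.19,
    "Saudi Arabia": 18.78,
    "Iraq": 21.78,
}


def remainder_share(listed_shares):
    """Return (sum of the listed shares, share left over for 'others'), both in percent.

    Rounding to 2 decimals hides floating-point noise from adding the values.
    """
    listed_total = round(sum(listed_shares.values()), 2)
    return listed_total, round(100 - listed_total, 2)


listed_total, others = remainder_share(shares)
print(f"Listed countries: {listed_total}%  Others (assumed): {others}%")
```

Treating the remainder as a single 'others' slice is the same assumption made in the text; the source data does not say which countries it covers.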

The fact that these are components of a whole pointed towards using a pie chart or tree chart. As pie charts are more widely understood, I chose to represent the data in this form.

I had originally intended to make the pie chart colourful, using the flag colours of the different countries from which India sources its crude oil. But on reflection, I decided not to go ahead with it, as the important content here is the chunk Saudi Arabia contributes. I chose to give greater importance to this key point rather than equal importance to all source countries, since a colourful graphic would add clutter to the intended data story.

Using semantic colours to relate to a barrel of 'black gold' i.e. crude oil, and newsprint-like body copy as well as typography, the overall news piece looked somewhat like this: CrudeOilAsset 1

I tried to add some more meaning by giving a brief summary of the story the body copy told, and added this as a 'tldr' (too long, didn't read :stuck_out_tongue: ) type of encapsulation so as to cater to readers who would like to know the story in short without having to delve into the verbose part of content. This was done as a series of steps which looked at events one after the other. Post these additions, I corrected for better visual clarity, legibility and contrast while maintaining the original semantic colour scheme.

The final news piece looks like this:

20190920_2c_the_hindu_data_point_crude_oil_at_19_year_hig

Although I call it the final news piece, I'm looking forward to actionable feedback and constructive suggestions to make it better and iterate further, given time.

Made with love and data, Maulashree.

GauriTillu commented 5 years ago

You can find the original Hindu article here.

Story: The story highlights the changing trends seen in the composition of Indian families over almost 2 decades.

Data: The data provided in the original article is about the percentage of families with different compositions, for example:

Couple Only
Extended family, etc.

It also gives data on the trends in the percentages of 'couple only', 'single mother' and 'single father' families between the years 1983 and 2015.

Screenshot (641)

Problems with the original visualization:

The two visualizations are not coherent; the relation between them is not clear.
It does not make sense to use a bubble chart for percentages.
The single parent graph in the second chart does not add any value, as it is just a simple addition of the percentages of single mother and single father.

Submission 01:

Feedback on the above visualization: 3-D visualizations should be avoided wherever possible, as they give a distorted sense of scale. Avoiding them also improves the data-ink ratio.

Submission 02:

Submission 03: Improving the assembly and coherence.

Gauri_Nature_of_Indian_Families_02-01

This layout seems to provide better coherence than a vertical layout.

Submission 04: Refining the assembly:

aishaanam commented 5 years ago

I chose to redesign the visualization of the article titled 'More Indians have access to drinking water, basic sanitation'. The existing article aims at showing the 'change', or the progress to be precise, from the year 2000 to 2017. The progress is mapped on the following parameters against the percentage of the population, and India's progress is further compared with global progress -

availability of drinking water,
sanitation,
drinking water &
hygiene

The current data visualization is as follows-

availability of drinking water-

Screenshot (378)

The Demographic Divide - another set of visualizations aims at analysing the global and Indian data with respect to urban and rural scenarios.

Screenshot (372)

Another set of information is layered in by analysing economic status.

Screenshot (371)

This set of data does not clearly show how India is affected in terms of regional differences.

DAta Viz Info 1 (4)

Analysis of the Data

Major thing to be conveyed - The change is visualized with respect to % of the population from the year 2000 to 2017.

Set of Information - Progress in:

Availability of drinking water,
improvement in sanitation
practice of hygiene
reduction in the practice of open defecation

Layers of Information

There are multiple layers of information, which analyse how the change is affected by parameters like -

Demographic Divide - Urban & Rural scenarios
Economic status

The Gaps while analysing the visualization: Had the data been consolidated in one place, it would have been easier to analyse the change with respect to the other attributes. Keeping the type of visualization the same, i.e. a scatter plot with links, I proposed a consolidated format. The final one is below the other iterations.

![Uploading A4 infographics V2.jpg…]() ![Uploading A4 infographics NTS 01 bottom head.jpg…]()

The issue with this data visualization is that it's challenging to decipher the different attributes through the 'not so effective' encoding of graphical annotations. The image below depicts the final layout, which needs to be formatted with better graphics.

prachitank commented 5 years ago

Is Kashmir Underdeveloped

I chose to redesign the story 'Is Kashmir underdeveloped as stated by Amit Shah?'. The story can be found here.

For the data in its current form:

Pros: 1) Simple and easily comprehensible representation.

Cons: 1) The vertical separation of the dots does not convey additional information. 2) The positive and negative sides are not consistent for each graph. 3) The main focus of the story gets lost in the textual information.

(The images in the iteration 1 and 3 are not complete, the decision to discard the direction was made after a quick wireframe to get an idea of what the entire article could look like)

Direction 1 Pros: Colour coding catalyses pre attentive processing. Cons: Only 3 of the 22 - 28 dots in each graph actually contain information, interval information is getting lost.

Direction 2 Pros: The rank of Kashmir is brought into focus. Cons: Too much dependence on textual information.

Post this I realised that I had to bring in the interval information i.e. the average (which was getting hidden in the text). I retained the use of an X axis to show the relative position of each state which is a simple and easily comprehensible representation.

Direction 3 Pros: Dependence on textual information is reduced. Cons: The vertical separation suggests more than there actually is.

Direction 4: Final

Thanks!

kaishwary08 commented 5 years ago

The selected data point article was How fast does traffic move in your city?

The article cites information from a research paper published by researchers from various universities in the US. The paper evaluates 154 cities in India based on two indices, the Mobility Index and the Congestion Factor. The Mobility Index incorporates the speed of motorised vehicles in a given city, whereas the Congestion Factor captures the traffic density of the city through the number of registered vehicles in the city, the population of the city, and the average time delay to cover some distance.

With the provided chart, most of the important information is concealed. Moreover, the chart itself is difficult to interpret.

The data set of all 154 cities is not available in either the article or the research paper. Another issue with the research is the comparison of cities like Dhanbad and Mumbai, where the population of Mumbai is about 15x that of Dhanbad. Hence, it was decided to work on major metro cities. Other data, like kilometers of road, traffic density and average speed in the city, was gathered from different sources; although published in different years, each source was used consistently across cities. The graphic treatment was chosen to evoke a feeling of congestion, hence elements were tightly packed.

The encoding was as follows. From top: the heights of the buildings bearing city names represent the total population of the city. The cars on the road show the traffic density of the city. The speedometer at the end represents the average speed in the city. The length of the road shows the distance covered in the city: where the average speed is higher, the distance covered in a given time interval is greater, hence the longer road.

Bottom: the Mobility Index to Congestion Factor chart. This provides a better comparison between the cities. According to the article, a low mobility index with a higher congestion factor means the worst traffic experienced by a city, while a higher mobility index with a lower congestion factor means better-performing traffic. Apart from the 7 metro cities, other cities were handpicked, along with the best performing city, Srinagar.

InfoGra Traffic-01-01

Issues: Correlation with road size and number of cars and length of roads seems to be off. Without any text, it is very difficult to comprehend. Weber's law violated: hard to find the difference between speedometers. Also, difficult to comprehend the position of the speedometer. Does the bar end at the tip of the circle or on the mid-line? The chart representation of the indices for comparison is tough to comprehend and does not reveal information.

Based on the feedback, I tried to incorporate the changes in the data point story.

Roads are of the same length, with varying road densities of the cities. The data for road length was borrowed from e-Newspaper and govt. sources. Whereas the number of cars registered in the city was sourced from the second chart of the referenced story.
A clear distinction between the average speeds shown on the speedometers was made by placing them next to each other.
It was difficult to comprehend the inverse correlation between the Mobility Index and the Congestion Factor. It was decided to reverse the sign of the Mobility Index, so that a higher value means lower mobility, just as a higher Congestion Factor means higher congestion in the city.
Addition of write up for easier comprehension.

Pre-final-02

The double bar charts were again difficult to comprehend. Upon discussing it with classmates, I realised that there were multiple areas of comprehension, which didn't show a clear distinction when the information was stitched together. Also, the infographic had drifted far away from the core element of the data story and had been drawing information from rather inconsistent sources. One distinct example is the length of roads. Some cities have fairly large lengths of arterial roads that may not contribute to the traffic, which reduces the road density and portrays a picture that contrasts with the data point story.

The final iteration came about by stacking the two values of Mobility Index and Congestion Factor, which together give an understanding of how congested the roads could be.

Final Infographic-03

Upon validating this against the previous iteration, this gave a much better idea of the congestion, and clear distinctions between the conditions of the roads were perceived. It was easier to distinguish between the better and poorer performing cities. I am unsure about the appropriateness of the encoding for the given data, though. Alas, it works.

Thanks.

shraddhadhodi commented 5 years ago

Datapoint link: How many students in rural districts can perform division?

In as many as 443 of 586 rural districts, less than 50% of students in Grades VI-VIII knew how to carry out basic division, the Annual Status of Education Report, 2018 revealed.

The article gives two visualizations to put forth its point. One is a scatter plot which shows the percentage of students who could read as well as do the math. The other is a table which gives data on the number of districts with poor reading skills and poor maths skills in each state of India.

I believe that the data provided in the data point did not fairly explain how only 41% of students living in rural districts could perform division. Also, from the data provided in the table, it could be seen that there might be some relationship between the level of maths skills and the reading skills. So in order to understand and visualize the relationship, I tried to plot a bar graph of the values for each state in decreasing order. (Graph 2)

Interpretations:

Students in Manipur & Mizoram had good learning levels while Meghalaya, Tripura & Assam had the worst.
Students in Bihar fared badly in reading but had better maths skills.
This trend was also seen to an extent in Tamil Nadu.
Haryana, Punjab and Himachal were among the best in both aspects. J&K, M.P., and W.B. were among the worst.
In Punjab, none of the districts had worse reading levels or maths skills than the national average.

Problems faced: The data is inadequate to ground the claim. Even after looking for the data related to this particular data point, I could not find anything that could help to create better data visualization.

Note: In retrospect, as per the feedback given to me, I strongly believe that I did not do a good job of showing the intended relationship between the poor reading skills and the maths skills of the districts of India, and I should rework this assignment to find a better story and relationship in the data provided.

AshmiK commented 5 years ago

Here is the original article.

How many women MLAs in your State?

Screenshot (125)

The idea was to simplify what seems to be a very confusing scatter plot of the percentage of women MLAs before 2000 and after 2000. References were made in the article to the increase and decrease of women MLAs in some states. However, inferring and comparing information was difficult in this plot. So I decided to represent this plot as a bar graph, which made it easy to find a state and compare it with other states.

dataPointBarChart

During the feedback session I realized that I had no idea why the percentages were chunked as pre and post 2000. I tried finding out if any major changes had happened around that time (such as election seat quotas, or something related to constituencies) but realized that none had happened in India. This is just a representation of a table without a strong story. Will try to work on another story.

rohanjhunja commented 5 years ago

State of Migration

https://www.thehindu.com/data/india-migration-patterns-2011-census/article28620772.ece
https://www.thehindu.com/news/national/reasons-for-migration-india/article28772050.ece

The Datapoint article on migration used visualizations to indicate three things:

The reasons for migration, out of which marriage was the highest at 46%, 97% of these being women.
The inflow into states that host the highest number of migrants (Maharashtra and Delhi)
Two visualisations that showed the States which host the highest number of migrants, and the States from where the highest number of migrants originate.

States which host the highest number of migrants

For the assignment I chose to work on the third set of visualisations, as I felt that the data could tell more about the paths taken by people. I had to first retrieve the original census 2011 data from its website. http://censusindia.gov.in/2011census/d-series/D-2/DS-2800-D02SC-MDDS.XLSX This included granular responses from each state on gender, reason for migration, time of migration and state of origin. I chose to add year of migration data to the data used by the Datapoint visualisation. As all the data was in counts, it had to be converted to percentages. Raw counts did not make sense for comparison, as larger states would logically hold a higher number of migrants. Data on the population of states from the 2011 census was used to find what percent of each state's population had migrated in, and how recently they had done so. I was unable to find data on NCR that reflected the Datapoint article's indication that a significant population migrates to Delhi.
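The count-to-percentage conversion described above could be sketched as follows. The field names, the function name and the numbers in the usage line are hypothetical placeholders, not values from the census file. The sketch also assumes that migrants are already counted within the state's total population.

```python
from dataclasses import dataclass


@dataclass
class StateMigration:
    """Raw counts for one state (hypothetical field names)."""
    name: str
    population: int               # total state population, 2011 census
    migrants_under_10_years: int  # migrated in less than 10 years ago
    migrants_over_10_years: int   # migrated in more than 10 years ago


def composition(state: StateMigration) -> dict:
    """Convert raw counts into percentages of the state's population."""
    if state.population <= 0:
        raise ValueError("population must be positive")
    migrants = state.migrants_under_10_years + state.migrants_over_10_years
    migrant_pct = 100 * migrants / state.population
    # Share of the migrant population that arrived within the last 10 years.
    recent_pct = 100 * state.migrants_under_10_years / migrants if migrants else 0.0
    return {
        "local_pct": 100 - migrant_pct,
        "migrant_pct": migrant_pct,
        "recent_pct_of_migrants": recent_pct,
    }


# Placeholder numbers for illustration only, not census values.
example = StateMigration("Example State", 1_000, 100, 150)
print(composition(example))
```

Dividing by each state's own population is what makes a large and a small state comparable, which is the reason given for not plotting raw counts.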

The data visualisation I made represents three sets of data:

The composition of a state's population (local vs migrant): heights of the coloured parts of the column
The composition of the migrant population (<10 years vs >10 years): heights of the coloured parts of the column
The size of a state's population: width of the column

As population sizes of states vary a lot, I chose to work with the data of the 15 largest migrant populations by state. The states are arranged by decreasing size of migrant population. Blue represents the local population while red from above represents the influx of migrant population. A dashed line represents the national average of migrant population.

Version 1: v1 2-100

From this visualisation it is easy to tell that states like Maharashtra and Kerala host more migrants per capita. It also became apparent that some states like UP and Bihar have a lower than average number of migrants. These happen to correspond with the states that have a large population migrating out. States were not arranged according to geographic location, and as a result there is no hint of where the migrants to a state are coming from.

Feedback suggested that the colours were making it difficult to tell the composition of the population. The placement of the migrant population from the top was counter-intuitive to some, as columns are expected to be read as heights starting from the bottom. Some people were able to visually relate the heading of the article with the visualisation of migrant population placed in proximity.

The second iteration of the visualisation was made to quickly incorporate feedback on the intuitiveness of the graph. The layout of the article was changed to portrait to maintain proximity between the heading and the visuals in the graph. Colours were changed to highlight only the migrant population in shades of red.

Version 2: v3 2@2x-100

A final attempt was made to include data on all states. States were split into two sets based on the 50th percentile of migrants when states were arranged in ascending order. The first visualisation keeps column widths true to state population sizes. The second introduces a minimum width to make the columns of smaller states readable. The third removes population size as a dataset, with uniform width across columns. This makes it easier to compare state compositions.

Version 3:

This graph makes it clearer that some of the smaller states like Goa and Chandigarh host more migrants per capita than larger states like Maharashtra.
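The split at the 50th percentile used in the final attempt could be sketched like this. It is only an illustration: the function name and the sample values are made up, and whether the split was made on migrant counts or on migrant share is not stated in the write-up, so the input is simply "one migrant figure per state".

```python
from statistics import median


def split_at_median(migrants_by_state):
    """Split states into two sets around the 50th percentile (median).

    Returns (lower, upper): state names in ascending order of their migrant
    figure, with states exactly at the median placed in the lower set.
    """
    cutoff = median(migrants_by_state.values())
    ordered = sorted(migrants_by_state.items(), key=lambda item: item[1])
    lower = [name for name, value in ordered if value <= cutoff]
    upper = [name for name, value in ordered if value > cutoff]
    return lower, upper


# Made-up figures for four hypothetical states, for illustration only.
sample = {"State A": 12.0, "State B": 3.5, "State C": 30.0, "State D": 8.0}
lower, upper = split_at_median(sample)
print("Lower half:", lower)
print("Upper half:", upper)
```

Plotting the two halves separately keeps the smaller values readable, since they no longer share an axis with the largest ones.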

seemskk commented 4 years ago

Why do Indians migrate?

The Data Point article I picked for redesign looks at the reasons why men and women migrate in India. The data is presented as a table which lists the percentage of men and women migrating for work, marriage and education.

hindu data point

Critique of the original visualisation

The parameter that immediately gets your attention is the percentage of women going out for education, which is <1 for all the states. The rest of the data needs more attention to comprehend.
The heading is generic while the data specifically points to gender disparity in migration patterns, which could have been clearer in the title.
The chart tries to draw some insights through a frame and blocks of colour, which has not been very effective.

Data viz direction

The first attempt was to 'make sense' of the data using a plotting technique. A connected dot plot seemed appropriate because 1 ~ it shows the connection between the genders, 2 ~ it shows the distance they travelled, 3 ~ it makes pattern recognition easier.

Migration for work
Migration for marriage
Migration for education

Final layout

layout_hindu-01-01-01

Some advantages of this visualisation

The layout has insights marked as 1,2 and 3 in three different colours.
The data viz gives a direct comparison of gender disparity in each segment visually.
The original article had 4 insights. But in this visualisation, I have taken into account only extreme cases and anomalies, such as Meghalaya for male marriage migration and Manipur for female work migration, which have been marked with red rectangles. This leaves the reader to make sense of the rest, i.e. the mid range.

Untitled-2

SaiAnjan commented 4 years ago

Progress report on Ganga cleaning mission

As part of the assignment I picked up this article. It is a progress report on the Ganga cleaning mission: a government-commissioned independent study of 97 towns along the Ganga shows that 39% of these towns in five States are in need of overall improvement in cleanliness, solid waste management and a change in how nullahs (drains) are handled.

The article talks about the division of towns along the ghats of the river Ganga. It contains 2 parts, which talk about the grading of these towns across the five states: Uttarakhand, Uttar Pradesh, Bihar, Jharkhand and West Bengal. These states share the largest part of the river. Among these, Uttar Pradesh is the most populated state in India. The article reveals interesting data about the number of towns which need cleanliness management. Jharkhand is a state which shares less of the river's area.

1st part - Grading of towns

The first part talks about how many towns are graded A, B or C. These grades indicate how much improvement in cleanliness is required for the percentage of towns mentioned.

Grade A: Good cleanliness and solid waste management services in and around the ghats area. Most nullahs were connected to STP or had garbage screens
Grade B: Partial cleanliness around the ghats. Needs improvement in solid waste management services
Grade C: Needs overall improvement in cleanliness, solid waste management services and infrastructure set up of nullahs

Following is the bar graph used to visualise this data:

2nd part - Different types of river dumps

The second part of the article talks about the river dumps by visualising the number of nullahs (drains) that flow into the river in the same states.

It is common for nullahs to drain into the Ganga across towns in all the States. In Bihar, the towns had dumpsites along the river as well.

The below visualisation shows the percentage of towns in each State that:

Had nullahs draining into them
Had solid waste floating on the surface
Had dumpsites along the ghats.

How I re-visualised it

After reading the article, I noted that the 1st part, on the percentage of towns along the river banks in each grade, is presented as bar charts with percentages. I thought it better to give readers an idea of the size of each state. Both parts of the article are visualised linearly, although they are connected. The state of Jharkhand shares less of the river and had 100% Grade B (partial cleanliness around the ghats) towns. The same state had no towns with solid waste floating on the surface. So my first approach was a visualisation with a geographical illustration of the river and the states, and the percentages as line graphs near the states themselves.

version1

Feedback after discussing with the class:

Showing the 2 parts side by side does not correctly convey the connection I intended between towns and nullahs
Population is an important factor in river water pollution, and it could have made a difference if visualised along with the line graphs

Version 2

After the discussion with the class I noticed that I was using too little of the canvas area for the visualisation. So I rethought how I could show the sheer area of the states and the effect of population on the pollution.

A4 Copy 5