mschermann / data_viz_reader

A Reader on Data Visualization
https://mschermann.github.io/data_viz_reader/
Creative Commons Attribution Share Alike 4.0 International

Describing Risks and Pitfalls in a Data Viz Project #1230

Open Vidya313 opened 5 years ago

Vidya313 commented 5 years ago

We can organize topics like :

Vidya313 commented 5 years ago

The link below gives insight into the reasons data visualization projects fail.

Source: https://towardsdatascience.com/6-reasons-why-data-visualisation-projects-fail-1ea7a56d7602

XZou0803 commented 5 years ago

This is very useful! Looking forward to the results!

rtamhankar commented 5 years ago

The links below cover pitfalls at different stages of visualization. They discuss five important pitfalls that arise during data exploration, while drawing insights from the data, and during the actual implementation of the visualization. The list can be extended with additional pitfalls. Sources: https://www.quantics.nl/pitfalls-in-data-visualization/ and https://www.slideshare.net/qlik_arg/5-data-visualization-pitfalls

Content:

Data visualization is trending in data science and can help a company thrive. It can convey clear messages to shareholders who are less familiar with the data, like a company’s board. It can lead to valuable insights that help improve customer satisfaction, increase profits and improve processes. However, misinterpreting data can lead to bad decisions. Below are some of the most common pitfalls in data visualization. Avoiding them helps convey the right message clearly.

  1. Color Abuse: Color has its place, but don’t overdo it in data visualizations. The wrong color can lead to confusion or, even worse, misinterpretation. For example, red is often associated with something negative. Using red for data that is merely somewhat worse than an alternative, but not bad in itself, can cause misinterpretation.
  2. Misuse of Pie Charts: We all love our pies. But nothing is less satisfying than a tiny sliver. If you try to squeeze too much information into a pie chart, the big picture gets lost. Too much detail leaves your audience feeling unsatisfied and confused. Avoid using pie charts side by side — it’s an awkward way to compare data.
  3. Visual Clutter: Making discoveries in a cluttered visualization is like finding a needle in a haystack. Too much information defeats the purpose of clarity. Unnecessary elements, or chartjunk, crowd a visualization, obscure meaning, and lead to inaccurate conclusions.
  4. Poor Design: “Design is not just what it looks like and feels like. Design is how it works.” (Steve Jobs) Just because a visualization is beautiful to look at doesn’t mean it’s effective. Effective visualizations incorporate design best practices to enhance the communication of data.
  5. Bad Data: Great visualizations start with great data. If your visualization reveals unexpected results, you may be the victim of bad data. Don’t let your visualization become the scapegoat for bad data.

bharatikandakumar commented 5 years ago

Great idea Vidya!! For the "Ways to get them right" part, I think adding the following content might be helpful.

Data visualization design isn’t about displaying data; it’s about displaying data in a way that makes it easier to comprehend. That’s where the real value lies. Here are some ways we can get them right. Source: https://www.columnfivemedia.com/25-tips-to-upgrade-your-data-visualization-design

Design:

  1. Choose the chart that tells the story. There may be more than one way to visualize the data accurately. In this case, consider what you’re trying to achieve, the message you’re communicating, who you’re trying to reach, etc.
  2. Remove anything that doesn’t support the story. No, that doesn’t mean you kill half your data points. But be mindful of things like chart junk, extra copy, unnecessary illustrations, drop shadows, ornamentations, etc. The great thing about data visualization is that design can help do the heavy lifting to enhance and communicate the story. Let it do its job. (Oh, and don’t use 3D charts—they can skew perception of the visualization.)
  3. Design for comprehension. Once you have your visualization created, take a step back and consider what simple elements might be added, tweaked, or removed to make the data easier for the reader to understand. You might add a trend line to a line chart. You might realize you have too many slices in your pie chart (use 6 max). These subtle tweaks make a huge difference.
  4. Comparison: watch your placement. You may have two nice stacked bar charts that are meant to let your reader compare points, but if they’re placed too far apart to “get” the comparison, you’ve already lost.
  5. Don’t use distracting fonts or elements. Sometimes you do need to emphasize a point. If so, only use bold or italic text to emphasize a point—and don’t use them both at the same time.
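The "use 6 max" advice for pie slices in point 3 can be applied mechanically before plotting. A minimal sketch in Python, assuming the data is a plain mapping of category to value; the function name, the `"Other"` label and the sample numbers are hypothetical and not from the source article:

```python
def collapse_small_slices(values, max_slices=6, other_label="Other"):
    """Keep the largest categories and merge the rest into one slice.

    `values` maps category name -> numeric value. The result has at most
    `max_slices` entries: the largest categories in descending order,
    followed by one merged slice for everything else (if needed).
    """
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked)
    # Reserve one slot for the merged remainder.
    kept = ranked[: max_slices - 1]
    remainder = sum(value for _, value in ranked[max_slices - 1:])
    return dict(kept + [(other_label, remainder)])


# Made-up example data: eight categories are too many for one pie.
sales = {"A": 40, "B": 25, "C": 12, "D": 8, "E": 6, "F": 4, "G": 3, "H": 2}
slices = collapse_small_slices(sales)
```

The total is preserved, so the shares shown in the chart stay honest; only the detail in the smallest categories is dropped.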

Color:

  1. Use a single color to represent the same type of data. If you are depicting sales month by month on a bar chart, use a single color. But if you are comparing last year’s sales to this year’s sales in a grouped chart, you should use a different color for each year. You can also use an accent color to highlight a significant data point.
  2. Avoid patterns. Stripes and polka dots sound fun, but they can be incredibly distracting. If you are trying to differentiate, say, on a map, use different saturations of the same color. On that note, only use solid-colored lines (not dashes).
  3. Select colors appropriately. Some colors stand out more than others, giving unnecessary weight to that data. Instead, use a single color with varying shade or a spectrum between two analogous colors to show intensity. Remember to intuitively code color intensity according to values as well.
  4. Don’t use more than 6 colors in a single layout. Enough said.
  5. Make sure there is sufficient contrast between colors. If colors are too similar (light gray vs. light, light gray), it can be hard to tell the difference. Conversely, don’t use high-contrast color combinations such as red/green or blue/yellow.
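"Sufficient contrast" in point 5 can be checked with a number rather than by eye. The sketch below uses the contrast-ratio formula from the WCAG accessibility guidelines, which is a technique brought in for illustration and not something the source article prescribes. Function names are hypothetical, and colors are assumed to be 6-digit hex strings:

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel value to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of a '#rrggbb' color."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Two light grays, as in the example above, versus black on white.
gray_pair = contrast_ratio("#d3d3d3", "#e0e0e0")
black_white = contrast_ratio("#000000", "#ffffff")
```

A ratio close to 1 means the two colors are nearly indistinguishable. Note that this measures lightness contrast only, so it does not by itself catch the red/green problem mentioned above.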

Labelling:

  1. Double check that everything is labeled. Make sure everything that needs a label has one—and that there are no doubles or typos.
  2. Make sure labels are visible. All labels should be unobstructed and easily identified with the corresponding data point.
  3. Label the lines directly. If possible, include data labels with your data points. This lets readers quickly identify lines and corresponding labels so they don’t have to go hunting for a legend or similar point.
  4. Don’t over label. If the precise value of a data point is important to telling your story, then include data labels to enhance comprehension. If the precise values are not important to telling your story, leave the data labels out.
  5. Don’t set your type at an angle. If your axis labels are too crowded, consider removing every other label on an axis to allow the text to fit comfortably.
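For point 5, "removing every other label" is easy to do in code while leaving the tick positions alone. A small sketch in Python; the helper name and the sample labels are hypothetical:

```python
def thin_labels(labels, keep_every=2):
    """Blank out all but every `keep_every`-th label.

    The list keeps its length, so each remaining label still lines up
    with its original tick position.
    """
    return [label if i % keep_every == 0 else ""
            for i, label in enumerate(labels)]


months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
thinned = thin_labels(months)
```

Most plotting libraries accept a list of tick labels, so the thinned list can usually be passed in place of the original one.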

Ordering:

  1. Order data intuitively. There should be a logical hierarchy. Order categories alphabetically, sequentially, or by value.
  2. Order consistently. The ordering of items in your legend should mimic the order of your chart.
  3. Order evenly. Use natural increments on your axes (0, 5, 10, 15, 20) instead of awkward or uneven increments (0, 3, 5, 16, 50).
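The ordering rules above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (categories held in a dict, a non-negative axis, an integer step); the function names and sample data are hypothetical:

```python
import math


def order_by_value(values, descending=True):
    """Return (category, value) pairs sorted by value."""
    return sorted(values.items(), key=lambda item: item[1], reverse=descending)


def even_ticks(max_value, step):
    """Evenly spaced ticks from 0 up to the first multiple of `step`
    that is greater than or equal to `max_value`."""
    top = math.ceil(max_value / step) * step
    return list(range(0, top + step, step))


regions = {"West": 12, "East": 30, "North": 7}
ordered = order_by_value(regions)
ticks = even_ticks(18, 5)
```

Reusing the same ordered list for both the chart and its legend also takes care of the "order consistently" rule.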
Vidya313 commented 5 years ago

Source: https://towardsdatascience.com/6-reasons-why-data-visualisation-projects-fail-1ea7a56d7602

Content: Critical points of failure in data visualization projects.

Despite the tremendous promise of data visualization, and although the discipline has been in focus for years, it is not yet fully mature. Numerous visualization tools with fancy features are at our disposal, yet impactful use of data visualization is still scarce. At times, one wonders what makes the visual display of information so hard.

    • Conceivably it has to do with the reduction of information design to lame but slick dashboards displaying an assortment of KPIs.
    • Possibly it's the fancy charts and dazzling functionalities that have somewhere lost the pulse of the users.
    • Or it's the well-meaning initiatives that have been derailed midway by conflicting priorities and confused execution.

Vidya313 commented 5 years ago

Some of the failure points in visualization projects are listed below.

1) Ignoring end users: End users are often not directly engaged when the needs for a visualization project are defined. This is a prime reason why visual dashboards often go unused after rollout. What matters is mapping user stories and hearing how users approach business problems. This is the users' practical wisdom, which can't be transferred and which is closely linked to actionability. It's demanding to onboard the end users and gather their nuanced business perspectives so that they can be built into dashboards. Build the user persona through interviews, map the user journey by gentle probing, and jointly sketch out the as-is business scenarios. It's also helpful to list the questions that the visualization will answer, and to clarify which ones it will not.

2) Attempting to include all features: The more features are dumped into an application, the less it will get used. While one gets a false sense of satisfaction by checking all the boxes, the cognitive load can get so high that users stop using it altogether. When it comes to prioritization, even the most knowledgeable users may not have the right perspective to make hard calls or the gumption to bite the bullet.

It's necessary to play a consultative role and help whittle the feature list down to the most critical items. While screen space is technically unlimited, it's useful to impose constraints on data density. Onboard stakeholders who know the priorities, who can make hard decisions, and who can champion the many battles needed to convince other users.

3) Overlooking the need for data exploration: Retrofitting data is the root cause of non-actionable dashboards and weird-looking charts. Without exploratory analysis, maps could be skewed by outliers or, worse, end up showing no patterns. Data also drives the choice of charts. As part of project planning, it's critical to account for data upfront. While getting the header rows is a useful start, the full data is essential before crucial design decisions can be made. Clients must be educated that data is indeed on the critical path of visualization, and that data insights drive design decisions.

4) Putting self-satisfaction above the visualization: At times individuals develop such an affection for a chart that they try to extend the relationship beyond the scope of cool visualization examples. This leads to unproductive force-fitting of charts into the solution. The compromises made for this adjustment can wreak havoc on the entire project. Those who demand exotic or 3D charts even when the use cases don’t support them are doing so for their own satisfaction, and end up alienating users. The choice of chart is a science, and there are robust disciplines to adhere to.

5) Endeavoring to make everything clickable: When designing navigation and interactivity, it’s a common fantasy to make everything clickable. When pushed to prioritize features within a screen, users commonly respond by trying to hide entire dashboards behind unrelated clicks. A rich UI doesn’t mean many clicks; it means just the right ones, intuitively placed. It may be useful to impose some guidelines, say, no more than eight clicks per screen. Data stories can be equally compelling in a static format, so carefully question the interactivity needed. Users will be thankful for this call.

6) Being opinionated in the choice of colors: Everyone has their own color preferences, and they can get pretty extreme. Unfortunately, this can have a significant bearing on the viewer's acceptance. And color is not just about look and feel: it's important to consider users with red-green color blindness. Color theory is more an art than a science, though there are standard guidelines for handling the aesthetic, functional, and social aspects. It's best to go with the user persona and application requirements rather than trying to please everyone. One must also make the effort to articulate choices and help resolve disagreements, since most users are unable to explain their color preferences.
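Point 3 above warns that a chart can be skewed by outliers when the data is not explored first. One quick exploratory check is sketched below in Python. It uses the common 1.5 × IQR rule, which is an assumption chosen for illustration and not something the source article prescribes. The function name and the revenue figures are made up:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical monthly revenue with one suspicious entry.
revenue = [12, 14, 15, 15, 16, 18, 19, 21, 250]
flagged = iqr_outliers(revenue)
```

A flagged value is not necessarily wrong, but it is worth checking before it stretches the axis and flattens everything else in the chart.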

AayushiVAgrawal commented 5 years ago

We can also add the limitations of data visualization tools: Data visualization tools are needed to analyze data and trends, but they have some limitations, especially as datasets grow in size.

  1. Lack of explanation: While data visualizations can be generated in real time, they do not provide any explanations. In fact, the process through which companies draw insight has not changed in the last 30 years. Analysts look at data and then write reports. This process is too slow for the market and too costly for the company. At the same time, data visualization tools expect the user to be an expert in all of the data and all of the corporate best practices.

  2. Different users derive different insights: Each user may draw a different conclusion from the same visualization, based on their previous experience. This presents several problems for companies. On the one hand, certain users could draw erroneous conclusions that cost the company money. On the other, in highly regulated industries, users’ incorrect conclusions could actually put the company at risk.

  3. Lack of guidance: The user who is interpreting the data may lack training. This can have a huge impact on the company. At the same time, analysts could provide clients with incorrect or substandard advice. Even systems with natural language query expect the user to know what they are looking for. This works with simple data, but the industry trend is towards big data, data lakes and complex analysis. It’s so complicated you might not even know what you don’t know, to paraphrase an American Secretary of Defense. The answer is so simple that it’s easy to miss.

  4. False sense of security: Graphics are great for conveying simple ideas fast, but sometimes they are just not enough. Graphics can make users think they are making data-driven decisions, or that they fully understand the data, when in reality they are only seeing a picture and don’t know the full story.

Source: https://yseop.com/blog/top-4-limitations-of-data-visualization-tools-2/

Psharma2193 commented 5 years ago

Great content, team! These risks are avoidable yet they happen so frequently in projects. This will be a great addition to the reader.

AjayDeshmukh14 commented 5 years ago

I agree with you, Prachi. These are some of the most common mistakes that happen in projects. I would like to add scenarios that may pose a dilemma for data viz creators in a project.

  1. To cut or not to cut (The Y-axis): By its design, a bar graph emphasizes the absolute magnitude of values associated with each category, whereas a line graph emphasizes the change in the dependent variable (usually the y-axis values) as the independent variable (usually the x value) changes.

    • Barplot: With this kind of chart there is consensus: your Y-axis should start at 0.
    • Line plot: Here, however, there is no consensus, although in general you don’t have to start at 0.
  2. Using area to represent numeric values: The human eye does not perform well when it has to translate areas into numeric values. It is therefore recommended to use bar graphs rather than areas where possible. This does not mean that area must never be used to represent a numeric variable. It means that other shapes and techniques should be used first, before resorting to area. For instance, the bubble chart does a good job of representing the values of 3 numeric variables.

  3. Scaling to radius or area: When working with 2D objects, scale by area, not by radius. Note, however, that area is a poor metaphor for value because the human eye perceives it poorly. Use it only when the better visual encodings are already in use on the graphic (as in a bubble plot). A barplot would probably do a better job.

  4. When to use a heatmap: A heatmap is really useful for giving a general view of numerical data, not for extracting specific data points. A heatmap is also useful for displaying the result of hierarchical clustering. Basically, clustering checks which sets of objects tend to have the same features on their numeric variables.
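Points 1 and 3 above both come down to simple arithmetic, which the following Python sketch tries to make concrete. The function names, the default maximum radius and the sample numbers are hypothetical and not taken from the source page:

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius that makes the bubble's AREA proportional to `value`.

    Area grows with the square of the radius, so the radius has to
    scale with the square root of the value.
    """
    return max_radius * math.sqrt(value / max_value)


def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values `a` and `b` when the
    Y-axis starts at `baseline` instead of 0."""
    return (a - baseline) / (b - baseline)


# A value one quarter as large gets half the radius, a quarter of the area.
small = bubble_radius(25, 100)

# 52 vs 50 is a 4% difference, but with the axis cut at 48 the first
# bar is drawn twice as long as the second.
honest = apparent_ratio(52, 50)
truncated = apparent_ratio(52, 50, baseline=48)
```

Scaling the radius directly with the value would instead make the smaller bubble's area one sixteenth of the larger one, which understates it. The second function shows why the bar chart consensus is to start at 0: a bar's length is read as its magnitude.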

Source: https://www.data-to-viz.com/caveats.html

Vidya313 commented 5 years ago

I have created a branch from contributions, " DataVizProject_Risks&Pitfalls#1230 " to work on this issue.