info-design-lab / DE705-Interactive-Data-Visualization

Documentation of the IDC M.Des course Interactive Data Visualization, 3-20 Sep 2019

Data Visualization Tools (2020) #5

Closed venkatrajam closed 3 years ago

venkatrajam commented 4 years ago

In this assignment each of you will select one of the following 18 tools, explore it as thoroughly as you can (download, install, try out, and use it to create something), and give a demonstration/overview of the tool to the rest of the class (20 minutes).

The objective is to introduce the tool to the class, and highlight its possibilities & limitations so the audience can make a well-informed choice among the available tools. We will do 4 tools per day starting from Thursday. Advait will coordinate and assign the tools. This is a credited assignment.

In your documentation, include links to the resources you used (if any) in your presentations, capturing your personal insights about the tool and related resources.

Observable -- This is a good tool to build interactive exploration and visualization to quickly answer questions and nurture understanding from data.
Chart.js -- Simple, flexible JavaScript charts.
RAW Graphs -- Copy/paste the relevant data directly from your spreadsheet program into RAW, choose a data visualization type, and set your parameters using a drag-and-drop interface. Each individual parameter or visual metric can be adjusted, and the interface is clean and intuitive, making it ideal for beginners.
dygraphs -- Another fast, flexible open source JavaScript charting library.
Palladio -- A good tool to visualize complex historical data which are often qualitative and incomplete in nature, and to build better understanding of the historical material through humanistic inquiry.
Timeline.js -- This is a powerful free tool developed by Northwestern University’s Knight Lab that helps you create engaging, timeline-based visuals to show off your data. Requires no coding. The Jayalalitha Disproportionate Assets Case by one of my students was done using Timeline.js.
Circos -- Circos visualizes data in a circular layout which makes it ideal for exploring relationships between objects or positions.
Candela -- Open source suite of interoperable web visualization components.
Datawrapper -- Powerful tool that requires no coding.
Leaflet -- Lightweight open-source JavaScript library for mobile-friendly interactive maps.
Google Data Studio -- Free tool from Google and easy to set up if you have a Gmail account.
Tangle -- Tangle is a JavaScript library for creating reactive documents. Your readers can interactively explore possibilities, play with parameters, and see the document update immediately. Excellent for instruction design and process explanation projects.
P5.js -- p5.js is a JavaScript library based on the ideas of the Processing visual programming language. Like most JavaScript libraries, it is web oriented and lets you bring the power of Processing to your web pages.
Plotly -- Good suite of visual analytics products.
Gephi -- Open source tool for link and network data analysis and visualization.
OpenRefine -- Formerly Google Refine, OpenRefine is a powerful tool for working with messy data -- cleaning it; transforming it from one format into another; and extending it with web services and external data.
Orange -- Open source machine learning and data visualization for novice and expert. Interactive data analysis workflows with a large toolbox.
R Shiny -- Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, html widgets, and JavaScript actions.

There are more tools here and here. If you want to pick a tool that is not listed above, discuss with me.

For how to document your work, take a look at what the previous batch did with this assignment.

richavagrawal commented 4 years ago

Tool Selected - Timeline.js

Introduction -

TimelineJS is an open-source tool that enables anyone to build visually rich, interactive timelines. It enables storytelling through timelines. Beginners can create a timeline using nothing more than a Google spreadsheet. Experts can use their JSON skills to create custom installations, while keeping TimelineJS's core functionality. The timeline can be published directly or embedded in one’s website.

Some examples -

Revolutionary User Interfaces
The Republican Run Up
Whitney Houston
How Wine Colonised The World
How ISIS expanded in one year from 2 countries to 10

How to use -

4 Simple Steps
Step 1: Create a Google Spreadsheet and enter the details.
Step 2: Publish it to the web.
Step 3: Copy and paste the URL to the specified location on Timeline.js.
Step 4: Share the link directly or embed it on your website.

Sample Data -

Narrating the story of design projects done by me over 4 years. Here is the link to my class presentation.

Some Tips -

Do not leave a row blank; the visualisation will not work.
Check the list of media types supported for publication. YouTube also works, although it is not on the list.
The easiest way to include pictures or photographs you have taken yourself is to upload them to your Google Drive and then use the link (opinion).
Enter BCE years as negative numbers, for example -100 to denote 100 BCE.
Read the FAQs. They cover everything.
The visualisation you publish will be publicly available to everyone, so be careful about what media you publish.
Do not keep more than 15 events, as the viewer does not realise when the timeline will end (opinion).
Avoid narratives that require a lot of back and forth, as it is not very convenient to keep switching (my opinion).
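
For the JSON route mentioned in the introduction, here is a minimal Python sketch of how spreadsheet-like rows could be turned into a TimelineJS-style JSON document. The row fields (`year`, `headline`, `text`) and the helper name `rows_to_timeline` are illustrative assumptions, and the `events`, `start_date` and `text` keys follow the TimelineJS JSON format as I understand it, so check the official documentation before relying on them. The sketch also applies two of the tips above: blank rows are skipped and BCE years are negative.

```python
import json


def rows_to_timeline(rows):
    """Convert simple row dicts into a TimelineJS-style JSON structure.

    Rows without a year are skipped, mirroring the tip that a blank
    spreadsheet row breaks the timeline.
    """
    events = []
    for row in rows:
        year = str(row.get("year", "")).strip()
        if not year:
            continue  # skip blank rows instead of emitting a broken event
        events.append({
            "start_date": {"year": year},  # negative year = BCE, e.g. "-100"
            "text": {
                "headline": row.get("headline", ""),
                "text": row.get("text", ""),
            },
        })
    return {"events": events}


# Made-up sample rows, for illustration only
rows = [
    {"year": -100, "headline": "An event in 100 BCE", "text": "<p>Basic HTML is allowed.</p>"},
    {"year": "", "headline": "", "text": ""},  # blank row, ignored
    {"year": 2019, "headline": "A recent event", "text": "Plain text works too."},
]

timeline = rows_to_timeline(rows)
print(json.dumps(timeline, indent=2))
```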

Overall -

Timeline.js serves as an objective storytelling tool that allows one to include different types of media. The best part about the tool is that it works on a Google spreadsheet which, when updated, directly updates the visualisation as well. It is well documented and easy to learn, and it can be used to create communicative, interactive timelines. From a design perspective, it provides some flexibility to edit the background colour and the choice of font pairs through Google Sheets. Experts in coding can make more changes to the design attributes. Details such as creating paragraphs with basic HTML tags and grouping events into categories in the Google Sheet itself make it accessible to use. I think the tool is very powerful and useful for communicating stories online, considering the different types of media that build a story in the digital world (e.g. social media, online videos, audio), especially news, which consists of chronological events, and the lifetimes of people.

Limitations -

The timeline can only be laid out horizontally.
Visual design parameters are not flexible enough.
There is no separate dashboard for manipulating design parameters; they are distributed across different parts of the tool.
There is no complete control over the units of the timeline.

Additional Information -

The timeline tool has now been developed to allow one to choose a particular event on a timeline and skip to it. One can also zoom in and out of the timeline's scale. Knight Lab has also developed similar interesting storytelling tools, which you can check out here.

raaghavishan commented 4 years ago

About the Tool

Orange is an open-source, component-based visual programming software package used for data visualization, machine learning, data mining, and data analysis. It is a very powerful tool and works as a one-stop solution: it covers pre-processing the data, visualizing the dataset using graphs, built-in machine learning algorithms, and test and score features for measuring the accuracy of an algorithm on different datasets.

Components in Orange are called widgets, and visual programming is implemented through an interface where widgets are connected to form workflows.

Pros:

It can be easily used by both novice and expert users.
It is an open-source software package
Visual programming
Does not require coding skills, but Python scripting can be used for data mining and analysis.
Add-ons are available for extending the functionality
Extract dataset from a graph

Cons:

Visualizations cannot be exported as images (JPG, PNG, SVG). They can only be exported as a report, either in PDF form or saved as an HTML file.
Not enormously robust for working with large datasets. Hence best suited for smaller projects, pedagogical purposes, or exploratory data analysis.
Not aesthetically pleasing

Installation Guide

Orange is well documented on its website, which provides examples, workflows, and tutorials for working with the tool. Overall, the tool is really fast and powerful. It's easy to learn and very helpful for rapid visualizations. Tutorials for all the features are clearly provided on its YouTube page.

References

Here are the links to the tutorials and examples that I went through

Link to my presentation

sugandha-123 commented 4 years ago

About

Leaflet is an open-source JavaScript library for creating interactive maps.
Leaflet lets you create custom geo-markers, pop-ups, overlays, and almost anything else one can imagine via code.
It is super lightweight (about 39 KB of JavaScript).
The Leaflet API is well documented.
It is supported by lots of plugins.
One can contribute to its code.

Get Started

Download the JS and CSS files of Leaflet.
Follow the tutorials.
Utilise the documentation, and plugins in order to create and extend needed applications.

Some of the features

Works across desktop and mobile platforms.
As it supports GeoJSON, one may create overlays for areas, make choropleths, etc.
Leaflet positions a map using latitude, longitude, and zoom level.
One can create maps that are non-geographical, like a fictional map (for a story, or a game.)
One can also view an embedded video on a webpage via Leaflet code.
One can create static or interactive maps, or a map with any other customisation depending on the requirement.
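
Since GeoJSON support is what makes overlays and choropleths possible, here is a small Python sketch of what such data looks like. The place name, coordinates and `category` property are invented placeholder values, and `make_point_feature` is a hypothetical helper. Note that GeoJSON lists longitude before latitude.

```python
import json


def make_point_feature(name, lon, lat, **properties):
    """Build a GeoJSON Feature for a single point (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, **properties},
    }


# Placeholder values for illustration only
features = [make_point_feature("Sample place", 72.9, 19.1, category="example")]
collection = {"type": "FeatureCollection", "features": features}
print(json.dumps(collection, indent=2))
```

In Leaflet, a structure like this would typically be handed to `L.geoJSON(...)` and added to the map as a layer.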

About map tiles

Tiles are images of 256x256 pixels (typically raster, e.g. PNG) that together represent the map.
The outermost zoom level is 0, at which the entire map is rendered on a single tile.
Each zoom level doubles the map in both height and width, so a single tile is replaced by 4 tiles when zooming in.
Zoom level can go up to 20, or even more.
The library supports multiple providers of map tiles.
For example, OpenStreetMap tiles are free, while Mapbox requires a user account and an access token and is only partially free.
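
The doubling rule above can be worked through in a few lines of Python. This is only a sketch: the function names are my own, and the longitude/latitude conversion assumes the common Web Mercator tiling scheme used by OpenStreetMap-style tile servers, not anything specific to Leaflet's internals.

```python
import math

TILE_SIZE = 256  # pixels per tile side, as stated above


def tiles_at_zoom(zoom):
    """Number of tiles covering the whole map at a zoom level (4 ** zoom)."""
    return 4 ** zoom


def map_width_px(zoom):
    """Width (and height) of the full map in pixels at a zoom level."""
    return TILE_SIZE * 2 ** zoom


def lonlat_to_tile(lon, lat, zoom):
    """Tile column and row containing a point (Web Mercator tiling assumed)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge stay inside the tile grid
    return min(x, n - 1), min(y, n - 1)


for z in range(4):
    print(z, tiles_at_zoom(z), map_width_px(z))
```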

Some examples

Other options for map APIs

Google Maps API
ArcGIS API for JavaScript
OpenLayers
OS OpenSpace API

AkhilGuthula commented 4 years ago

Palladio

Palladio is a product of the Humanities + Design lab at Stanford University. As it is a relatively new tool and still under active development, you may find some bugs while using it. The following are the main features of Palladio:

Maps

You can plot any coordinate data as points on the map
Relationships between distinct points can be shown as connecting lines or arcs representing the flow
These points can be color coded or scaled up and down to represent the magnitude.
Various filters or layers (satellite view, street view, etc) are available to enhance or detail out the map visualization
Can be exported as .svg

Graphs

You can visualize the relationship between any 2 dimensions of your data
Graph information will be displayed as nodes connected by lines
Nodes can be scaled to reflect their relative magnitude
Labels can be toggled on and off
Can be exported as .svg

List view

It’s just like another spreadsheet, dimensions of the data can be arranged to make customized lists.
Can be exported as .csv

Gallery view

Data can be displayed within a grid setting
Here the dimensions of your data can be linked to outside website information
Can also sort your data according to different dimensions

Apart from these four main features, there are three more filters which are very useful for analysing data in the above-mentioned visualizations.

Timespan: Used to analyse how a specific dimension has changed over a period of time.
Facets: Visualizations can be made and analysed by selecting values of one or more dimensions
Timeline: Used to visualise and analyse how time-dependent attributes vary over time.

Overall, I realized Palladio can be a great tool to analyse large datasets having many attributes, especially in relation to location and time. However, it may not be an efficient tool for presenting data aesthetically.

Click here to access my presentation

divoojilly commented 4 years ago

Google Data Studio

Introduction

Google Data Studio is a free tool offered by Google that turns your data into informative, easy to read, easy to share, and fully customizable dashboards and reports.

How to use it

Create a Google Account, if you don't have one already
Visit Google Data Studio and Sign-In
Choose a template
Connect data sources (supports .csv, .xlsx, Google Sheets and 220+ other sources)
Choose and edit charts/Add text
Share/Download reports

Features

Part of the Google Suite (needs a Google Account)
Import data from 3rd party data source providers as well as Google data sources (Google Sheets, Google Analytics etc.)
Collaborate, add people and share easily (just like any other Google app)
Choose from templates for presenting reports
Makes data interactive + can add different types of charts
Create dashboards for analytics easily

Some Examples

You can find them as soon as you open Google Data Studio.

Pros

Free tool, good for start-ups
Helps you import data from 220+ commonly used data sources
Create charts easily for academic projects
Pre-made templates reduce efforts in designing page layouts
Intuitive to use
Easy to share, easy to collaborate with colleagues
Interactive visualizations with basic amount of interactivity
Beta-test various new tools
Works on the Cloud, no need to worry about Auto-Save

Cons

Not a Business Intelligence tool like Tableau, features are limited
To format a chart, you have to format the dataset at its source (not directly editable)
Cannot be downloaded as an SVG/JPEG, only as a PDF
Many features are under Beta-testing
It is primarily an ad-analytics tool (good for businesses)

Tips and tricks

You can find a template called "Tutorial Report" that would help you onboard yourself to the tool. It's fairly straightforward to use.

Try out some of the Beta tools, they are nascent but look promising.
If you save a file as PDF, you can always go to Adobe Illustrator, convert it into an SVG file and make it directly editable.
Using Google tools (such as Google Sheets) works better than importing files. To make any changes to your file in Data Studio, you have to edit the source file (e.g. if a Google Sheet is the source file, you'll have to make changes in the Google Sheet and "Refresh" the Data Studio page to see the changes)
Some Beta tools are not available (although they show up in the toolkit), so don't get frustrated trying to understand what is wrong.

My presentation + video recording: to be added after the presentation.

A sample that I created on Google Data Studio: to be added after the presentation.

nishitanirmal commented 4 years ago

Chart.js

What is Chart.js?

Chart.js is a JavaScript library for building flexible charts using the HTML5 canvas element. It is a community-built open-source library, with around 99 contributors so far. Chart.js was started in 2013 and is available under the MIT license.

What does it do?

Chart.js has a bunch of different chart types:

Bar
Line
Area
Pie
Radar
Doughnut
Scatterplot

Some Samples can be found on the Chart.js website.

Prerequisites

A very basic understanding of object-oriented programming, or some idea of how JavaScript works. There are some tutorials and good documentation as well.

How it works

In the case of very few data points, data can be mapped manually. Larger datasets can be input via .csv files, Excel files, and JSON APIs.
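
As a rough illustration of getting tabular data into the shape Chart.js expects, the Python sketch below turns CSV text into a chart configuration with `type`, `data.labels` and `data.datasets`. The sample numbers are made up, and `csv_to_chart_config` is a hypothetical helper, not part of Chart.js.

```python
import csv
import io
import json

# Made-up sample numbers, for illustration only
CSV_TEXT = """pet,owners
Dogs,12
Cats,9
Fish,4
"""


def csv_to_chart_config(csv_text, label_col, value_col, chart_type="bar"):
    """Build a Chart.js-style config dict from two columns of CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "type": chart_type,
        "data": {
            "labels": [row[label_col] for row in rows],
            "datasets": [{
                "label": value_col,
                "data": [float(row[value_col]) for row in rows],
            }],
        },
    }


config = csv_to_chart_config(CSV_TEXT, "pet", "owners")
print(json.dumps(config, indent=2))
```

The printed JSON has the same structure as the object passed as the second argument to `new Chart(ctx, config)` in the browser.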

A simple chart I made in Chart.js: pet owners

Cons

Unlike tools such as Tableau that automate the process, Chart.js requires charts to be coded by hand
No GUI
Renders to pixels (canvas) rather than vector images
No proper documentation on connecting external datasets
To create new chart types you might require good JS skills

Pros

Easy to learn. Much easier than P5 and D3
Animations are predefined, don’t have to code
Can handle large datasets
Kind of like ‘drawing’ digitally. Increased control and flexibility
Many plugins, extensions, and integrations
Easy to follow documentation
Lots of possibilities if you know JS

tdeepikatiwari commented 4 years ago

PLOTLY

About the company

Plotly is a technical computing company headquartered in Montreal, Quebec. The company offers a suite of analytics products:

Dash- framework for building ML and Data science apps
Consulting and Training - helps people build ROI Dash apps and operationalize AI initiatives
Chart Studio

About Chart Studio

Chart Studio is one of the fastest ways to create interactive charts online. It is a web-based tool with a library of visualization templates that can be used for data visualization. It is an open-source platform, and any work done on a free account is kept public.

Motivation behind Plotly

Help people share data in meaningful way
Enable people to explore data
Provide models to enable self exploration

Broad features of Plotly Chart Studio

No-code visualization
Collaboration for teams and enterprises
External integrations with analytics software like Dash, IBM Watson, Google Analytics etc

Pricing (screenshot)

Types of charts you can create on Plotly Chart Studio

(Screenshot) The image above shows all possible visualizations on Plotly. It is a fairly powerful tool with more chart types than competitors like Datawrapper or RAW Graphs. 3D visualizations are also provided.

(Screenshot) Each chart type comes with 3 default options:

Examples of the chart - collection of visualizations created by other Plotly users. These can be opened in either editor or viewer mode
Tutorials - step by step documentations of how to use the given chart type
Basic example - pre-fetched data visualized that can be used by user to explore it more.

Plotly Pros

User friendly dashboard, free with no set up
Wider range of presets compared to competitors like Datawrapper/Tableau
Higher customization options for labels and design of charts. Color palettes are easier to customize than Datawrapper
Dummy data provided for exploration of each example. Reduces reliance on tutorials, one can simply see the data and understand how the chart type works
Direct tutorial links. Extensive step by step documentation for how to use each chart type
3D visualizations available

Plotly Cons

Limited options on customization of data. It's better to prepare data files separately.
Maps rely on latitude and longitude values, do not fetch location by country name/ state name/ city name
Does not remember the design customization you did in a trace type. You need to set it for every new chart separately; there is no way to batch process

Caution while using Plotly

You can lose progress if you switch to a chart example in the middle of a visualization
Map labels (x, y, z) need to be edited separately. Plotly cannot automatically detect the type of data. The title of a column will be taken as data, not as a title.
Make sure progress is saved to retain the dashboard. Plotly does not automatically save your file
Files created on the free version are saved publicly and are available for public view

arinjitdas commented 4 years ago

Datawrapper

About Datawrapper

Datawrapper is a simple yet powerful, no-code, web-based data visualization tool that can be used to create simple charts, maps and tables. Data visualizations created with Datawrapper can either be embedded as interactive data visualizations in your website or content management system, or be downloaded as static visualizations to be used in publications and documents or refined further in tools such as Illustrator or Photoshop.

It also offers community features in the form of The River, which is Datawrapper's publicly available collection of visualizations created by its users that may be used as inspiration or as a starting point for your own work.

How to Use Datawrapper

Datawrapper requires no coding skills and can be used right away by uploading Excel or CSV sheets, linking shareable Google Sheets documents, or simply copy/pasting data from those tools directly into an available text field that parses the data and detects the kind of information (such as labels, strings and numbers) that has been entered.

Charts

Datawrapper allows users to choose from a number of chart presets based on the data, though certain chart types such as spider charts are not available in the current offering of the tool, especially compared to other no-code tools such as Plotly, RAW Graphs or Tableau. Users must choose the appropriate chart type for their data when generating the chart, since the tool, for some reason, allows one to select chart types that would not actually work for the nature of the data provided (such as a pie chart for data on India's and the USA's democracy indices through the years).

Another minor issue that may crop up is that the tool may confuse the dependent and independent variables when switching between chart types (from Line to Stacked Bar, for example). This can be solved by going back to the 'Check and Describe [Data]' step and switching the rows and columns.

Chart Types

Maps

Maps in Datawrapper are pretty standard, allowing users to choose among Choropleths, Symbol Maps and Locator Maps. What's great about making maps in Datawrapper, however, is that while you can use your own custom maps, it already has a great selection of pre-existing maps, even something as granular as the electoral constituencies and revenue circles of the state of Assam.

Again, users can either upload Excel or CSV sheets, share Google Sheets or simply copy/paste data keyed to the corresponding map IDs (such as state names when plotting literacy rates of states). But users must be careful to make sure that labels in their sheets correspond to the Label IDs provided in the map. For example, when choosing a 2020 map of India post-Article 370, users should include Ladakh as a separate entry.
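
A quick way to catch label mismatches before uploading is to compare the sheet's labels against the map's labels. The Python sketch below is a generic pre-check with made-up label lists and a hypothetical helper name. It does not use any Datawrapper API, and Datawrapper's own matching step may behave differently.

```python
def normalise(label):
    """Collapse whitespace and ignore case so near-identical labels match."""
    return " ".join(label.split()).casefold()


def find_label_mismatches(data_labels, map_labels):
    """Return (labels in the data with no map match, map labels with no data)."""
    map_keys = {normalise(m) for m in map_labels}
    data_keys = {normalise(d) for d in data_labels}
    unmatched = sorted(d for d in data_labels if normalise(d) not in map_keys)
    missing = sorted(m for m in map_labels if normalise(m) not in data_keys)
    return unmatched, missing


# Made-up lists for illustration only
map_labels = ["Assam", "Kerala", "Ladakh"]
data_labels = ["Assam", "kerala ", "Orissa"]

unmatched, missing = find_label_mismatches(data_labels, map_labels)
print("In data but not on map:", unmatched)
print("On map but not in data:", missing)
```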

Tables

The table creation feature in Datawrapper is quite self-explanatory. Datawrapper allows you to generate tables based on your data, but this may not have much value for designers who can achieve a higher level of customizability in a tool such as Illustrator or InDesign which already allow creation of tables. Something as basic as font is not customizable.

But the table creation options may be useful to those looking to generate neater tables than are possible with Excel or Google Sheets and leverage the use of the 'Search in Table' feature and quick customizations such as making tables striped.

Pros of Datawrapper

Useful for quickly creating graphs, tables and charts from raw data
Ability to copy and paste data is neat
Graphs can be exported as PNGs or embedded in websites/Content Management Systems
Customizing charts is easy and intuitive
Colorblind Check is a small but important feature
Map selection is vast

Cons of Datawrapper

Slightly limited capabilities. E.g. No way to create spider charts
Limited Interactivity (compared to Tableau or other coding tools)
Cannot export to SVG or PDF unless on Custom or Enterprise plan
Can’t always convert from one chart type to another but the tool lets you do it anyway.
Generating maps was problematic

Final Thoughts

Useful for quickly creating graphs, tables and charts from raw data
Good idea to know what kind of data you are charting to pick suitable chart
Have to convert the PNG into an SVG or use the PNG as an underlay in a tool like Illustrator
Great tool if you want simple charts for websites
The River is a good discovery tool for people who may have done similar visualizations

Example Visualizations Created with Datawrapper

Total Backlogged Cases in U.S. Immigration Courts Hits Historic Highs
Number of Universities and Colleges Statewise in India
Tracking Covid-19 Hotspots in Iowa

Link to the Presentation

Explorations of Datawrapper - Google Sheets

rajsreekanth commented 4 years ago

Circos

Circos is a popular, highly flexible, open-source software package for the circular visualization of complex datasets, created by Martin Krzywinski. Though it is popular in the field of genomic analysis, Circos enables graphing of any analytical data. Circos is controlled by plain-text configuration files, which makes it highly customizable and allows it to be automated. Another important aspect of Circos is that whatever you create with it will be aesthetically pleasing. It uses a circular composition to show connections between objects or between positions, which are difficult to organize visually when the underlying layout is linear or a graph.

Since Circos is controlled by plain-text configuration files, it doesn't have any interactive user interface, which makes it difficult to use for those who are not trained in programming or the UNIX command line. I spent a lot of time trying to understand the complex system to get it installed on my machine. Luckily I found Circos Online, which allows less flexibility and fewer customization options but can still be used for visualizing simple tables. The maximum row + column total you can use in the online tool is 150; if this is exceeded, rows and columns are limited to 75.

How to use Circos Online

I used a small dataset to try the tool for the first time. It was the sheet we used to mark the tools we picked and the days we are presenting it.

For this to work in Circos I converted the data to the following format:

This table was saved as a Tab Separated Value (.tsv) format. Once it got uploaded I clicked on the Visualise button and after a few seconds this is what I got:

I was able to download the generated visualization in multiple formats (large image or a compressed folder with data, images (PNG/SVG), and configuration).

Then I tried the same thing with a bigger dataset. I took the one containing the state-wise number of custodial deaths between 2001 and 2012. Make sure you don't have any blank cells in your table, otherwise the tool will give you a scary-looking error page. You can use hyphens to fill the empty cells.
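
Filling the blanks by hand gets tedious with a bigger table, so a small script can do it. The Python sketch below writes a table as tab-separated text and replaces empty cells with hyphens. The table contents and the header cell are made-up placeholders, and the exact header layout Circos Online expects should be checked against its own instructions.

```python
import csv
import io


def table_to_tsv(rows, filler="-"):
    """Serialise rows as tab-separated text, replacing blank cells with a filler."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([
            filler if cell is None or str(cell).strip() == "" else cell
            for cell in row
        ])
    return out.getvalue()


# Made-up table for illustration only
table = [
    ["label", "A", "B"],
    ["X", 3, ""],
    ["Y", None, 5],
]

tsv = table_to_tsv(table)
print(tsv)
```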

Let's see what I got

raaghavlaxman commented 4 years ago

Tool selected : RAWGraphs

Introduction

RAW Graphs is an open-source data visualization framework built on D3.js. It is a tool that aims to bridge the gap between spreadsheets and vector graphics editors. The project is led and maintained by the DensityDesign Research Lab (Politecnico di Milano) and was released publicly in 2013.

Types of charts available :

Contour plot
Convex hull
Hexagonal binning
Scatter plot
Voronoi tessellation
Beeswarm plot
Box plot
Circular dendrogram
Cluster dendrogram
Circle packing
Sunburst
Treemap
Alluvial diagram
Bar chart
Pie chart
Parallel coordinates
Gantt chart
Bump chart
Area graph
Horizon graph
Stream graph

Step 1 : Enter your data from a CSV or spreadsheet. A guide for stacking data for RAWGraphs is here.

Step 2 : Choose a chart

Step 3 : Map dimensions, drag and drop to respective fields based on whether they are strings, numbers or dates

Step 4 : Customise your chart and download as png / svg

Advantages:

Easy to use, drag and drop interface. No coding required.
Charts can be downloaded in svg format enabling customisation in Vector editing software.
Can be used to create quick visualisations which can be polished in vector editing software.
Sample datasets available within the tool are a good reference for formatting your own data effectively and testing which chart would work best for your dataset.
Can add custom charts on the interface if you can code them.

Limitations :

Static visualisations
Limited number of chart types
Not very customisable

Examples

Some featured examples created using RAWGraphs.

Learning

An extensive guide for how to use each of the available charts here.

jon-swn commented 4 years ago

OpenRefine 💎

A very simple yet powerful tool used to clean up and structure your dataset

OpenRefine (previously Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

Messy data is usually data that has been entered manually. This type of data will have a lot of typos, spelling inconsistencies, and different formatting for the same data. We find this especially in Indian datasets, where the same name is spelled differently in different locations.

OpenRefine always keeps your data private on your own computer until YOU want to share or collaborate. Your private data never leaves your computer unless you want it to. (It works by running a small server on your computer, and you use your web browser to interact with it.) There is extensive documentation on OpenRefine; what I mostly did, and what I suggest, is to follow the foundation course from the documentation page.

Getting started with OpenRefine

The interface is fairly simple once you load in the data. Sometimes you will have to make some adjustments in OpenRefine itself before starting the project, for example when you have to ignore the first few rows. After loading the data you can see that it resembles a spreadsheet, but here you are working mainly with columns and not cells. The main difference is that OpenRefine helps more with bulk editing.

Figure 1.

In figure 1 you can see that fig1.(1) OpenRefine uses filters or facets to sort your data and fig1.(2) you can change the view. Once you sort the data, you can also see the changes on the fig1.(1) Undo/Redo tab. You can see the history of changes made and can go all the way back to the first step. Whatever changes you make in OpenRefine won't be reflected in the original file.

Figure 2.

OpenRefine recognizes different types of data. Sometimes the type is encoded in the dataset; other times the user has to set it manually. Here, fig2.(1), under the heading CURRENT_USE I have used the Text facet to show all the different kinds of data in that column, and this list appears in fig2.(2). This is an example of messy data where Apartment Building has been written in many different ways. The feature called Cluster, fig2.(1), helps to fix this issue by renaming these inconsistencies, and also deletes extra spaces in the cells.

Figure 3.

An algorithm, fig3.(1), is used here to group values that Refine detects as similar, and the user decides whether it makes sense to group them. "fingerprint" is the one with the strictest threshold, and as you select the others the matching becomes more lenient. The user can check the categories they want to merge, fig3.(2), and also name the category as per their choice, then select Merge & Re-Cluster to save the changes. By the time the user uses the 3rd threshold, all the categories in fig3.(3) will become one.
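
To give a feel for why "fingerprint" groups the different spellings of Apartment Building, here is a simplified Python sketch of that kind of key-based clustering. It is an approximation written for illustration, not OpenRefine's actual code: the real method does more normalisation (for example of accented characters), and the sample values are made up.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: trim, lowercase, drop punctuation,
    then sort and de-duplicate the remaining words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups of 2 or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Made-up values for illustration only
values = ["Apartment Building", "apartment building ", "Building, Apartment", "Office"]
print(cluster(values))
```

Values that end up with the same key are offered as one cluster, which the user can then merge under a single name.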

Refine also has its own expression language for manipulating data programmatically. It is called GREL (Google Refine Expression Language, now General Refine Expression Language), but I did not get a chance to explore its possibilities much beyond the usual splitting and combining of columns. There are a lot of comparisons between OpenRefine and Excel or Google Sheets, but where Refine is better than Excel is batch editing and working with inconsistent data. This has been just a very basic overview of the tool; for a more comprehensive guide, one can learn a lot more by going through the tutorial mentioned earlier.

Other Resources to check out

Noopurkumarikashyap commented 4 years ago


p5.js.org

Introduction to the tool:

P5.js is a JavaScript library designed for creative coding.
It was created by Lauren McCarthy and is currently led by Moira Turner.
It was developed to make coding interactive graphical applications easier than with lower-level graphics tools like OpenGL.
It is based on Processing but is significantly different from it.
It is used by designers, artists, and coders.

P5 and Processing are not the same!

Processing and P5 look very similar, and the names are often used interchangeably, so it's important to understand the difference between the two.

Both provide nearly the same features, and many of their predefined functions share the same names.
Both can work with other tools like Arduino, Wiring, OpenCV, etc.
Processing is built on Java, while P5.js is a JavaScript library.
What makes Processing look similar to P5 is Processing.js, which is a separate project from Processing itself. Processing.js can execute Processing (.pde) files in HTML5 and uses regex to convert the Java code to JavaScript. P5.js, on the other hand, is native JavaScript.

Advantages of using P5

JavaScript is easy to learn and does not require any special interpreter, as it runs in any browser that supports JS.
Code written with P5.js is generally concise without being limited in its capabilities.
It can work with various input sources (txt, sound, CSV, XML, PDFs, webpages, JSON and APIs).
It supports object-oriented programming.
It handles reading and writing data from files very efficiently.
It has a very well-documented library with sample examples.
It has a simple predefined code structure.
It also provides 3D support.
It allows working with real-time non-alphanumeric data.
It is great for creating data art.

P5 is used for

Creative coding
Computer Vision
Working with Data

Code structure

preload() - Loads necessary data before the program is set up. Images, fonts, and other data are loaded in this function. It is called only once, before the setup() function.
setup() - Defines the initial environment, such as the canvas size and background color. It is called only once, before the draw() function.
draw() - Called continuously in a loop, and it generally renders new content on the screen each time it is called.
mousePressed() - Called only when a mouse press event occurs.
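Put together, the four functions above give a sketch roughly like the one below. This is a minimal skeleton, assuming the p5.js library is already loaded on the page (it supplies `createCanvas`, `background`, `fill` and `text`). The `clicks` counter and the canvas size are hypothetical, added only to show state changing between frames.

```javascript
// Hypothetical state: how many times the mouse has been pressed.
let clicks = 0;

function preload() {
  // Load assets here before setup() runs,
  // e.g. table = loadTable("data.csv", "csv", "header");
}

function setup() {
  // Runs once: define the drawing surface.
  createCanvas(400, 300);
}

function draw() {
  // Runs every frame: clear the canvas and redraw the current state.
  background(240);
  fill(30);
  text("Clicks: " + clicks, 20, 30);
}

function mousePressed() {
  // Runs only when a mouse press event occurs.
  clicks += 1;
}
```

p5 calls these functions by name, so nothing in the sketch invokes them directly.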

P5 and Data Visualisation

Generally, the steps involved in data visualization with P5 include (these steps are based on my observation):

Get Data (Authenticate) - Sometimes one has to authenticate to get data from the internet. This generally happens when working with real-time data.
Read - There are different ways to read the data, and P5 handles reading very efficiently. When working with real-time data, one can define a query string for the request.
Understand the structure of the data - To get started with data visualization in P5, one needs to understand the structure of the data and read it accordingly in the program.
Interpret - It is important to interpret the structure of the data in order to code it in the program.
Draw Data - Draw the data on the P5 canvas.
Move Datapoints - Edit the position coordinates in the draw() function.
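The read, interpret and draw steps above can be sketched without p5 at all, which makes the data handling easier to see. This is only an illustration under assumptions: the column names `year` and `magnitude`, the sample rows and the helpers `parseCsv`, `rescale` and `toPoints` are all hypothetical. `rescale` does the same linear mapping that p5's `map()` provides.

```javascript
// Read: parse a small CSV string into row objects
// (assumes a header row and no quoted commas).
function parseCsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const keys = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(keys.map((k, i) => [k.trim(), cells[i].trim()]));
  });
}

// Linear rescaling from one range to another, like p5's map().
function rescale(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// Interpret: year becomes the x position, magnitude becomes the circle size.
function toPoints(rows, width, height) {
  const years = rows.map((r) => Number(r.year));
  const minYear = Math.min(...years);
  const maxYear = Math.max(...years);
  return rows.map((r) => ({
    x: rescale(Number(r.year), minYear, maxYear, 0, width),
    y: height / 2,
    size: Number(r.magnitude) * 10,
  }));
}

// Made-up sample data.
const csv = "year,magnitude\n1000,2\n1500,4\n2000,6";
const points = toPoints(parseCsv(csv), 400, 200);
console.log(points);
```

In a p5 sketch, draw() could then loop over `points` and call `circle(p.x, p.y, p.size)` for each one. Moving the datapoints means changing `p.x` or `p.y` between frames.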

Figure 1 below shows a data visualization of volcanic eruptions over the past thousand years that I created with p5.js. Data source: Kaggle.com

volcanic_eruption Figure 1

Creative Coding + Data visualization

P5 was originally created for creative coding. When creative coding merges with data visualization, P5 becomes a powerful tool for data art. One example, shown below in Figure 2, is Trees of translation, created by Baltazar Pérez. It visualizes human text-writing and translation processes.

Figure 2

Non-alphanumeric data + Data visualization

Figure 3 (original article here) shows one such example, where an entire movie serves as the input file. Programs like P5 can read the colors in each frame and build them into a visualization like this one. There is no need to create text or numeric data files first. Even when the data is real-time (for example, frames from a camera), P5 works well.

Figure 3
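One way to reduce a frame to a single colour is to average its pixels. The sketch below assumes each frame is a flat RGBA array, which is the layout p5 exposes in `pixels` after `loadPixels()`. The function name `averageColor` and the tiny two-pixel frames are made up for illustration, and the original article may well have used a different method.

```javascript
// Average colour of one frame, given a flat [r, g, b, a, r, g, b, a, ...] array.
function averageColor(pixels) {
  let r = 0, g = 0, b = 0;
  const count = pixels.length / 4;
  for (let i = 0; i < pixels.length; i += 4) {
    r += pixels[i];
    g += pixels[i + 1];
    b += pixels[i + 2];
    // pixels[i + 3] is alpha and is ignored here.
  }
  return [Math.round(r / count), Math.round(g / count), Math.round(b / count)];
}

// Two hypothetical frames of two pixels each.
const frames = [
  [200, 0, 0, 255, 100, 0, 50, 255],
  [0, 255, 0, 255, 0, 255, 0, 255],
];

// One colour per frame; drawing these side by side gives a colour strip of the movie.
const strip = frames.map(averageColor);
console.log(strip);
```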

Limitations

1. Error messages are not clear.
2. No SVG (P5 provides raster images as output).
3. To create effective and new data visualization, JS knowledge is important.
4. Understanding the coordinate system and other related mathematics is required.

Link to the presentation P5 examples

dikshasingh13 commented 4 years ago

Gephi

A network analysis tool

Download Gephi Their tutorials An interesting introductory blog Slides for presentation

Introduction:

Gephi is an open-source interactive visualization platform.
People use it as an exploratory tool to understand graphs and networks.
Gephi essentially enables data-driven exploration.
It deals with networks and complex systems, including dynamic and hierarchical graphs.
Gephi lets users interact with the network representation: they can manipulate the structure, size and colours of the network to reveal hidden properties.
Gephi can import the following standard graph file formats. The linked articles contain documentation, samples and implementation details, and help outline the differences between formats.

GEXF
GDF
GML
GraphML
Pajek NET
GraphViz DOT
CSV
UCINET DL
Tulip TPL
Netdraw VNA
Spreadsheet

Applications

Exploratory Data Analysis: intuition-oriented analysis by network manipulations in real-time.
Link Analysis: revealing the underlying structures of associations between objects.
Social Network Analysis: easy creation of social data connectors to map community organizations and small-world networks.
Biological Network analysis: representing patterns of biological data.
Poster creation: scientific work promotion with high-quality printable maps.

Goal

The goal of Gephi is to help data analysts intuitively discover patterns.
It is a complementary tool to traditional statistics.
It relies on visual thinking with an interactive interface, which is recognised to facilitate reasoning.
It's used for analyzing big data sets with complex networks.

Network

A network looks something like this:

The researchers used Gephi to create this visualization. This is an ingredient recommendation system. The software helps recommend a complementary flavour that goes well with the ingredient you have in mind. The nodes, like salt, water and lemon juice, are individual ingredients. A line between two ingredients signifies that they go well together.

Similarly, Gephi is also used in social media analysis:

mhawksey-googleplus

The datasets accepted for Gephi look something like this:
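As a rough text illustration of one such dataset, the sketch below builds an edge-list CSV. It assumes the spreadsheet/CSV import route, where Gephi recognises `Source`, `Target` and `Weight` as edge columns. The helper `toEdgeListCsv` and the ingredient pairs are made up for the example.

```javascript
// Build an edge-list CSV: one row per connection between two nodes.
function toEdgeListCsv(edges) {
  // Quote text cells so names containing commas or quotes stay intact.
  const escape = (cell) => `"${String(cell).replace(/"/g, '""')}"`;
  const rows = edges.map(({ source, target, weight = 1 }) =>
    [escape(source), escape(target), weight].join(",")
  );
  return ["Source,Target,Weight", ...rows].join("\n");
}

// Hypothetical ingredient pairings; weight defaults to 1 when omitted.
const edges = [
  { source: "salt", target: "water", weight: 3 },
  { source: "salt", target: "lemon juice", weight: 2 },
  { source: "lemon juice", target: "water" },
];

const edgeCsv = toEdgeListCsv(edges);
console.log(edgeCsv);
```

Each row is one line in the network, and the node names are picked up from the Source and Target columns.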

To understand more about networks and the different statistical operations provided by Gephi, please follow this blog.

Some examples:

Gephi community on Twitter
Marvel Universe Visualizations


Some more visualizations

Pros and Cons

Gephi was able to handle large datasets (100,000 nodes and 1,000,000 connections).
It is highly configurable.
The layout options were really effective and a great way to visualize the dataset and get more insights.
The statistical operations were also very useful for playing around with the attributes and gaining more insight into the dataset we had.
A range of plugins is available.

But,

There are a lot of bugs that need fixing.
The software can be troublesome to download, and it is sometimes tricky to start and work with.
The interface is not very intuitive. The user needs to spend time getting used to the interface and operations.
The dataset formats it accepts are limited.

Some YouTube videos that helped me understand the tool better: 1, 2, 3, 4

rishi4git commented 4 years ago

Data-Driven Documents (D3)

Developed by Mike Bostock and team in 2011

D3.js is a JavaScript library that helps you bring data to life using HTML, SVG, and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.

Purpose of Use

Tell Stories the way you have imagined!


Why Choose D3

High Customisation of DOM
Real-time Interactions with Data and Improved accessibility of information
Transitions
Transformations
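To give a feel for transitions, here is a minimal sketch of animating bars to new values. It assumes the D3 library is passed in as `d3` (version 5.8 or later, since it uses `selection.join`) and that the page already has a container matching `selector`. The function name `updateBars` and the default duration are my own assumptions.

```javascript
// Re-bind new values to the bars and animate their widths to the new state.
function updateBars(d3, selector, values, durationMs = 750) {
  const max = Math.max(...values);
  return d3
    .select(selector)
    .selectAll("div")
    .data(values)              // bind one value per bar
    .join("div")               // add missing bars, remove extra ones
    .transition()              // animate instead of jumping to the new state
    .duration(durationMs)
    .style("width", (d) => `${(d / max) * 100}%`);
}
```

Calling `updateBars(d3, "#chart", newValues)` whenever fresh data arrives would be one way to keep a chart in step with a live system.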

Learning Resources

D3 Graph gallery

From where can we learn

D3 Documentation
Examples (D3, Github, D3 Graph gallery etc.)
D3 Live Tools
Tutorials and courses

Pros

Highly interactive
Free and open source
Extremely fast
Better integration with websites and other JavaScript libraries
Can interact directly with a live system to update the visualisations in real time. Screenshot of data visualization done in an IoT system using D3.

Cons

Requires intermediate/advanced coding skills
High development time
Struggles with data sets in the gigabyte range (R and Python come to the rescue)
Requires clean and compatible data
Visual design is time-consuming

The high degree of customization demands writing code for each element present in the visualization. Screenshot of a code snippet for creating a simple bar chart in D3, using the D3.live online D3 platform.
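For readers without the screenshot, a simple horizontal bar chart might look roughly like the sketch below. It is not the code from the screenshot. It assumes D3 (version 5.8 or later) is passed in as `d3` and that `selector` matches an element on the page. The function name `drawBarChart`, the sizes and the colour are all illustrative choices.

```javascript
// Draw one SVG rectangle per value, with width proportional to the value.
function drawBarChart(d3, selector, values, width = 300, barHeight = 20) {
  const max = Math.max(...values);

  // Every part of the chart is spelled out by hand, starting with the SVG itself.
  const svg = d3
    .select(selector)
    .append("svg")
    .attr("width", width)
    .attr("height", barHeight * values.length);

  svg
    .selectAll("rect")
    .data(values)                              // bind one value per bar
    .join("rect")
    .attr("x", 0)
    .attr("y", (d, i) => i * barHeight)        // stack bars top to bottom
    .attr("width", (d) => (d / max) * width)   // longest bar fills the width
    .attr("height", barHeight - 2)             // leave a 2px gap between bars
    .attr("fill", "steelblue");

  return svg;
}
```

Even this small chart sets position, size and colour for each bar explicitly, which is where both the control and the development time come from.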

When to use

Building a data visualization framework of your own, or building interactivity on top of existing ones
The webpage interacts with data and needs easy data binding with a real-time (web-based) system
Using data for DOM manipulation.

When can one avoid using

Data Viz framework already exists
Static and less detailed data viz
Data is not being used for DOM manipulation.

Conclusion

Extremely Powerful and Versatile tool for large data points(Complete canvas control)
Interactivity with minute details
Useful for data scientists and Unconventional Visualizations
Do not use when creating static and predefined data visualizations
Tool for added interactivity and better understanding of complex and large data sets.

advaitmb commented 4 years ago

Tangle JS

Tangle is a JavaScript library created by Bret Victor as part of his 2011 essay 'Explorable Explanations'.

Active Reading: exploring, questioning and considering alternatives while reading, which traditional documents do not support.
Tangling pieces of text with author-defined relationships

Text as Information to be consumed vs Environment to think in

Explorable Explanations introduced 3 frameworks

Reactive Documents --> Tangle.JS
Explorable Examples
Contextual Information

Tangle JS is a library used for creating reactive documents.

Fangle is a Markdown implementation of Tangle JS that gives a quick look at its capabilities.

Pointers about Reactive Documents

Appear as normal documents to casual readers
Inquisitive readers can drag and change values to explore other scenarios and question the assumptions of the author
Allows interaction but does not provide overview
Suitable for explanations
Can only handle continuous data and in some cases nominal/categorical data
No dependencies
Extremely lightweight and can load on very low internet speeds
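A reactive document boils down to a small model that the text is bound to. The sketch below assumes that the Tangle constructor takes a root element and a model object with `initialize` and `update` functions, along the lines of the library's cookie and calorie example. The element id `calorie-example` and the starting values are hypothetical.

```javascript
// Markup the model would bind to (TKAdjustableNumber comes from TangleKit):
//   <p id="calorie-example">
//     When you eat <span data-var="cookies" class="TKAdjustableNumber"
//     data-min="1" data-max="100"> cookies</span>, you consume
//     <span data-var="calories"></span> calories.
//   </p>

const cookieModel = {
  // Runs once: set the starting values of the variables.
  initialize: function () {
    this.cookies = 3;
    this.caloriesPerCookie = 50;
  },
  // Runs whenever a variable changes: recompute the derived values.
  update: function () {
    this.calories = this.cookies * this.caloriesPerCookie;
  },
};

// Only attach when running in a page where Tangle.js has been loaded.
if (typeof Tangle !== "undefined" && typeof document !== "undefined") {
  new Tangle(document.getElementById("calorie-example"), cookieModel);
}
```

Dragging the cookie count in the text would change `cookies`, and `update` then recomputes `calories`, so the sentence stays consistent with the author's relationship.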

Some advanced features

One can also tangle text with charts, graphics and graphs such that when one interacts with text, changes are seen on these charts or graphics. Eg.: Ten Brighter Ideas? An Explorable Explanation
There is almost no documentation when it comes to integrating charts and graphics in Tangle. One has to define custom JavaScript classes which include graphs and use the tangled text as a controller for those graphs.
For creating these visualisations, other libraries can also be used. eg. D3.js

Design Considerations

The design lies in the way the text is written and structured.
Text should make sense in spite of change in values or variables.
Affordances of the interactive text should be communicated.
Custom styling can be done through CSS

Modern Alternatives

Tangle is a very old library, and because it's open source, a few developers have created more modern versions of it with other features.
iooxa.dev: Interactive Scientific Writing : Full stack interactive scientific writing library