vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Graph/network representation in Altair? #340

Closed linclelinkpart5 closed 7 years ago

linclelinkpart5 commented 7 years ago

Hello,

I'm quite new to Vega(-Lite) and Altair, having only heard about them at Pycon 2017. I was wondering if it was possible to represent an interactive graph/network using Altair. My use case involves being able to view, zoom, pan, drag, drop, and edit the nodes and edge properties of a directed acyclic graph.

The Vega examples show something that could be modified to fit my use case (https://vega.github.io/vega/examples/airport-connections), but it seems that Altair specifically deals with Vega-Lite, and thus is mainly statistically-focused?

kanitw commented 7 years ago

it seems that Altair specifically deals with Vega-Lite, and thus is mainly statistically-focused?

Yes, we don't plan to extend Vega-Lite to support graphs/networks in the short term.

Thanks for asking :)

luerhard commented 7 years ago

For interactive networks I can recommend Bokeh, as in this example: https://gist.github.com/habet700/daa537362eef802f8d808782fc962bf2

BrennanBarker commented 3 years ago

This doesn't directly address the OP's question, but for people who end up here looking for help visualizing networks in Altair, it's probably useful to point to the nx_altair package, which focuses on providing an API similar to NetworkX's, but with all the goodness of Altair charts' interactivity, etc.

For those interested in leaning into Altair's lovely Grammar of Graphics (GoG)-focused API, below is one way I've gone about it.

A GoG'y way to think about what you're doing when you make a typical network visualization (the drawings with dots and lines between them) is that you're laying out point marks (representing your nodes) spatially on an X/Y plane and drawing line marks between the nodes where edges exist. While your nodes and edges do have intrinsic properties (names, weights, ranks in a hierarchy, etc.), the trick is to realize that (by definition) your graph doesn't include information about how to position nodes spatially. You have to provide that information, either by specifying node positions arbitrarily or by using a layout algorithm. There are lots of layout algorithms ('force-directed' being one category of layouts; there are others), and there are great packages that calculate graph layouts, but Altair (I think wisely) leaves it to those other packages rather than trying to replicate that functionality itself.

Here's an example where I use NetworkX to build a graph and calculate some properties as well as layout information before feeding it into Altair to actually build the graphic:

from itertools import chain
import pandas as pd
import networkx as nx
import altair as alt

# Step 1: Prepare an example graph (this all happens outside of Altair)

# example graph 
r,h = 3,3
G = nx.generators.classic.balanced_tree(r,h)

# calculate rank of a given node and assign it as data
for rank in range(0,h+1):
    nodes_in_rank = nx.descendants_at_distance(G, 0, rank)
    for node in nodes_in_rank: 
        G.nodes[node]['rank'] = rank

# calculate layout positions, for example using Graphviz's 'twopi' algorithm, calculated via networkx's API.  
pos = nx.drawing.nx_agraph.graphviz_layout(G, prog='twopi')

# Step 2: Convert graph data from NetworkX's format to the pandas DataFrames expected by Altair

pos_df = pd.DataFrame.from_records(dict(node_id=k, x=x, y=y) for k,(x,y) in pos.items())
node_df = pd.DataFrame.from_records(dict(data, **{'node_id': n}) for n,data in G.nodes.data())
edge_data = ((dict(d, **{'edge_id':i, 'end':'source', 'node_id':s}),
              dict(d, **{'edge_id':i, 'end':'target', 'node_id':t}))
             for i,(s,t,d) in enumerate(G.edges.data()))
edge_df = pd.DataFrame.from_records(chain.from_iterable(edge_data))

# Step 3:  Use Altair to encode the graph data as marks in a visualization
x,y = alt.X('x:Q', axis=None), alt.Y('y:Q', axis=None)
# use a lookup to tie position data to the other graph data
node_position_lookup = {
    'lookup': 'node_id', 
    'from_': alt.LookupData(data=pos_df, key='node_id', fields=['x', 'y'])
}
nodes = (
    alt.Chart(node_df)
    .mark_circle(size=300, opacity=1)
    .encode(x=x, y=y, color=alt.Color('rank:N', legend=None))
    .transform_lookup(**node_position_lookup)
)
edges = (
    alt.Chart(edge_df)
    .mark_line(color='gray')
    .encode(x=x, y=y, detail='edge_id:N')  # `detail` gives one line per edge
    .transform_lookup(**node_position_lookup)
)
chart = (
    (edges+nodes)
    .properties(width=500, height=500,)
    .configure_view(strokeWidth=0)
)
chart

[Image: visualization-6]

The nice thing about keeping things in terms of Altair's API is that it's trivial at this point to add additional encodings, layers, interactivity, tooltips, etc.

All this said, there are a couple of things that Altair doesn't quite appear to support yet, although I'd be happy to be corrected:

BradKML commented 2 years ago

Apologies for asking, but the Graph above is not quite a network graph, and I can't seem to find a fitting example. Alternate places of ideas:

BrennanBarker commented 2 years ago

@BrandonKMLee the examples you pointed to all incorporate the same visual elements as the plot above: some marks arranged arbitrarily in X and Y space, with lines between some of them, and optionally some additional encodings for more information, such as node color, or size.

Here's another example of applying the same technique as above, this time using the classic "Karate Club" social network dataset, calculating the nodes' X and Y positions using NetworkX's spring layout function, and doing the data transformation in pandas before passing it to Altair. I also include a classic measure of network centrality (betweenness, calculated with NetworkX), encoded as node color.

import altair as alt
import networkx as nx
import pandas as pd

# networkx for example graph data, and layout/SNA calculations
g = nx.karate_club_graph()
positions = nx.spring_layout(g)
betweenness = nx.centrality.betweenness_centrality(g)

# munging into an Altair-friendly format in pandas
nodes_data = (
    pd.DataFrame
    .from_dict(positions, orient='index')
    .rename(columns={0:'x', 1:'y'})
    .assign(betweenness=lambda df:df.index.map(betweenness))
)
edges_data = (
    nx.to_pandas_edgelist(g)
    .assign(
        x=lambda df:df.source.map(nodes_data['x']),
        y=lambda df:df.source.map(nodes_data['y']),
        x2=lambda df:df.target.map(nodes_data['x']),
        y2=lambda df:df.target.map(nodes_data['y']),
    )
)

# Chart building in altair
nodes = (
    alt.Chart(nodes_data)
    .mark_circle(size=300, opacity=1)
    .encode(x=alt.X('x', axis=None), y=alt.Y('y', axis=None), color='betweenness')
)
edges = alt.Chart(edges_data).mark_rule().encode(x='x', y='y', x2='x2', y2='y2')
chart = (
    (edges + nodes)
    .properties(width=500,height=500)
    .configure_view(strokeWidth=0)
    .configure_axis(grid=False)
)

[Image: visualization-7]

BradKML commented 2 years ago

@BrennanBarker

some marks arranged arbitrarily in X and Y space

That is somewhat concerning, as I would like to play with weighted graphs, and KarateClub does not have edge weights.

BrennanBarker commented 2 years ago

There's no cause for concern when it comes to visualizing edge weights in Altair. Simply link the weight data column to an encoding for the line marks. Below I visualize the classic (weighted) Les Miserables graph, encoding the edge weights as opacity. Apart from swapping the dataset, this takes one small change to the chart specification code from my last comment:

import altair as alt
import networkx as nx
import pandas as pd

# networkx for example graph data, and layout/SNA calculations
- g = nx.karate_club_graph()
+ g = nx.les_miserables_graph()
positions = nx.spring_layout(g)
betweenness = nx.centrality.betweenness_centrality(g)

# munging into an Altair-friendly format in pandas
nodes_data = (
    pd.DataFrame
    .from_dict(positions, orient='index')
    .rename(columns={0:'x', 1:'y'})
    .assign(betweenness=lambda df:df.index.map(betweenness))
)
edges_data = (
    nx.to_pandas_edgelist(g)
    .assign(
        x=lambda df:df.source.map(nodes_data['x']),
        y=lambda df:df.source.map(nodes_data['y']),
        x2=lambda df:df.target.map(nodes_data['x']),
        y2=lambda df:df.target.map(nodes_data['y']),
    )
)

# Chart building in altair
nodes = (
    alt.Chart(nodes_data)
    .mark_circle(size=300, opacity=1)
    .encode(x=alt.X('x', axis=None), y=alt.Y('y', axis=None), color='betweenness')
)
- edges = alt.Chart(edges_data).mark_rule().encode(x='x', y='y', x2='x2', y2='y2')
+ edges = alt.Chart(edges_data).mark_rule().encode(x='x', y='y', x2='x2', y2='y2', opacity='weight')
chart = (
    (edges + nodes)
    .properties(width=500,height=500)
    .configure_view(strokeWidth=0)
    .configure_axis(grid=False)
)

[Image: visualization-8]

BradKML commented 2 years ago

This looks like the Kamada System in https://towardsdatascience.com/visualizing-networks-in-python-d70f4cbeb259

But there is an aesthetic alternative like Fruchterman–Reingold https://www.researchgate.net/publication/221157852_Summarization_Meets_Visualization_on_Online_Social_Networks

Weirdly enough, the "spring" layout in the code you provided is the latter... IDK why, but this is weird: https://sci-hub.se/10.1109/TVCG.2019.2934802

BrennanBarker commented 2 years ago

Networkx has several graph layout choices, as do many other dedicated graph analysis libraries. I think Altair (really Vega-Lite) does the right thing by not trying to implement any graph layout algorithms, instead leaving it to the user to calculate layout points themselves using whichever of the many possible tools and algorithms best suits their needs. The calculated layouts are then passed to Altair as X and Y encodings, along with any other data encodings needed to tell the data visualization story.
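
To make that concrete, here is a minimal sketch of swapping layouts while keeping the rest of the pipeline unchanged. The helper name `layout_frame` is hypothetical; the layout functions themselves (`nx.spring_layout`, `nx.circular_layout`) are standard NetworkX calls.

```python
import networkx as nx
import pandas as pd


def layout_frame(g, layout=nx.spring_layout, **kwargs):
    """Hypothetical helper: run a NetworkX layout, return positions as a DataFrame."""
    pos = layout(g, **kwargs)  # dict: node -> array([x, y])
    return (
        pd.DataFrame.from_dict(pos, orient="index")
        .rename(columns={0: "x", 1: "y"})
    )


g = nx.karate_club_graph()

# spring_layout is NetworkX's Fruchterman-Reingold implementation;
# a fixed seed makes the positions reproducible
spring = layout_frame(g, nx.spring_layout, seed=42)

# circular_layout ignores the edges and places the nodes on a circle
circle = layout_frame(g, nx.circular_layout)

# nx.kamada_kawai_layout can be passed the same way (it requires SciPy)
```

Either frame has the same `x`/`y` columns indexed by node, so it could stand in for the positions part of `nodes_data` in the examples above without touching the Altair code.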

BradKML commented 2 years ago

@BrennanBarker understandable, but this is based on NetworkX? If so alternate layouts can be constructed using KarateClub's proxemic node embedding.

BradKML commented 2 years ago

Currently, I am observing that

The graph is manufactured like:


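# assuming we already have a dataframe `df`
corr = df.corr()

import numpy as np
from numpy import sign, tanh, exp  # `sign` was used below but never imported

def scale(x):
    # squash a correlation into a signed edge weight
    return sign(x) * abs(exp(tanh((x - 0.6) / 0.6)))
print(list(map(scale, [-1, -0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9, 1])))

import altair as alt
import networkx as nx
import pandas as pd

# networkx for example graph data, and layout/SNA calculations
# (from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it)
g = nx.from_numpy_array(np.vectorize(scale)(corr.to_numpy()))

# relabels the nodes to match the item names
# (corr.columns, since df.corr() may drop non-numeric columns of df)
g = nx.relabel_nodes(g, lambda x: corr.columns[x])

iridazzle commented 2 years ago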

I found a code example of the Airport Connections chart in the Altair documentation. Maybe this could help.