microsoft / vivainsights-py

Python package for Analyzing and Visualizing data from Viva Insights
https://microsoft.github.io/vivainsights-py/

Feature: add ONA functionality #8

Closed martinctc closed 10 months ago

martinctc commented 1 year ago

Summary

This branch introduces new functionality to perform organizational network analysis (ONA) on Viva Insights data, specifically the group-to-group query (g2g_data) and the person-to-person query (p2p_data).

The package version is also incremented to v0.2.0, which was submitted for release on PyPI on 4 September 2023: https://pypi.org/project/vivainsights/
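To confirm that an environment has picked up a release that includes the ONA functions, a check along the following lines may help. This is a minimal sketch using only the standard library; the helper names `installed_version` and `at_least` are hypothetical, and the version comparison is deliberately simplified (it ignores non-numeric parts such as pre-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "vivainsights"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def at_least(found: str, required: str = "0.2.0") -> bool:
    """Compare dotted numeric versions such as '0.2.0' component by component."""
    def parse(text):
        # Simplification: keep purely numeric components only.
        return tuple(int(part) for part in text.split(".") if part.isdigit())
    return parse(found) >= parse(required)


found = installed_version()
if found is None:
    print("vivainsights is not installed; try: pip install vivainsights")
else:
    print(f"vivainsights {found}; v0.2.0 or later: {at_least(found)}")
```

If the check reports a version below 0.2.0, the functions listed in this PR are not expected to be available.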

Changes

The changes made in this PR are:

  1. Add the p2p_data and g2g_data sample datasets.
  2. Add the following new functions:
    • network_g2g()
    • network_p2p()
    • network_summary()
    • p2p_data_sim()
    • create_sankey()
  3. Increment the version to v0.2.0.

Example output from network_p2p(): image

Checks

Examples

Below is an example of running the group-to-group function:

import vivainsights as vi

vi.network_g2g(
    data = vi.load_g2g_data()
)
Example output from network_g2g(): image

More sample code for network_g2g() follows:

import vivainsights as vi
import pandas as pd
import random 
from random import sample

# Return a network plot
vi.network_g2g(
    data = vi.load_g2g_data(),
    metric = "Meeting_Count"
)

# Return a network plot - meeting count and 5% exclusion threshold
vi.network_g2g(
    data = vi.load_g2g_data(),
    metric = "Meeting_Count",
    primary = "PrimaryCollaborator_Organization",
    secondary = "SecondaryCollaborator_Organization",
    exc_threshold = 0.05
)

# Return a network plot - custom colours
# Get the unique organization labels
org_str = vi.load_g2g_data()['PrimaryCollaborator_Organization'].unique()
# Generate random colours for each org
col_str = [f"#{random.randint(0, 0xFFFFFF):06x}" for _ in range(len(org_str))]

# Create and supply a dictionary to `node_colour`
node_to_colour = dict(zip(org_str, col_str))

vi.network_g2g(
    data = vi.load_g2g_data(),
    node_colour = node_to_colour
)

# Return a network plot with circle layout
org_tb = pd.DataFrame(
    data={
        "Organization": [
            "G&A East",
            "G&A West",
            "G&A North",
            "South Sales",
            "North Sales",
            "G&A South"
        ],
        "n": sample(range(30, 100), 6)
    },
    columns=["Organization", "n"]
)

vi.network_g2g(
    data = vi.load_g2g_data(),
    algorithm = "circle",
    node_colour = "vary",
    org_count = org_tb
)

# Return an interaction matrix
# Minimum arguments specified

g = vi.network_g2g(
    data = vi.load_g2g_data(),
    return_type = "table"
)
print(g)
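To illustrate what an interaction matrix and an exclusion threshold represent, the sketch below builds one from a few hypothetical rows using only the standard library. It is not the package's implementation: it assumes that `exc_threshold` is applied to each secondary group's share of the primary group's total, and the function name `interaction_shares` and the sample values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (primary org, secondary org, meeting count).
rows = [
    ("G&A East", "G&A East", 60),
    ("G&A East", "G&A West", 38),
    ("G&A East", "North Sales", 2),
    ("G&A West", "G&A East", 30),
    ("G&A West", "G&A West", 70),
]


def interaction_shares(rows, exc_threshold=0.05):
    """Return {primary: {secondary: share}}, keeping shares >= exc_threshold."""
    # Total collaboration per primary group.
    totals = defaultdict(float)
    for primary, _, value in rows:
        totals[primary] += value

    matrix = defaultdict(dict)
    for primary, secondary, value in rows:
        if totals[primary] == 0:
            continue  # avoid dividing by zero for groups with no activity
        share = value / totals[primary]
        # Assumption: links below the threshold are excluded.
        if share >= exc_threshold:
            matrix[primary][secondary] = round(share, 3)
    return dict(matrix)


for primary, shares in interaction_shares(rows).items():
    print(primary, shares)
```

With these sample rows, the link from G&A East to North Sales accounts for 2% of G&A East's total and falls below the 5% threshold, so it is left out of the result.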

Here are some examples for running the network_p2p() function:

import vivainsights as vi
import igraph as ig

# Generate simulated P2P data
p2p_data = vi.p2p_data_sim()

# Generate P2P network plot with default legend position
vi.network_p2p(data=p2p_data, return_type="plot")

# Generate P2P network plot with legend in upper right corner
vi.network_p2p(data=p2p_data, legend_pos="upper right", return_type="plot")

# Generate P2P network plot with legend in lower left corner
vi.network_p2p(data=p2p_data, legend_pos="lower left", return_type="plot")

# Generate P2P network plot with legend on the right side
vi.network_p2p(data=p2p_data, legend_pos="right", return_type="plot")

# Generate P2P network plot with community detection using Leiden algorithm
vi.network_p2p(data=p2p_data, community="leiden", return_type="plot")

# Generate P2P network plot with community detection using Leiden algorithm and custom resolution parameter
vi.network_p2p(data=p2p_data, community="leiden", comm_args={"resolution": 0.01}, return_type="plot")

# Generate P2P network plot with community detection using Leiden algorithm and a lower resolution parameter
vi.network_p2p(data=p2p_data, community="leiden", comm_args={"resolution": 0.008}, return_type="plot")

# Generate P2P network table with default settings
vi.network_p2p(data=p2p_data, return_type="table")

# Generate P2P network table with community detection using Leiden algorithm and custom resolution parameter
vi.network_p2p(data=p2p_data, community="leiden", comm_args={"resolution": 0.01}, return_type="table")

# Attempt to generate a P2P network table with an invalid centrality measure (checks error handling)
vi.network_p2p(data=p2p_data, centrality="blahblah", return_type="table")

# Generate P2P network data with default settings
vi.network_p2p(data=p2p_data, return_type="data")

# Generate P2P network data with community detection using Leiden algorithm
vi.network_p2p(data=p2p_data, community="leiden", return_type="data")

# Generate P2P network table with community detection using Leiden algorithm
vi.network_p2p(data=p2p_data, community="leiden", return_type="table")

# Generate P2P network object with default settings and extract vertex attributes
out_g = vi.network_p2p(data=p2p_data, return_type="network")
out_g.vertex_attributes()

# Print summary statistics of P2P network object
ig.summary(out_g)

# Print number of vertices in P2P network object
print(out_g.vcount())

# Print number of edges in P2P network object
print(out_g.ecount())

# Generate P2P network object with community detection using Leiden algorithm and extract vertex attributes
out_g = vi.network_p2p(data=p2p_data, community="leiden", return_type="network")
out_g.vertex_attributes()
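For readers new to network objects, the following standard-library sketch shows what vertex counts, edge counts, and a simple centrality measure mean on a tiny edge list. The edge data and the `degree_centrality` helper are hypothetical and are not taken from the package; degree is used here only as one familiar example of a centrality measure.

```python
from collections import Counter

# Hypothetical undirected person-to-person edges (not real Viva Insights data).
edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P2", "P3"),
    ("P3", "P4"),
]


def degree_centrality(edges):
    """Count how many edges touch each vertex."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


degree = degree_centrality(edges)
# Roughly analogous to out_g.vcount(), except that vertices without
# any edge would not appear in an edge list.
print("vertices:", len(degree))
# Roughly analogous to out_g.ecount().
print("edges:", len(edges))
print("most connected:", max(degree, key=degree.get))
```

In this toy network, P3 touches three of the four edges, so it has the highest degree.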

Notes

Special thanks to @ia404, @davinaloures, and @sachinstl for setting up the various ONA functions.

martinctc commented 10 months ago

Thank you @ia404 @davinaloures @sachinstl for your massive contributions to this PR. This version (v0.2.0) is now uploaded to PyPI: https://pypi.org/project/vivainsights/

Merging to main now!