graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
BSD 3-Clause "New" or "Revised" License

[FEA] anonymize graph #385

Open lmeyerov opened 2 years ago

lmeyerov commented 2 years ago

Is your feature request related to a problem? Please describe.

When sharing graphs with others, especially when moving from a private server or private account to the public Hub (e.g., for publicizing or debugging), it'd help to have a way to quickly anonymize a graph.

Sample use cases to make fast:

Perf:

Describe the solution you'd like

Something declarative and configurable like:

g2 = g.anonymize(
    node_policy={
        'include': ['col1', ...],  # safelist of columns to include
        'preserve': ['col1', ...],  # opt-in columns not to anonymize
        'rename': ['col1', ...],  # list of columns, or True
        'sample_drop': 0.2,  # fraction of nodes to drop; 0 (default) means preserve all
        'sample_add': 0.2,  # fraction of nodes to add; 0 (default) means add none
    },
    edge_policy={
        'drop': ['col2', ...],  # switch to opt-out via columns to exclude
    },
    sample_keep=...,
    sample_add=...,
)

g2.plot()

g_orig = g2.deanonymize(g2._anon_remapping)
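To make the include / preserve / drop policy and the remapping round trip concrete, here is a minimal sketch in plain Python. It is not an existing PyGraphistry API: `anonymize_rows` and `deanonymize_rows` are hypothetical helpers, rows are plain dicts standing in for node and edge dataframes, values are assumed hashable, and random hex tokens are just one possible masking scheme. Column renaming and sampling are left out. The one property it tries to show is that a shared remapping keeps node IDs consistent between the node and edge tables, so the graph's shape survives anonymization.

```python
import secrets
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]
# Hypothetical remapping format: (type, original value) -> opaque token.
# Keying on the type keeps 1, 1.0, and True from collapsing into one entry.
Remapping = Dict[Tuple[type, Any], str]


def anonymize_rows(
    rows: Iterable[Row],
    include: Optional[Iterable[str]] = None,  # safelist; None keeps all columns
    preserve: Iterable[str] = (),             # columns kept but not masked
    drop: Iterable[str] = (),                 # columns removed entirely
    remapping: Optional[Remapping] = None,    # pass in to share tokens across tables
) -> Tuple[List[Row], Remapping]:
    include_set = None if include is None else set(include)
    preserve_set, drop_set = set(preserve), set(drop)
    remapping = {} if remapping is None else remapping
    out: List[Row] = []
    for row in rows:
        new_row: Row = {}
        for col, val in row.items():
            if col in drop_set or (include_set is not None and col not in include_set):
                continue
            if col in preserve_set:
                new_row[col] = val
                continue
            key = (type(val), val)
            if key not in remapping:
                remapping[key] = secrets.token_hex(8)
            new_row[col] = remapping[key]
        out.append(new_row)
    return out, remapping


def deanonymize_rows(rows: Iterable[Row], remapping: Remapping) -> List[Row]:
    # Invert token -> original value; non-token values pass through unchanged.
    inverse = {token: key[1] for key, token in remapping.items()}
    return [
        {c: inverse.get(v, v) if isinstance(v, str) else v for c, v in row.items()}
        for row in rows
    ]


nodes = [
    {"id": "alice", "type": "user", "email": "alice@example.com"},
    {"id": "bob", "type": "user", "email": "bob@example.com"},
]
edges = [{"src": "alice", "dst": "bob", "weight": 3}]

anon_nodes, remap = anonymize_rows(nodes, include=["id", "type"], preserve=["type"])
anon_edges, remap = anonymize_rows(edges, preserve=["weight"], remapping=remap)
restored_edges = deanonymize_rows(anon_edges, remap)
```

Columns excluded by `include` or `drop` are gone for good, so only the masked columns can be restored. The remapping itself is as sensitive as the original data and would need to stay on the private side.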

Sample transforms:


If there is a popular, well-maintained tabular or graph-centric library for this, we should consider using it, but not if it looks like a maintenance or security risk.

Additional context

Ultimately it'd be good to push this to the UI via some sort of safe mode: role-specific masking, ...

sky-2002 commented 2 years ago

Hello @lmeyerov 😇, I am interested in contributing to this. Can you assign this issue to me? Any tips on where to start?

lmeyerov commented 2 years ago

awesome!

lmeyerov commented 2 years ago

(happy to review PRs as they happen!)