Closed lmeyerov closed 4 years ago
@lmeyerov
I have some concerns and issues with this feature request. The first, and major, issue is that Graphistry is closed source, and we only want to add conversion routines for open-source products.
Also, it seems like you are doing extra work. Data starts as a pseudo-property graph in cuDF, where it is rich with attributes. From there a Graph is created, which really just maintains references back to the dataframe. You then want the Graph to create a new DataFrame that sounds similar to the original.
Hi @BradReesWork, thanks for the reasoned response. On the major concern, and maybe this is not obvious: PyGraphistry is an OSS project that is increasingly used as a thick Swiss army knife for going between graph data sources <> pydata, not just for our proprietary rapids-native plotter backend. Ex: cugraph's hypergraph PR comes from working off of pygraphistry's code & tests.
On the technical side for cugraph, it's currently unpredictable whether some program (user, graphistry, cugraph), across different data, will fail because one part used col `"source"`, another `"src"`, and some change does `"src_id"`.
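To make the fragility concrete, here is a minimal sketch of the guesswork a downstream reader has to do when the column names are not exposed. The helper `guess_column` and the candidate list are hypothetical, chosen only to mirror the names mentioned above; they are not part of cugraph or PyGraphistry.

```python
from typing import Iterable, Sequence

# Hypothetical candidate names, mirroring the variants mentioned above.
SOURCE_CANDIDATES = ("source", "src", "src_id")


def guess_column(columns: Iterable[str], candidates: Sequence[str]) -> str:
    """Return the first candidate name present in `columns`, or raise KeyError."""
    present = set(columns)
    for name in candidates:
        if name in present:
            return name
    raise KeyError(f"none of {list(candidates)} found in {sorted(present)}")


# Works only as long as the producer sticks to a name the reader knows about.
print(guess_column(["src", "dst", "weight"], SOURCE_CANDIDATES))  # src
```

Any new naming variant on the producer side breaks such a reader silently, which is the case for having the graph report its own bindings.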
Maybe we can break this down into exposing two thin & generic interop APIs. The result should be less heartburn for both direct users + framework writers. Then this ENH hollows out to tinier features on top of that. Happy to close this ENH and file those, lmk.
====
# already exists, includes attr cols
G.view_edge_list() -> cudf.DataFrame
# missing; current 'G.nodes() -> Series' precludes getting node attr cols
G.view_node_list() -> cudf.DataFrame
#implicit, inconsistent across indiv calls, & bindings changing across upgrades
G.bindings() -> {
'edges': {'source_id': [str], 'destination_id': [str], ?'weight': str, ...},
'nodes': {'node_id': [str], ?'weight': str, ...},
'settings': {...}
}
#Or, per-attrib getters: G._edge_source_id :: [str], G._node_id :: [str], ...
# ... And same for any meaningful settings, like symmetrized, is_directed, and anything else cugraph sees as important...
That gets read-only use cases far. Eventually, ideally also have reliable dual create / update methods like `graph(node_list=...)`, `set_node_list()`, `clone()` for enriching & interactive analytics flows, and eventually, for the bindings too. I can see that being out-of-scope, but at least getting schema introspection would help writing stable readers.
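The schema-introspection idea above could look roughly like the following. This is only a sketch under stated assumptions: the class names (`EdgeBindings`, `NodeBindings`, `GraphBindings`) and the helper `edge_columns` are hypothetical and simply mirror the shape of the proposed `G.bindings()` result; none of this is an existing cugraph API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class EdgeBindings:
    """Hypothetical: which edge-table columns play which role."""
    source_id: List[str]
    destination_id: List[str]
    weight: Optional[str] = None


@dataclass(frozen=True)
class NodeBindings:
    """Hypothetical: which node-table columns play which role."""
    node_id: List[str]
    weight: Optional[str] = None


@dataclass(frozen=True)
class GraphBindings:
    """Hypothetical read-only description of a graph's schema."""
    edges: EdgeBindings
    nodes: NodeBindings
    settings: Dict[str, Any] = field(default_factory=dict)


def edge_columns(b: GraphBindings) -> List[str]:
    """List the edge columns a reader needs, without guessing names."""
    cols = [*b.edges.source_id, *b.edges.destination_id]
    if b.edges.weight is not None:
        cols.append(b.edges.weight)
    return cols


bindings = GraphBindings(
    edges=EdgeBindings(source_id=["src"], destination_id=["dst"], weight="w"),
    nodes=NodeBindings(node_id=["id"]),
    settings={"is_directed": True},
)
print(edge_columns(bindings))  # ['src', 'dst', 'w']
```

A reader written against such a structure depends only on the role names (`source_id`, `destination_id`, ...) and not on whatever column names a particular graph happens to use.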
===
`plotter`: Users of `cugraph` would benefit from seeing intermediate + final results from a plotter of choice:
cugraph.set_plotter(engine=some_plugin)
G.plot()
where every `engine` implements a thin interface like `class { 'init': ..., 'plot': (self, data: cuGraph, *args, **kwargs) -> Any }`.
Implementors of `engine` would benefit from (1) method for a more predictable target across upgrades :)
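As a rough illustration of the plugin idea, the sketch below registers an engine and dispatches to it. It is written as module-level functions in plain Python; `PlotEngine`, `EchoEngine`, and this `set_plotter` / `plot` pair are assumptions made for illustration, not existing cugraph functions.

```python
from typing import Any, Optional, Protocol


class PlotEngine(Protocol):
    """Hypothetical thin interface every plotting engine would implement."""

    def plot(self, data: Any, *args: Any, **kwargs: Any) -> Any: ...


_engine: Optional[PlotEngine] = None


def set_plotter(engine: PlotEngine) -> None:
    """Register the engine that later plot() calls dispatch to."""
    global _engine
    _engine = engine


def plot(data: Any, *args: Any, **kwargs: Any) -> Any:
    """Forward the graph (or any data) to the registered engine."""
    if _engine is None:
        raise RuntimeError("no plotter registered; call set_plotter() first")
    return _engine.plot(data, *args, **kwargs)


class EchoEngine:
    """Toy engine that only describes what it was asked to plot."""

    def plot(self, data: Any, *args: Any, **kwargs: Any) -> Any:
        return f"plotting {data!r} with {kwargs}"


set_plotter(EchoEngine())
print(plot("G", layout="force"))  # plotting 'G' with {'layout': 'force'}
```

The registry keeps the core library unaware of any specific plotter, so an engine only has to track the data it is handed.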
@lmeyerov there are still some things here that I think go beyond what cuGraph should be doing. But I am delaying fully commenting on this issue until after we have property graph support working (very soon). Closing the issue, but will re-address in a month.
Describe the solution you'd like
Forwards-compatible cugraph plotting bindings with graphistry:
`.plot()` convenience
`cugraph` results either for the initial graph, or enriching existing node/edge tables
1. in cuGraph:
This would be built from `G.__to_graphistry_nodes()` and `G.__to_graphistry_edges()`, which return DFs and, based on whatever cugraph settings, add bindings like `graphistry.bind(source='s', destination='d', edge_weight='z', ...)`. If cugraph ever changes those settings, it can update plotter bindings with them.
2. in graphistry:
a
b
c
The graphistry side would use the `cugraph` helpers
d
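One way the graphistry side might consume such helpers is to turn the reported column roles into keyword arguments for `graphistry.bind(...)`. The sketch below only builds the keyword dictionary and does not import graphistry; the function name `to_bind_kwargs` is hypothetical, and the keyword names follow the `graphistry.bind(source=..., destination=..., edge_weight=...)` call quoted above.

```python
from typing import Dict, Optional


def to_bind_kwargs(
    source: str,
    destination: str,
    edge_weight: Optional[str] = None,
    node: Optional[str] = None,
) -> Dict[str, str]:
    """Build keyword arguments for a bind call, skipping unset roles."""
    kwargs = {"source": source, "destination": destination}
    if edge_weight is not None:
        kwargs["edge_weight"] = edge_weight
    if node is not None:
        kwargs["node"] = node
    return kwargs


kwargs = to_bind_kwargs("s", "d", edge_weight="z")
print(kwargs)  # {'source': 's', 'destination': 'd', 'edge_weight': 'z'}
# With PyGraphistry installed, these could be passed on as graphistry.bind(**kwargs).
```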
Additional context
Basics seem simple. The cugraph API is a bit of a moving target, so establishing the core helpers on the cugraph side would be a big help. For a sense, see code example here:
=>