Hypergraph support? - Githubissues

mohawk2 commented 3 years ago

I see there was some discussion on #22 about a sort-of hypergraph support. In particular, I am interested in not just undirected hyperedges (blobs of several nodes), but directed hyperedges (a blob of nodes as the source, and a blob of nodes as the target). Is there interest in supporting that in JGF?

wshayes commented 3 years ago

Sure - the goal is to maintain simplicity and focus on structure of the graph rather than the layout/viz.

I think adding an equivalence/alias/hypernode? section would work well for this to make it more of a first-order and standardized option. You can already support this in current JGF by adding metadata to the node associating it to a group - but that's not very easy to standardize. With the alias section, you could then use the group node ID's in normal edges.

Is this along the lines of what you were thinking? What about hypernodes where the node can be another graph? I'd prefer to enable both if we add this.

wshayes commented 3 years ago

Hmm - adding a graphs section and a nodeset section both as object/Maps like the nodes section would support this usecase and would be backwards compatible.

mohawk2 commented 3 years ago

If we consider that these blobs (i.e. sets), of which there is one per undirected hyperedge, and two per directed hyperedge, it might just be as simple as updating the spec to allow arrays as well as strings for source and target. For undirected edges, I think it would be best to eliminate "source" and "target", and just have "nodes" as a two-element (or arbitrary number for hyperedges) array.

What do you think?

wshayes commented 3 years ago

I tried to change source/target before and hit a live wire, and those semantics don't really matter if it's undirected. You still need an X related to Y thing for an edge.

Allowing lists of nodes for the edges would simplify the schema, but I still think if making this change we should go ahead and add graphs as nodes to allow for hyperedges. I'm reluctant to complicate the nodes section by allowing them to be either a node or a graph - it would be easier to review a JGF to see that it has hyperedges if we had a graphs section. Adding a hypernodes section would make the hypernodes re-usable where adding a list of nodes to the edges will result in duplicate lists of nodes. A suggested naming convention for nodes/graphs/hypernodes would make them easy to detect in edges for human review of the JGF file.

We've been trying to balance this as a somewhat human-readable format with something that is easy to process for graph sharing/storage.

mohawk2 commented 3 years ago

I think this would be more comprehensible with examples! Here is the output of as_hashes in the Perl Graph module on the edges of hypergraphs (from the test suite: https://github.com/graphviz-perl/Graph/blob/master/t/53_multiedge_attributes.t#L161) that are undirected:

is_deeply $got, [
    {
    vertices => ['a', 'b', 'x'],
    attributes => { hot => { color => 'pearl', weight => 'heavy' } },
    },
    { vertices => ['a', 'b'], attributes => { hot => { weight => 123 } } },
    { vertices => ['c', 'd'], attributes => { hot => { weight => 45 } } },
    { vertices => ['d', 'e'], attributes => { hot => { weight => 46 } } },
], 'undirected hyperedge as_hashes' or diag explain $got;

and directed:

is_deeply $got, [
    {
        predecessors => [qw(a b c)],
        successors => [qw(f g)],
    attributes => { hot => { color => 'pearl', weight => 'heavy' } },
    },
    { predecessors => [qw(a b c)], successors => [qw(f h)], attributes => { hot => { weight => 123 } } },
    { predecessors => ['c'], successors => ['d'], attributes => { hot => { weight => 45 } } },
    { predecessors => ['d'], successors => ['e'], attributes => { hot => { weight => 46 } } },
], 'directed hyperedge as_hashes' or diag explain $got;

I've read through a couple of your examples as used in your test suite. Can you give an example here of the data-shape you have in mind?

wshayes commented 3 years ago

Sorry took the weekend off for once and took a while to catch up with some other stuff, but finally getting back to you.

We could do something like this which would keep things simpler at the structural level, and if we add a node_type to the node with a default type of simple node and only require it for hyper_node and graph nodes, this keeps it easy to process.

{
    "graph": {
        "id": "Usual Suspects",
        "type": "movie characters",
        "label": "Usual Suspects",
        "nodes": {
            "Roger Kint": {
                "label": "Roger Kint",
                "metadata": {
                    "nickname": "Verbal",
                    "actor": "Kevin Spacey"
                }
            },
            "Keyser Söze": {
                "label": "Keyser Söze",
                "metadata": {
                    "actor": "Kevin Spacey"
                }
            },
            "suspects": {"node_type": "hyper_node_list", "nodes": ["Roger Kint", "Keyser Söze"]},
            "suspects2": {
                "node_type": "hyper_node_map",
                "nodes": {
                    "Roger Kint": {
                        "label": "Roger Kint",
                        "metadata": {
                            "nickname": "Verbal",
                            "actor": "Kevin Spacey"
                        }
                    },
                    "Keyser Söze": {
                        "label": "Keyser Söze",
                        "metadata": {
                            "actor": "Kevin Spacey"
                        }
                    }
                }
            },
            "subgraph": {"node_type": "graph", "graph": {add subgraph definition here}}
        },
        "edges": [
            {
                "source": "Roger Kint",
                "target": "Keyser Söze",
                "relation": "is"
            },
            {
                "source": "suspects",
                "target": "subgraph1",
                "interrogated"
            }
        ],
        "metadata": {
            "release year": "1995"
        }
    }
}

This is another approach - one of the ones I referred to earlier.

{
    "graph": {
        "id": "Usual Suspects",
        "type": "movie characters",
        "label": "Usual Suspects",
        "nodes": {
            "Roger Kint": {
                "label": "Roger Kint",
                "metadata": {
                    "nickname": "Verbal",
                    "actor": "Kevin Spacey"
                }
            },
            "Keyser Söze": {
                "label": "Keyser Söze",
                "metadata": {
                    "actor": "Kevin Spacey"
                }
            }
        },
        "hyper_nodes": {
            "suspects": ["Roger Kint", "Keyser Söze"]
        },
        "graphs": {
            "subgraph1": {
                Add graph here
            }
        },
        "edges": [
            {
                "source": "Roger Kint",
                "target": "Keyser Söze",
                "relation": "is"
            },
            {
                "source": "suspects",
                "target": "subgraph1",
                "interrogated"
            }
        ],
        "metadata": {
            "release year": "1995"
        }
    }
}

mohawk2 commented 3 years ago

This seems to be moving in the direction of having nodes that are actually hypernodes, in order to keep "edges" simple. In my reading around the literature, I have yet to see any comprehensible application for hypernodes (which is why they're not in the Perl Graph module anymore), while (as the Wikipedia page on hypergraphs shows), there are many, many applications for hypergraphs where nodes are still individual things, but the edges are slightly more expressive/powerful.

I would urge you again to consider having proper, idiomatic hyperedges, both directed and undirected. Having connections between "graphs" is not a substitute for that. This does not need to be backwards-incompatible.

As a comparison, GraphML has support for proper (but only undirected) hyperedges: http://graphml.graphdrawing.org/primer/graphml-primer.html#Hyperedges - it has a separate element, "hyperedge". You could do similar by having instead of an "edges" object, a "hyperedges" array, which be "oneOf" either an array of directed or undirected hyperedge objects, each specified similar to the Perl code above.

wshayes commented 3 years ago

Sorry - I really didn't understand hypergraphs - just read through the definition more closely.

I think we need a different edge to support this and you can disregard what I suggested above. From what I currently understand, an undirected hyperedge would just be a list of nodes, a directed hyperedge are two lists of nodes - do we need a relation attribute for these?

# example - not fully correct JSON
hyperedges: [
   ([1, 2, 3], metadata: {}),   # undirected
   ([1, 2, 3], [4, 5, 6], "relation?", metadata: {})   # directed
]

Does this make sense or am I still misunderstanding? I'd prefer to segregate these edges because I don't think there is a lot of overlap between simple graphs and hypergraphs (or am I wrong in expectation). I'd also prefer not to for users in parsing these to have to figure out if the node in an edge is a list or not and hypergraphs don't need two nodes/nodesets whereas a normal edge must have two nodes.

mohawk2 commented 3 years ago

I feel this is movement in the right direction! It doesn't make sense at all to have a graph which could have both "edges" and "hyperedges", so the way forward would be to "oneOf" that as well. And the "hyperedges" would be either an array (representing a set) or either the undirected or directed things. To be concrete:

Undirected:

{
  "graph": {
    "id": "1",
    "type": "weighted network",
    "label": "None",
    "nodes": { "a": {}, "b": {}, "c": {}, "d": {}, "e": {}, "x": {} },
    "hyperedges": [
      { "nodes": ["a", "b", "x"], "metadata": { "weight": 17 } },
      { "nodes": ["a", "b"], "metadata": { "weight": 123 } },
      { "nodes": ["c", "d"], "metadata": { "weight": 45 } },
      { "nodes": ["d", "e"], "metadata": { "weight": 46 } }
    ]
  }
}

and directed:

{
  "graph": {
    "id": "1",
    "type": "weighted directed network",
    "label": "None",
    "nodes": { "a": {}, "b": {}, "c": {}, "d": {}, "e": {}, "f": {}, "g": {}, "h": {} },
    "hyperedges": [
      { "source": ["a", "b", "c"], "target": ["f", "g"], "metadata": { "weight": 17 } },
      { "source": ["a", "b", "c"], "target": ["f", "h"], "metadata": { "weight": 123 } },
      { "source": ["c"], "target": ["d"], "metadata": { "weight": 45 } },
      { "source": ["d"], "target": ["e"], "metadata": { "weight": 46 } }
    ]
  }
}

(I've used "source" and "target" for directed, and "nodes" for undirected, as that's closest to the idiom you already have)

edit: added missing close-braces.

wshayes commented 3 years ago

That looks good to me. I like the oneOf idea as well as I think these will be one or the other. Do you want to create the pull request for JGF v3 with hyperedges? I'm happy to do it as well, but you deserve the credit for it (which I'd note but it's not as impactful as submitting the PR).

mohawk2 commented 3 years ago

I'll be pleased to have a go, and then accept any helpful suggestions towards making it good enough! :-D

mohawk2 commented 3 years ago

@wshayes see #51. Thoughts welcome!

Immediate thought: this change wouldn't require a major version bump (all the 2.0 data passes), but I think for semver reasons this justifies a version change to v2.1? If so, let me know and I'll also update the $id.

wshayes commented 3 years ago

If so, then agreed 2.1.

mohawk2 commented 3 years ago

I've added a commit that bumps minor version to 2.1.

adifabio commented 3 years ago

Sorry, I am late to the party on this. Great job @wshayes and @mohawk2 in collaborating through this. I really like the final solution proposed and the use of oneOf for both compatibility and separation of interests. Well done.

jsongraph / json-graph-specification

Hypergraph support? #50