snowleopard / alga

Algebraic graphs
MIT License
715 stars 68 forks source link

Supporting different node and edge types #282

Open boggle opened 2 years ago

boggle commented 2 years ago

Given a graph with different node types (say "a", "b", "c", etc.) and different edge types ("r", "s", "t"), is it acceptable to modify connect such that the edge type determines the types of source and target nodes that will be connected (ignoring other nodes)? What does making such a change do to the algebra? Is there anything in alga that would help building such graphs?

snowleopard commented 2 years ago

Given a graph with different node types (say "a", "b", "c", etc.) and different edge types ("r", "s", "t"), is it acceptable to modify connect such that the edge type determines the types of source and target nodes that will be connected (ignoring other nodes)?

I'm not sure what you mean by "acceptable". It does make sense to consider restricted forms of the connect operation. I've played with some variants some time ago but I never came up with anything particularly enlightening. The design space here is pretty large, so it's best to discuss a concrete data type. Do you have anything specific in mind?

What does making such a change do to the algebra?

It depends on specifics but I expect that the algebra will change pretty dramatically.

Is there anything in alga that would help building such graphs?

Not at the moment.

snowleopard commented 2 years ago

Is there anything in alga that would help building such graphs?

Not at the moment.

Apart from the simplest case with two vertex types, of course. For that we have Algebra.Graph.Bipartite.AdjacencyMap.

boggle commented 2 years ago

Many real world knowledge graphs will be "mixed graphs", in the sense that they contain different types of nodes connected via different types of edges. Usually such graphs will assume (at least implicitly) some kind of schema that restricts edge types regarding the types of nodes it is allowed to connect. A way to think about this is that every possible edge label is equipped with a predicate on allowed source and target nodes. Of course other encodings of such a simple schema are possible, too. Now of course one could simply defined connect to only connect pairs of nodes that satisfy the predicate associated with the edge label. How many of the laws of the algebra are lost if one does this though?

A second question would be how to make that happen in alga but I take your reply to mean that something like this is currently not supported directly.

snowleopard commented 2 years ago

@boggle Supporting what you describe in full generality is going to be hard, most likely impractical in a language like Haskell, because you'll have to encode these edge-vertex compatibility predicates in types. You probably want a languages with full dependent types for that, like Agda.

A weaker version of what you describe can be achieved by simply using the edge-labelled version of the algebra, where you will attach the predicate to every edge as a label. Then one could construct invalid graphs using the API but it would also be possible to run a validation procedure that checks that every edge satisfies its compatibility predicate. Not sure if this is good enough for you but I don't know how to provide anything better.

Happy to keep this issue open in case anyone comes up with a viable approach.

boggle commented 2 years ago

Interesting, so I understand you to suggest to rather limit the set of valid expressions of the algebra instead of modifying the algebra itself to implicitly uphold schema (e.g. by only creating valid edges). That makes a lot of sense; this way none of the laws of the algebra are invalidated.

snowleopard commented 2 years ago

limit the set of valid expressions of the algebra instead of modifying the algebra itself

I don't think I suggested that. By limiting the set of valid expressions we are most certainly going to modify the algebra.

For example, if we no longer allow connect x y for all possible values of x and y, then connect is no longer a monoid, because in a monoid, the operation must be defined for all possible inputs.

boggle commented 2 years ago

What I meant was: This way none of the laws of the algebra prior to the use of the invalidation procedure are invalidated. On a second thought, maybe this isn't such a big difference.