Open michaelpj opened 5 years ago
As an example of how this can lead to some (accidental?) inconsistency, scc
isn't on ToGraph
, although it's perfectly definable in terms of toAdjacencyMap
.
I'm not a big fan of ToGraph
either and I'd be happy to trim it down, so let me list all current methods and give some rational for why they are not just functions with a ToGraph
constraint:
foldg
is the only interesting method really.
toGraph
could in principle be demoted to a function, although it feels natural to define a ToGraph
instance by implementing the toGraph
method. The Graph
data type has a nice implementation id
, but I think performance-wise we should be able to simplify the default implementation foldg Empty Vertex Overlay Connect
to id
using @nobrakal's rewrite rules.
isEmpty
: this one is unlikely to ever become a performance bottleneck, but still there are instances with O(1) run time cost, which you can't achieve when going via Graph
or AdjacencyMap
.
size
: I think this one should just be deleted from the ToGraph
module altogether, since it only makes sense for algebraic representations.
hasVertex
, hasEdge
, vertexCount
, edgeCount
: like isEmpty
, these should probably stay, because some instances have O(1) implementations, which you can't achieve when going via Graph
or AdjacencyMap
. As examples, consider Data.Graph
from containers
(which I plan to make an instance of ToGraph
), or an adjacency matrix that can answer hasEdge
queries in O(1).
vertexList
, vertexSet
, vertexIntSet
, preSet
, preIntSet
, postSet
, postIntSet
: algebraic representations like Graph
can implement these functions more efficiently than by going via an AdjacencyMap
(even if they currently don't do that).
edgeList
, edgeSet
, adjacencyList
: again, some instances like Relation
give you faster access to various collections of edges.
adjacencyMap
, adjacencyIntMap
, adjacencyMapTranspose
, adjacencyIntMapTranspose
: these could probably be removed from ToGraph
. They can be efficiently implemented by going via the corresponding to*
methods (at least until we change the implementation of AdjacencyMap
s).
dfsForest
, dfsForestFrom
, dfs
, reachable
, topSort
, isAcyclic
, isDfsForestOf
, isTopSortOf
: all of these are currently implemented by going via the AdjacencyMap
, but I hope to provide alternative implementations for algebraic graphs, which will be faster on dense graphs.
toAdjacencyMap
, toAdjacencyMapTranspose
, toAdjacencyIntMap
, toAdjacencyIntMapTranspose
: these seem to be important, at least for now, when we don't know many interesting algorithms that operate directly on algebraic graph representations. Why do we need transpose
versions? Because you can have instances for which computing the transpose is just O(1), for example adjacency maps that store both forward and reverse edge directions, separately.
As an example of how this can lead to some (accidental?) inconsistency,
scc
isn't onToGraph
, although it's perfectly definable in terms oftoAdjacencyMap
.
This is not accidental. I just don't know how to make it a method of the ToGraph
type class, because it returns a doubly-layered graph. The best we could do is something like this:
scc :: (ToGraph g, ToGraph h, ToVertex h ~ g) => g -> h
Alas, this doesn't really express the fact that the inner graphs in the scc
result are non-empty.
Summary: Here are the methods we should probably remove from ToGraph
: size
(remove from the module altogether, it was just a mistake), adjacencyMap
, adjacencyIntMap
, adjacencyMapTranspose
, adjacencyIntMapTranspose
.
@michaelpj Does this sound reasonable?
Hm, it seems like many of these do in fact have at least the potential for more efficient implementations.
I would be tempted to split out the "can be folded into a graph" class from the "overridable graph operations class" for clarity, but that's somewhat subjective and based on my confusion on first discovering the class.
i.e. something like
class ToGraph t where
toGraph :: ...
foldg :: ...
class ToGraph t => GraphOps t where
hasVertex :: ...
...
You could even drop the ToGraph
constraint and just offer default signatures that add it.
This is not accidental. I just don't know how to make it a method of the ToGraph type class, because it returns a doubly-layered graph. The best we could do is something like this:
Couldn't you concretely return an AdjacencyMap (NonEmpty.AdjacencyMap a)
in this case? It's not maximally generic, but I'm not sure how much that matters.
I have a feeling that splitting off ToGraph
and GraphOps
might bring even more confusion... Let's keep this issue open, perhaps we'll figure out a better approach some day.
Couldn't you concretely return an
AdjacencyMap (NonEmpty.AdjacencyMap a)
in this case? It's not maximally generic, but I'm not sure how much that matters.
Well, in this case I won't be able to add scc
for algebraic graphs that returns Graph (NonEmpty.Graph a)
, which is annoying. I think there should be some generic way of getting to the NonEmpty
version for all our graph implementations.
I understand that the intention of
ToGraph
, likeFoldable
, is that some of the methods can be overridden to provide better performance.But in practice it seems like the only ones that actually matter are
foldG
(special implementation forFold
)toGraph
,toAdjacencyMap
,toAdjacencyIntMap
(special implementations for those very types or similar ones)AFAICT the overriding implementations are all equivalent to calling the default implementations on
ToGraph
with the specialised version of e.g.toAdjacencyMap
. For example,LabelledAdjacencyMap
just delegates to theAdjacencyMap
functions after callingskeleton
... which is just its definition oftoAdjacencyMap
.So I think we could have a
ToGraph
class that just hadfoldG
and thetoX
methods, and then provide some generic implementations of the others as functions with aToGraph
constraint, rather than class methods.Is this actually a problem? Not enormously, it just looks odd to a reader to have such a profusion of class methods, which will potentially grow indefinitely as more algorithms are added.
(I might just be missing some overrides that are in fact performance-critical, of course!)