Proposal: Make the Yatima.Compiler.Graph a generic library

NicolasRouquette commented 2 years ago

The monadic formulation of the strongly-connected component algorithm caught my attention; it's really elegant and concise. I hope that the https://github.com/leanprover/functional_programming_in_lean book will eventually explain the nifty Lean4 programming tricks involved!

I would like to use this library to formulate in Lean4 some algorithms I've been working on in Scala3; see: https://github.com/NicolasRouquette/scala3-bundle-closure

However, the current Graph library is tied to Lean4's symbols as graph vertices:

https://github.com/yatima-inc/yatima-lang/blob/ada78d298e08e495b307cb639f4305731f3af663/Yatima/Compiler/Graph.lean#L32-L34

So as a fun exercise to flex my new Lean4 programming brain, I extracted your library and parameterized it: https://github.com/NicolasRouquette/YatimaGraphLib

To facilitate comparison w/ your original source, I've kept the same file/directory organization & names.

In writing a simple test, I noticed that the buildG function should add the edge target as a vertex in the map. See:

https://github.com/NicolasRouquette/YatimaGraphLib/blob/d74377ead019a05a43b11b0fff535abe3b3466dd/src/lean4/Yatima/Compiler/Graph.lean#L28:L36

If this idea makes sense, are you interested in a PR?

johnchandlerburnham commented 2 years ago

We'd love a PR! Thanks for the proposal and contribution. The scc code definitely should be generalized. Perhaps we can even take this opportunity to factor out a new generic Graph.lean library, that can eventually scale to something along the lines of fgl, graphs in Haskell or petgraph in Rust?

I've created https://github.com/yatima-inc/Graph.lean as a template where we can do this in. @arthurpaulino and @winston-h-zhang, can you guys follow up with how to factor our code out and how to incorporate @NicolasRouquette's changes?

NicolasRouquette commented 2 years ago

Perhaps we can even take this opportunity to factor out a new generic Graph.lean library, that can eventually scale to something along the lines of fgl, graphs in Haskell or petgraph in Rust?

These are great examples. For my internal Scala code base, I've been using this library for several years: http://scala-graph.org/

The petgraph library seems well-reviewed; however, it lacks the monadic style of the Haskell libs. The scala graph library is heavily influenced by the design of the Scala collection library; it's altogether a different style of API.

winston-h-zhang commented 2 years ago

Thanks for the contribution! We should definitely try and factor out any generic graph algorithms. Since John has already set up the Graph.lean repo we can move everything into there first.

fgl, graphs, and petgraph all look like great libraries to follow after. I think we should try to strike a good balance between having good core functionality but maybe not committing to building a truly all-purpose library. graphs and petgraph both seem like clean implementations, and I would vote to follow the design of these two.

Ironically, purely lazy functional code is often not great for Lean, since Lean is strict and can do functional-but-in-place "pseudo-mutations." (For example, many pure functional graph algorithms in Haskell don't quite work because they take advantage of its laziness.) So that's why I'm hesitant on fgl.

I also don't know much about Scala -- how different is the API style?

One final thought is that petgraph lacking monadic style is not really a problem. We say code is "monadic" in the sense that it isn't "functional" and mimics imperative style -- so the Lean code is probably going to be very close to what we would write in Rust anyway.

I'd love to hear any thoughts about this!

NicolasRouquette commented 2 years ago

Thanks for the feedback. It seems that all of these options have some language-specific and API conventions that may not be a good fit for Lean.

The scala-graph library API design is deliberately focused on blending w/ the Scala collection API, which is really nice. I see some similarities here and there with some of the proposed Yatima Std library operations.

I understand that all of the above options involve biases due to the semantics of the languages involved (Haskell, Rust, Scala) and the API conventions they follow; we can get inspiration from these.

If anything deserves to be called "monadic", the Yatima Graph DFS and strongly-connected component algorithms deserve that label:

https://github.com/NicolasRouquette/YatimaGraphLib/blob/master/src/lean4/Yatima/Compiler/Graph.lean#L71:L74

structure dfsState (α) [Ord α] where 
  visited : RBMap α Bool compare

abbrev dfsM (α) [Ord α] := ReaderT (Graph α) $ EStateM String (dfsState α)

and:

https://github.com/NicolasRouquette/YatimaGraphLib/blob/master/src/lean4/Yatima/Compiler/Graph.lean#L119:L136

structure NodeInfo where
  index : Nat
  lowlink : Nat
  onStack : Bool
  deriving Repr

instance : ToString NodeInfo :=
  { toString := fun info => s!"i: {info.index}, low: {info.lowlink}, on: {info.onStack}" }

structure sccState (α) [Ord α] where 
  info : RBMap α NodeInfo compare
  index : Nat 
  stack : List α

instance : Inhabited (sccState α):= 
  { default := ⟨.empty, default, default⟩ }

abbrev sccM (α) [Ord α] [ToString α] := ReaderT (Graph α) $ EStateM String (sccState α)

Both algorithms are implemented as reader monads over the structures mentioned above separately from the definition of the Graph itself.

A good start might be to look at graph for replacing Yatima's specific Graph type

https://github.com/NicolasRouquette/YatimaGraphLib/blob/master/src/lean4/Yatima/Compiler/Graph.lean#L15:L16

structure Graph (α : Type) [Ord α] where
  data : Std.RBMap α (List α) compare

with something similar to graph's generic definition:

https://github.com/ekmett/graphs/blob/master/src/Data/Graph/Class.hs#L43:L50

type VertexMap g = PropertyMap g (Vertex g)
type EdgeMap g = PropertyMap g (Edge g)

class (Monad g, Eq (Vertex g), Eq (Edge g)) => Graph g where
  type Vertex g :: *
  type Edge g :: *
  vertexMap :: a -> g (VertexMap g a)
  edgeMap   :: a -> g (EdgeMap g a)

I really like the graph library because it is closest to Yatima's design in that it clearly separates the generic class for graphs from the generic class for graph algorithms:

https://github.com/ekmett/graphs/blob/master/src/Data/Graph/Algorithm.hs#L30:L37

-- | Graph search visitor
data GraphSearch g m = GraphSearch
  { enterVertex :: Vertex g -> g m -- ^ Called the first time a vertex is discovered
  , enterEdge   :: Edge g   -> g m -- ^ Called the first time an edge is discovered, before enterVertex
  , grayTarget  :: Edge g   -> g m -- ^ Called when we encounter a back edge to a vertex we're still processing
  , exitVertex  :: Vertex g -> g m -- ^ Called once we have processed all descendants of a vertex
  , blackTarget :: Edge g   -> g m -- ^ Called when we encounter a cross edge to a vertex we've already finished
  }

The key difference is that Yatima's DFS and SCC algorithms are not currently written to a common interface like graph's GraphSearch. It is unclear to me whether this is something that we should have in Yatima's Graph lib or not.

NicolasRouquette commented 2 years ago

FYI: I also found that there is an SCC algorithm implemented in Lean:

https://github.com/leanprover/lean4/blob/c81b10f2968db8c6cb86fa0c31bc6f7f4624017a/src/Lean/Util/SCC.lean#L101:L106

Interestingly, there is no graph data structure per se except for the information about vertices and their successors; e.g:

import Lean.Util.SCC

def vertices : List String := ["a", "b", "c", "d", "e", "f"]

def successorsOf(s: String) : List String := 
  match s with 
  | "a" => ["b", "c"]
  | "b" => ["a", "d"]
  | "c" => ["d"]
  | "d" => ["e"]
  | "e" => ["c"]
  | _ => []

#eval Lean.SCC.scc vertices successorsOf
-- [["d", "e", "c"], ["a", "b"], ["f"]]

lurk-lab / yatima

Proposal: Make the Yatima.Compiler.Graph a generic library #88