joelberkeley / spidr

Accelerated machine learning with dependent types

Function to cache functions #374

Open joelberkeley opened 8 months ago

joelberkeley commented 8 months ago

It would be good to be able to reuse sections of the graph, which should improve the speed of interpreting the AST, and possibly improve the size and efficiency of the compiled XLA graph, depending on what XLA does with it. This feels like jax.jit, but I don't think it is, given this post (including the discussion).
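
For contrast, here is a minimal sketch of the behaviour we want to avoid. The expensive signature just mirrors the one in the proposal below, and retracedTwice is a hypothetical example name; the point is that each application of expensive traces its body afresh, so the same subgraph is appended to the AST twice:

expensive : Tensor [] F64 -> Graph $ Tensor [] F64

retracedTwice : Tensor [] F64 -> Graph $ Tensor [] F64
retracedTwice x = do
  -- each call below traces expensive again and duplicates its nodes
  y <- expensive x
  z <- expensive x
  pure (y + z)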

This is probably fairly easy to implement. We would need to add a cache of subgraphs to Graph, and give each subgraph a reference that a Call node can point to, e.g.

jit : (Tensor sa ta -> Graph $ Tensor sb tb) -> Graph (Tensor sa ta -> Graph $ Tensor sb tb)
jit f = do
  -- trace f once, against a symbolic argument node
  let (env, result) = runState empty (f (Arg 0))
  graph <- get
  -- stash the traced subgraph in the cache, and get back a reference to it
  subGraphReference <- addSubGraph graph.subGraphs env
  -- each use of the returned function only emits a Call node
  pure (\x => addNode (Call subGraphReference x))

expensive : Tensor [] F64 -> Graph $ Tensor [] F64

efficient : Graph $ Tensor [] F64
efficient = do
  cachedExpensive <- jit expensive
  x <- cachedExpensive 1
  y <- cachedExpensive 2
  pure (x + y)

which should result in us tracing expensive just the once. Note we've not yet compiled expensive to XLA. That's up to the compilation step.
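
For concreteness, here is one possible shape for the supporting pieces the jit sketch assumes. Everything below is a minimal, hypothetical sketch whose names (Node, Arg, Call, Env, subGraphs, addSubGraph) only mirror the code above; spidr's real graph representation will differ, and for simplicity addSubGraph here reads the cache from the state rather than taking it as an argument. In the real library, Tensor would presumably wrap a Node (or a reference to one), which is how the jit sketch can pass Arg 0 where a Tensor is expected.

import Control.Monad.State

-- hypothetical node type: Arg is the positional parameter of a subgraph,
-- Call applies the cached subgraph with the given reference to an argument
data Node = Arg Nat | Call Nat Node

-- hypothetical graph state: the nodes built so far, plus a cache of
-- already-traced subgraphs, each addressable by its index in the list
record Env where
  constructor MkEnv
  nodes : List Node
  subGraphs : List (List Node)

Graph : Type -> Type
Graph = State Env

-- append a traced subgraph to the cache and return a reference to it,
-- so that later Call nodes can point at it
addSubGraph : List Node -> Graph Nat
addSubGraph sub = do
  env <- get
  put ({ subGraphs $= (++ [sub]) } env)
  pure (length env.subGraphs)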