pymc-devs / symbolic-pymc

Tools for the symbolic manipulation of PyMC models, Theano, and TensorFlow graphs.
https://pymc-devs.github.io/symbolic-pymc

Document Meta Graph Design and Functionality #2

Closed brandonwillard closed 4 years ago

brandonwillard commented 5 years ago

To introduce and/or recap, our S-expression emulation takes the form of expression-tuples/etuples and is directly related to the tuple evaluation in kanren. Expression tuples serve as one type of object representation for terms and term graphs.

They're easier to manipulate generically than meta graph objects; for example, cons semantics let them represent and construct dynamic graph structures. They can also be used to represent and manipulate model graphs without implementing any support for backend tensor library logic (except perhaps conversion from backend models to expression tuples). In a sense, they serve as a generalized form of a Python AST Call node.

An example of their use is given in the following:

from operator import add
from symbolic_pymc.etuple import etuple

>>> etuple(add, 1, 1)
ExpressionTuple((<built-in function add>, 1, 1))
>>> etuple(add, 1, 1).eval_obj
2
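To make the S-expression idea concrete without any dependencies, here is a minimal sketch that evaluates plain Python tuples the same way: the first element is the operator and the rest are operands. The `evaluate` helper is hypothetical and is not how `etuple` is implemented; it only illustrates the evaluation order and the cons-style "swap the head, keep the tail" manipulation mentioned above.

```python
from operator import add, mul


def evaluate(sexp):
    """Recursively evaluate a plain-tuple S-expression.

    Hypothetical helper for illustration; not part of symbolic_pymc.
    """
    if not isinstance(sexp, tuple):
        # Atoms (numbers, strings, ...) evaluate to themselves.
        return sexp
    operator_, *operands = sexp
    # Evaluate the operands first, then apply the operator to the results.
    return operator_(*(evaluate(arg) for arg in operands))


# (add 1 (mul 2 3)) in S-expression notation
nested = (add, 1, (mul, 2, 3))
result = evaluate(nested)  # 1 + (2 * 3)

# cons-like manipulation: replace the operator, keep the tail of operands
swapped = (mul,) + nested[1:]
swapped_result = evaluate(swapped)  # 1 * (2 * 3)
```

Because the structure is just a tuple, slicing and concatenation are enough to build new expressions from old ones.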

In general, there are two non-standard elements involved in our regular use of etuples:

# A re-constructed etuple using the original one's tail
e2 = (call_op,) + e1[1:]
res2 = e2.eval_obj

>>> e1 is e2 and res1 is res2
True


* Keyword argument support
  * We need this to work around implicit uses of default argument values and arguments specified by keyword rather than by position (i.e. we don't want to explicitly add all the intermediate default values just to specify an optional/keyword argument near the end of a signature).

@junpenglao pointed me toward this project: https://github.com/google/tangent

Presumably, it can be used to accomplish some functionality similar to our term/operator/arguments.

brandonwillard commented 5 years ago

drython implements S-expressions, so we should look into that, as well.

brandonwillard commented 5 years ago

Regarding keyword argument support, we should probably adopt the Signature.from_callable approach used in https://github.com/pymc-devs/symbolic-pymc/pull/4. That would normalize function signatures and automatically account for default values of unspecified keyword arguments.
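As a rough illustration of that normalization, the sketch below assumes `Signature` refers to the standard library's `inspect.Signature`. The names `matmul_like` and `normalize_args` are made up for this example and do not come from the PR; the point is only that binding the call and applying defaults yields a complete positional argument list.

```python
from inspect import Signature


def matmul_like(a, b, transpose_a=False, transpose_b=False, name=None):
    """Hypothetical stand-in for an operator with trailing keyword arguments."""
    return (a, b, transpose_a, transpose_b, name)


def normalize_args(fn, *args, **kwargs):
    """Return all arguments of a call to `fn` in signature order.

    Hypothetical helper: unspecified parameters are filled with their defaults.
    """
    sig = Signature.from_callable(fn)
    bound = sig.bind(*args, **kwargs)
    # Fill in every parameter that the caller left out.
    bound.apply_defaults()
    return tuple(bound.arguments.values())


# Only `name` is given by keyword; the intermediate defaults are filled in.
normalized = normalize_args(matmul_like, "A", "x", name="MatMul")
```

With something like this, an expression tuple could store `("A", "x", False, False, "MatMul")` regardless of how the caller spelled the arguments.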

brandonwillard commented 5 years ago

We need a short demonstration of the meta graph and S-expression functionality—alongside a short unification example that uses them (and can preferably fit in the README).

I'm thinking of something like the following, which is based on one of our unit tests:

An Expository Example

In the course of using PyMC, we might—implicitly—define a model with inputs or a log-likelihood that contains products of the form A * (b + c). Under PyMC4, those terms are represented, behind the scenes, by a TensorFlow graph; one that could've been created with an expression like tf.matmul(A, b + c). Let's say that we want to apply a "rewrite" that follows the distributive property of matrix multiplication and replaces instances of A * (b + c) with A * b + A * c.

The symbolic_pymc library provides the basic mechanics necessary to perform such rewrites within any backend that supports unification and reification. It currently has preliminary support for the two PyMC backends: Theano and TensorFlow.

In the following, we'll demonstrate these basic mechanics with the purpose of matching predefined forms in a graph and producing new graphs from the results. Afterward, we'll demonstrate how these mechanics are reflected in the relational DSL miniKanren and some of the capabilities it provides.

Matching and Replacing

For simplicity, we'll start with a manually constructed TF graph.

import tensorflow as tf

from unification import unify, reify, var, variables

from symbolic_pymc.tensorflow.meta import mt
from symbolic_pymc.etuple import (ExpressionTuple, etuple, etuplize)

A = tf.compat.v1.placeholder(tf.float64, name='A',
                             shape=tf.TensorShape([None, None]))
x = tf.compat.v1.placeholder(tf.float64, name='x',
                             shape=tf.TensorShape([None, 1]))
y = tf.compat.v1.placeholder(tf.float64, name='y',
                             shape=tf.TensorShape([None, 1]))

z = tf.matmul(A, x + y)

Using symbolic_pymc, we can convert the graph to an expression-tuple as follows:

z_sexp = etuplize(z)

which results in

>>> z_sexp
ExpressionTuple((
  TFlowMetaOpDef(MatMul),
  ExpressionTuple((
    TFlowMetaOpDef(Placeholder),
    tf.float64,
    ExpressionTuple((
      symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
      [Dimension(None), Dimension(None)])),
    'A')),
  ExpressionTuple((
    TFlowMetaOpDef(Add),
    ExpressionTuple((
      TFlowMetaOpDef(Placeholder),
      tf.float64,
      ExpressionTuple((
        symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
        [Dimension(None), Dimension(1)])),
      'x')),
    ExpressionTuple((
      TFlowMetaOpDef(Placeholder),
      tf.float64,
      ExpressionTuple((
        symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
        [Dimension(None), Dimension(1)])),
      'y')),
    'add')),
  False,
  False,
  'MatMul'))

The first element of z_sexp is a meta version of z.op.op_def. In symbolic_pymc, meta OpDefs are callable and, as a result, can be used like the function/operator term in an S-exp. Essentially, meta OpDefs are overloaded to work like the TF Python interface functions (e.g. tf.matmul) and likewise produce meta versions of tf.Operations. The remaining elements of the expression-tuple, z_sexp, are expression-tuple versions of the arguments to the meta "MatMul" OpDef.

Using this representation of the TF graph for z, we can easily construct arbitrary expression-tuples to unify against. Expression-tuples are a convenient form for unification because one doesn't need to know or specify some of the internal details of the meta graphs they correspond to.

# S-exp for `A . (b + c)` with logic variables `A`, `b`, and `c`
dis_pat = etuple(mt.matmul, var('A'),
                 etuple(mt.add, var('b'), var('c'), var()),
                 # Some parameters we can ignore...
                 var(), var(), var())

s = unify(dis_pat, z_sexp, {})
>>> s
{~A: ExpressionTuple((
   TFlowMetaOpDef(Placeholder),
   tf.float64,
   ExpressionTuple((
     symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
     [Dimension(None), Dimension(None)])),
   'A')),
 ~b: ExpressionTuple((
   TFlowMetaOpDef(Placeholder),
   tf.float64,
   ExpressionTuple((
     symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
     [Dimension(None), Dimension(1)])),
   'x')),
 ~c: ExpressionTuple((
   TFlowMetaOpDef(Placeholder),
   tf.float64,
   ExpressionTuple((
     symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
     [Dimension(None), Dimension(1)])),
   'y')),
 ~_1: 'add',
 ~_2: False,
 ~_3: False,
 ~_4: 'MatMul'}

The result of unification is a dict of substitutions for the unified logic variables. As we can see, it properly matched the logic variables in our pattern to the corresponding components of the expression-tuple form of z.
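For readers unfamiliar with unification, here is a dependency-free sketch of the idea on plain tuples. It is a toy, not the `unification` package's implementation: `Var`, `walk`, and `toy_unify` are names invented for this example, it has no occurs check, and returning `False` on a mismatch is simply this sketch's convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A toy logic variable, identified by name (illustrative only)."""
    name: str


def walk(term, subs):
    """Follow variable bindings in `subs` until reaching an unbound term."""
    while isinstance(term, Var) and term in subs:
        term = subs[term]
    return term


def toy_unify(u, v, subs):
    """Return an extended substitution dict, or False if `u` and `v` clash."""
    u, v = walk(u, subs), walk(v, subs)
    if u == v:
        return subs
    if isinstance(u, Var):
        return {**subs, u: v}
    if isinstance(v, Var):
        return {**subs, v: u}
    if isinstance(u, tuple) and isinstance(v, tuple) and len(u) == len(v):
        # Unify element by element, threading the substitutions through.
        for u_i, v_i in zip(u, v):
            subs = toy_unify(u_i, v_i, subs)
            if subs is False:  # an empty dict is a valid (falsy) result
                return False
        return subs
    return False


pattern = ("matmul", Var("A"), ("add", Var("b"), Var("c")))
graph = ("matmul", "A", ("add", "x", "y"))
s = toy_unify(pattern, graph, {})

# The same variable cannot be bound to two different values.
clash = toy_unify((Var("a"), Var("a")), (1, 2), {})
```

The substitution dict plays the same role as `s` in the example above: each logic variable maps to the sub-term it matched.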

From here, we can create another expression-tuple pattern using the same logic variables and reify to produce a new expression-tuple that can be evaluated.

# Our "output" S-exp takes the form of multiplicative distribution, i.e.
# `A . b + A . c` in terms of the logic variables.
# For convenience, we use `mt` to obtain meta `OpDef`s.
out_pat = etuple(mt.add,
                 etuple(mt.matmul, var('A'), var('b')),
                 etuple(mt.matmul, var('A'), var('c')))
z_dist = reify(out_pat, s)
>>> z_dist
ExpressionTuple((
  TFlowMetaOpDef(Add),
  ExpressionTuple((
    TFlowMetaOpDef(MatMul),
    ExpressionTuple((
      TFlowMetaOpDef(Placeholder),
      tf.float64,
      ExpressionTuple((
        symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
        [Dimension(None), Dimension(None)])),
      'A')),
    ExpressionTuple((
      TFlowMetaOpDef(Placeholder),
      tf.float64,
      ExpressionTuple((
        symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
        [Dimension(None), Dimension(1)])),
      'x')))),
  ExpressionTuple((
    TFlowMetaOpDef(MatMul),
    ExpressionTuple((
      TFlowMetaOpDef(Placeholder),
      tf.float64,
      ExpressionTuple((
        symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
        [Dimension(None), Dimension(None)])),
      'A')),
    ExpressionTuple((
      TFlowMetaOpDef(Placeholder),
      tf.float64,
      ExpressionTuple((
        symbolic_pymc.tensorflow.meta.TFlowMetaTensorShape,
        [Dimension(None), Dimension(1)])),
      'y'))))))
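Reification is the reverse direction: substitute the bound values back into a pattern. A minimal sketch on plain tuples follows; `Var` and `toy_reify` are hypothetical names for this illustration and not the `unification` package's code, and strings stand in for the meta objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A toy logic variable, identified by name (illustrative only)."""
    name: str


def toy_reify(term, subs):
    """Replace every bound variable in `term` with its value from `subs`."""
    if isinstance(term, Var):
        # Bound variables are replaced (recursively); unbound ones are kept.
        return toy_reify(subs[term], subs) if term in subs else term
    if isinstance(term, tuple):
        return tuple(toy_reify(t, subs) for t in term)
    return term


# Substitutions of the kind a unification step would produce.
s = {Var("A"): "A", Var("b"): "x", Var("c"): "y"}

out_pat = ("add",
           ("matmul", Var("A"), Var("b")),
           ("matmul", Var("A"), Var("c")))
z_dist = toy_reify(out_pat, s)

# Variables without a binding are left in place.
partial = toy_reify((Var("A"), Var("unbound")), s)
```

Here `z_dist` has the same shape as the reified expression-tuple above, only with strings in place of the placeholder sub-expressions.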

If we evaluate the expression-tuple, we get a meta graph:

z_dist_mt = z_dist.eval_obj

If we reify the meta graph, we get a base object (i.e. TF object):

z_dist_tf = z_dist_mt.reify()

The resulting TF graph is a "distributed" version of the matrix multiplication in z that uses the same inputs:

>>> z_dist_tf
<tf.Tensor 'Add_1:0' shape=(None, 1) dtype=float64>
>>> list(z_dist_tf.op.inputs)
[<tf.Tensor 'MatMul_1:0' shape=(None, 1) dtype=float64>,
 <tf.Tensor 'MatMul_2:0' shape=(None, 1) dtype=float64>]
>>> [list(i.op.inputs) for i in z_dist_tf.op.inputs]
[[<tf.Tensor 'A:0' shape=(None, None) dtype=float64>,
  <tf.Tensor 'x:0' shape=(None, 1) dtype=float64>],
 [<tf.Tensor 'A:0' shape=(None, None) dtype=float64>,
  <tf.Tensor 'y:0' shape=(None, 1) dtype=float64>]]

In the example above, we unified/reified against expression-tuples; however, this is not the only viable approach. One can just as easily unify against the meta graph objects directly, or even against Python AST objects.
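To give a feel for treating Python AST as the term graph, the sketch below applies the same distributive rewrite to source text using only the standard library's `ast` module (Python 3.9+ for `ast.unparse`). Note that this is a hand-written structural match inside a `NodeTransformer`, not unification, and the `Distribute` class is invented for this example.

```python
import ast


class Distribute(ast.NodeTransformer):
    """Rewrite `A @ (b + c)` into `A @ b + A @ c` (hand-written match)."""

    def visit_BinOp(self, node):
        # Rewrite children first so nested occurrences are handled too.
        self.generic_visit(node)
        if (isinstance(node.op, ast.MatMult)
                and isinstance(node.right, ast.BinOp)
                and isinstance(node.right.op, ast.Add)):
            A, b, c = node.left, node.right.left, node.right.right
            return ast.BinOp(
                left=ast.BinOp(left=A, op=ast.MatMult(), right=b),
                op=ast.Add(),
                right=ast.BinOp(left=A, op=ast.MatMult(), right=c),
            )
        return node


tree = ast.parse("A @ (x + y)", mode="eval")
rewritten = ast.unparse(Distribute().visit(tree))

# Expressions that don't match the form are left alone.
untouched = ast.unparse(Distribute().visit(ast.parse("A @ x", mode="eval")))
```

The match conditions here are hard-coded; unification replaces them with a declarative pattern such as `dis_pat`.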

Finally, while unify and reify are clearly useful for term rewriting, they alone do not capture the language and abstractions that underlie the term rewriting objectives (e.g. the distributive property). To effectively codify higher-level concepts and orchestrate more sophisticated logic that involves unification and reification—and the results they produce—we use miniKanren (via the kanren package).

Defining Relations

TBD

brandonwillard commented 5 years ago

A site based on #61 has been added: https://pymc-devs.github.io/symbolic-pymc/.

brandonwillard commented 4 years ago

The expression tuple functionality is now in its own package: etuples; otherwise, this documentation requirement is now covered by the aforementioned site.