pymc-devs / symbolic-pymc

Tools for the symbolic manipulation of PyMC models, Theano, and TensorFlow graphs.
https://pymc-devs.github.io/symbolic-pymc

Relationship with pyro-ppl/funsor? #10

Closed · eb8680 closed this issue 5 years ago

eb8680 commented 5 years ago

Hi @brandonwillard, I work on the Pyro team, where we've been independently developing what seems like a closely related library. I just stumbled across this today, and it looks like a really neat project! I'm curious about the design goals and applications you have in mind, and ultimately whether there's scope for collaboration.

Specifically, @fritzo and I have been working on a new library called Funsor (see also design doc) to support more advanced inference algorithms like Pyro-style tensor variable elimination, Infer.NET-style expectation propagation, or Birch-style sequential Monte Carlo with delayed sampling, as well as allowing more sophisticated probabilistic operations like online updating, causal inference via do-calculus, or conditioning on more complex events specified with constraints. Funsor is not currently connected to Pyro itself except for some tests, but eventually we expect to rewrite most of Pyro proper into a very thin layer on top of a backend-independent version of Funsor.

My first impression of the conceptual overlap between the two projects from glancing over the code and tests is that Funsor is sort of similar to what you have here, but we started with named tensors as a fundamental data structure, wrote a new backend-independent graph representation and evaluator/rewriter rather than using Theano and Kanren, and are currently focused on machinery to support things like #5 and #6. Our choice to start at a slightly lower level has come at the expense of missing functionality relative to what's already in Theano and Kanren, but we have implemented a linear Gaussian state space model example (as proposed in #9) as a first demonstration (our example code here). Does that sound roughly correct?

cc @neerajprad, @lawmurray, @fehiepsi

brandonwillard commented 5 years ago

@eb8680, thanks for getting in touch!

It seems like a few basic things overlap—most notably the context involving symbolic computation, tensors and probability/statistics, but the goals and approach might be dramatically different.

This project is ultimately concerned with the type of symbolic computation that implements and models elements of the relevant mathematics. This involves automations/optimizations driven by specific areas of mathematics (e.g. algebra, probability) and some basic attempts to model the useful "language" aspects of said mathematics with more transferrable functionality and flexibility than the standard OO APIs provide (e.g. implement and apply commutativity, convexity, and other such properties dynamically and in as much generality as mathematically possible).

As well, this project attempts to do so by leveraging the light-weight DSL miniKanren to layer the high-level symbolic math onto existing tensor libraries (currently Theano and TensorFlow). The meta object functionality—which appears to most closely reflect Funsor—is not intended to be a primary focus and arose more as a necessary evil. As a matter of fact, I'm pretty close to entirely moving away from this approach and fully committing to a much more minimal S-expression-like emulation.

The bigger picture

The whole S-expression thing ties into another important aspect of this project: leverage as much as possible of the well-established symbolic computation research provided by the lambda calculus, type theory, term-rewriting, etc. Essentially, we want to stay as close to those foundational areas as possible without being too language bound or API restricted. miniKanren is a surprisingly simple means of doing that, since one can—for example—relatively easily implement type checking, inhabitation, inference for nearly arbitrary type theories/systems. As well, miniKanren provides all the benefits of relational/logic programming!
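For a toy illustration of what that relational flavor buys (a minimal sketch using the Python kanren package, not symbolic-pymc's actual relations; the term names are purely illustrative), note that a rewrite expressed as a relation can be queried in either direction:

```python
# A minimal sketch of a rewrite expressed as a relation with the Python
# `kanren` package; the "add"/"mul" term names are illustrative only.
from kanren import run, eq, conde, var


def simplifieso(expr, simplified):
    """Relate a tuple S-expression to a simplified form."""
    x = var()
    return conde(
        [eq(expr, ("add", x, 0)), eq(simplified, x)],  # x + 0 -> x
        [eq(expr, ("mul", x, 1)), eq(simplified, x)],  # x * 1 -> x
    )


q = var()
# Forward: simplify a given term.
print(run(1, q, simplifieso(("add", "y", 0), q)))  # ('y',)
# Backward: enumerate terms that simplify to "y" (order may vary).
print(run(2, q, simplifieso(q, "y")))  # (('add', 'y', 0), ('mul', 'y', 1))
```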

With a minimal patchwork of elements from those areas implemented in a framework like miniKanren, it seems much more possible to provide a powerful, flexible and logically faithful math-like interface. In other words, one can incrementally build toward light-weight theorem proving and assistance for only the relevant areas of study (e.g. elements of algebra and measure theory) and sophisticated graph manipulation all within the same relational DSL context. Furthermore, that context is host-language transferrable—i.e. functionality built in miniKanren can be used with miniKanren in other host languages.

Personally, I've been going on for years about automatic Rao-Blackwellization and symbolic math for Bayesian model estimation, and it has shown up in nearly every one of my projects. However, after moving from language to language—and the math and linear/tensor algebra libraries they provide—I've come to see that the DSL-level flexibility described above is absolutely critical.

In the short term, one can always set aside a weekend and implement one's own specialized Monte Carlo routines, automated Rao-Blackwellization, graph methods, etc., for the language/platform/library of the decade/year/month; but, in the long term, it gets harder to build toward truly useful and transformative tools. Furthermore, all of these MCMC methods and symbolic optimizations could be implemented more efficiently directly within a given platform's API (e.g. Theano, TensorFlow, PyTorch, whatever) and host language. Just the same, any one non-trivial symbolic math feature is almost always better provided by a complete symbolic math library (e.g. Mathematica, Maple, SymPy, etc.).

The capabilities I'm looking for involve a sophisticated and mostly language/platform-agnostic patchwork of those elements that enables the incorporation of existing libraries and/or allows one to naturally [re-]implement parts of them as necessary. Better yet, when those elements have succinct formulations as generalizable relations, at varying levels of abstraction, over nearly arbitrary expression-representing objects—realized as streams of potentially infinite and orderable formulations/permutations—as they do in the relevant areas of graph/symbolic computation and mathematical statistics, the approach used by this project starts to make more sense.

As well, there are a few things from my mathematical statistics work in sparsity and model selection that I'm working toward, and they have requirements that motivate the implementation flexibility described above (e.g. graph/model manipulation driven by minimally sufficient combinations of CLP, [concentration] inequalities, and function-analytic properties). I'm just getting to those implementations within this project, and I think the reasons will become much clearer as it progresses.

Otherwise, you can find the most basic ideas for the functionality of this project outlined in the articles on my site. Quite a few of those ideas (e.g. Rao-Blackwellization) overlap directly with the recent papers you linked to—with the exception of proximal methods, which haven't quite caught on yet as a framework for driving symbolic function optimization.

Also, it seems like there's some DLM/state-space and SMC/particle-filtering work in your group, which makes the PP abstractions, context-dependent sampling, and automatic Rao-Blackwellization/marginalization interests much clearer. These are all things I had to implement by hand for NYC's bus tracking system back in 2012, and they ultimately led me to this type of work!

eb8680 commented 5 years ago

Thanks, this is an awesome response!

It seems like a few basic things overlap—most notably the context involving symbolic computation, tensors and probability/statistics, but the goals and approach might be dramatically different ... The meta object functionality—which appears to most closely reflect Funsor—is not intended to be a primary focus and arose more as a necessary evil. As a matter of fact, I'm pretty close to entirely moving away from this approach and fully committing to a much more minimal S-expression-like emulation.

I think we agree more than you think we do :) (or at least more than is reflected in our design doc, which is rather out of date in its discussion of the more abstract ideas here). Funsor is ultimately intended to be a platform for semiring programming, which captures all of the use cases we've each mentioned here (and many more) and makes sure they compose correctly. The papers I cited above are really motivating applications for the library.

I believe Funsor terms are essentially a version of the S-expression representation you're talking about - see the expression interpreter, which is exactly an (unoptimized) S-expression evaluator for a first-order term graph language. We use something like your meta objects to extract expressions from normal-looking Python programs and then either evaluate them or rewrite them, in our case by substituting in other expressions for free variables or performing a series of staged nonstandard interpretations that reflect (as in the Lightweight Modular Staging approach to DSL implementation), rather than using a Kanren interpreter as a term rewriting engine.
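As a rough, generic illustration of the "nonstandard interpretation" idea (this is a sketch, not Funsor's actual interpreter), the same first-order term can be evaluated under different interpretations of its operations:

```python
# A generic sketch: one term, several interpretations of its operations.
def interpret(expr, interpretation):
    if not isinstance(expr, tuple):
        return expr
    op, args = expr[0], [interpret(a, interpretation) for a in expr[1:]]
    return interpretation[op](*args)


term = ("add", ("mul", 2, 3), 4)

eager = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
pretty = {"add": "({} + {})".format, "mul": "({} * {})".format}

print(interpret(term, eager))   # 10
print(interpret(term, pretty))  # ((2 * 3) + 4)
```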

Another way to look at Funsor is as a Kanren analogue to Dyna, in that Funsor is to Kanren as Dyna is to Datalog. A core piece of leverage offered by Funsor is an abstract version of the variable elimination algorithm, which is much of this:

implement and apply commutativity, convexity, and other such properties dynamically and in as much generality as mathematically possible

implemented as a series of nonstandard interpretations of some linear terms (a sum over free variables and an n-ary product of terms) in the S-expression language that is completely generic in the actual sum and product operations and not restricted to probabilistic inference. For example, Gaussian variable elimination comes from applying this machinery where the linear operations are Gaussian integrals, symbolic optimization comes from the same machinery where the linear operations are (arg)max and sum, and one kind of relational interpreter comes from the same machinery where the "linear" operations are conjunctions and disjunctions.
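To make that genericity concrete, here is a toy variable-elimination sketch (not Funsor's implementation) parameterized by the "sum" and "product" operations; swapping max in for sum turns marginalization into MAP-style optimization:

```python
# A toy variable-elimination routine that is generic in the semiring
# operations; factor tables are plain Python dicts for clarity.
import itertools
from functools import reduce


def eliminate(factors, domains, elim_vars, add, mul, zero):
    """Sum out each variable in `elim_vars` from a product of factors.

    Each factor is a pair (var_names, table), where `table` maps a tuple of
    values (ordered like `var_names`) to an element of the semiring.
    """
    for v in elim_vars:
        touched = [f for f in factors if v in f[0]]
        rest = [f for f in factors if v not in f[0]]
        new_vars = sorted({u for names, _ in touched for u in names} - {v})
        table = {}
        for assignment in itertools.product(*(domains[u] for u in new_vars)):
            env = dict(zip(new_vars, assignment))
            total = zero
            for val in domains[v]:
                env[v] = val
                term = reduce(mul, (t[tuple(env[u] for u in names)]
                                    for names, t in touched))
                total = add(total, term)
            table[assignment] = total
        factors = rest + [(new_vars, table)]
    return factors


# A single factor over two binary variables.
domains = {"a": [0, 1], "b": [0, 1]}
f = (["a", "b"], {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4})

# Sum-product: marginalize b out.
print(eliminate([f], domains, ["b"], add=lambda x, y: x + y,
                mul=lambda x, y: x * y, zero=0.0))
# Max-product (MAP) is the same code with add=max and zero=float("-inf").
```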

The scope of applications gets widened further by expanding the set of abstract linear operations in the term language, for example to include a fixpoint operator (for recursion) or a binary tensor contraction (for Lebesgue integration or Hilbert space inner products).

language/platform-agnostic patchwork of those elements that enables the incorporation of existing libraries and/or allows one to naturally [re-]implement parts of them

One other feature that's missing from Funsor, which we intend to implement and which would go a long way toward completing a version of the vision you've set out here, is a generic pattern-matching system, so that we could easily implement something like autoconj as a tiny declarative library of patterns and nonstandard interpretations of those patterns, compatible with any other library implemented with Funsor. We could use an existing miniKanren to implement this or write our own tiny relational interpreter and build on top of that.

We could also imagine patching different existing systems together with Funsor by using the nonstandard interpretation mechanism to compile expressions to and from those different systems, as in LMS. Sticking with the autoconj example, we could write an interpreter to compile expressions to the autoconj pattern language, have autoconj rewrite them, and use a meta object (the Funsor NumPy wrapper) to get the resulting expression back from the new NumPy expression generated by autoconj.

Furthermore, that context is host-language transferrable—i.e. functionality built in miniKanren can be used with miniKanren in other host languages.

This is another really interesting point. We've designed Funsor so that it could eventually be rewritten in Julia or C++ and reused across languages that way, which I think would ultimately lead to the best performance, but the number of host languages with existing miniKanren implementations is definitely a practical consideration in favor of building such a system around miniKanren instead, if language-agnosticism is a near-term priority.

brandonwillard commented 5 years ago

I'm still looking over everything, but, if it's possible to use Funsor for the meta/graph objects and S-expression-like functionality, then I'm all in. Writing the necessary meta graph functionality is a whole project on its own, and it's not even my main objective!

My first questions/concerns involve the extent of backend tensor library coverage that's current, expected and possible (e.g. Theano, TensorFlow), how lightweight it can be for rapid graph construction/deconstruction and the like, and the amount of overlap with symbolic features and math abstractions that I'm already/will be handling in miniKanren.

Anyway, I'll look into Funsor more closely today.

eb8680 commented 5 years ago

the extent of backend tensor library coverage that's current, expected and possible (e.g. Theano, TensorFlow)

The term language is semantically independent of any particular tensor backend, but to actually evaluate individual terms to data they need to be backed by an implementation. Currently we have support for PyTorch and less complete support for NumPy (which will also work with Jax).

Adding new backends is straightforward and independent of the rest of the codebase thanks to multipledispatch - as an example, here's the PR https://github.com/pyro-ppl/funsor/pull/58 where the NumPy backend was added by @neerajprad. We hadn't planned on adding support for Theano and TensorFlow in the near term because we didn't have a use case ourselves, but we could certainly consider it.
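The general pattern is roughly the following (a hedged sketch with an illustrative operation name, not Funsor's actual registration code): each backend registers implementations of the abstract ops for its own array type via multipledispatch.

```python
import numpy as np
from multipledispatch import dispatch


@dispatch(np.ndarray, np.ndarray)
def matmul(x, y):
    """NumPy backend implementation of an abstract matmul op."""
    return np.matmul(x, y)


# A second backend would register the same abstract op for its own array
# type, leaving the rest of the code untouched, e.g. (hypothetically):
#
#     @dispatch(torch.Tensor, torch.Tensor)
#     def matmul(x, y):
#         return torch.matmul(x, y)

print(matmul(np.eye(2), np.ones((2, 2))))
```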

how lightweight it can be for rapid graph construction/deconstruction and the like

Expression construction is currently pretty lightweight, certainly not more expensive than Theano. However, we make heavy use of multipledispatch, and Python overhead is always an issue in this sort of library, especially since we haven't focused on performance yet. I would generally expect more complicated transformations to only be applied once and the transformed expression reused many times, either via our staging approach or via the new tracing JIT compilers in the various tensor libraries, so that graph construction time wouldn't matter as much.

the amount of overlap with symbolic features and math abstractions that I'm already/will be handling in miniKanren

We've chosen to implement our current abstract math somewhat differently, taking a hybrid approach of doing larger abstract linear operations over multiple terms symbolically and hiding the details of individual linear or nonlinear operations. The result looks more like automatic differentiation than fully symbolic computation - for example, our exact Gaussian variable elimination code currently depends on black-box calls to pairwise Gaussian integrals rather than rewriting the full expression to the corresponding symbolic expression for the posterior.

The nice thing about this design pattern is that it allows us to use a lot of existing code designed with performance and numerical stability in mind, like the PyTorch Distributions library we use in Pyro or GPyTorch's highly optimized LazyTensor library for high-dimensional Gaussians, while preserving algebraic properties like closure under marginalization that are useful for working with larger probabilistic models. It would also be possible to implement fully symbolic and transparent versions of these pairwise/unary operations for symbolic integration.
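As a trivial numeric illustration of that closure property (not Funsor's or GPyTorch's code), marginalizing a multivariate Gaussian is just a block selection of its parameters, so each black-box step hands back another Gaussian:

```python
import numpy as np

mean = np.array([0.0, 1.0, 2.0])
cov = np.array([[2.0, 0.5, 0.1],
                [0.5, 1.0, 0.3],
                [0.1, 0.3, 1.5]])

keep = [0, 2]  # marginalize out the middle variable
marg_mean = mean[keep]
marg_cov = cov[np.ix_(keep, keep)]
print(marg_mean)
print(marg_cov)
# The marginal is again a Gaussian (closure under marginalization), which is
# what lets the pairwise black-box calls compose into variable elimination.
```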

However, it should also be possible to bypass all of that and implement something very similar to your KanrenRelationSub for Kanren-based term rewriting as a nonstandard interpretation of the term language, since expressions play nicely with unification. I'll try to do this soon and replicate a simple rewrite I found in your code (scale_loc_transform) to help myself understand the issues involved.

brandonwillard commented 5 years ago

The term language is semantically independent of any particular tensor backend, but to actually evaluate individual terms to data they need to be backed by an implementation. Currently we have support for PyTorch and less complete support for NumPy (which will also work with Jax).

Yeah, that's the purpose/role of the meta objects we use. We need Theano for PyMC3 and TensorFlow for the upcoming PyMC4.

Adding new backends is straightforward and independent of the rest of the codebase thanks to multipledispatch - as an example, here's the PR pyro-ppl/funsor#58 where the NumPy backend was added by @neerajprad. We hadn't planned on adding support for Theano and TensorFlow in the near term because we didn't have a use case ourselves, but we could certainly consider it.

Interesting, you're using multipledispatch, too? It's been a staple of this project since the beginning—as well as a few of my other projects (e.g. hypoKanren)—so I can understand how it fits into this, but that doesn't exactly address the real difficulties involved with creating and maintaining backends for tensor libraries. Things like generic functions are nice, but it's the choices in design and abstractions that sufficiently map meta and base objects (in a mostly one-to-one fashion) that matter.

That work requires some non-trivial knowledge of the underlying libraries and the domain abstractions. This is the kind of work that we—and others doing similar work—need, and it's where collaboration could provide the greatest benefit.

Expression construction is currently pretty lightweight, certainly not more expensive than Theano. However, we make heavy use of multipledispatch, and Python overhead is always an issue in this sort of library, especially since we haven't focused on performance yet. I would generally expect more complicated transformations to only be applied once and the transformed expression reused many times, either via our staging approach or via the new tracing JIT compilers in the various tensor libraries, so that graph construction time wouldn't matter as much.

The [S-]expressions I've been talking about—and that this library uses—are not Theano objects; they're simply tuples that represent S-expressions. They can be "eval'ed" (and/or reified) to produce Theano and/or TensorFlow objects.
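For example (a minimal sketch, not this project's actual meta/graph machinery), such a tuple is plain data that can be inspected and rewritten freely, and only reified into a Theano graph at the end:

```python
import theano.tensor as tt

x, y = tt.scalar("x"), tt.scalar("y")

# Plain data: this can be unified against and rewritten without touching
# Theano at all.
sexp = ("add", ("mul", 2.0, x), y)


def evaluate(expr):
    """Reify a tuple S-expression into a Theano graph object."""
    if not isinstance(expr, tuple):
        return expr
    op, args = expr[0], [evaluate(a) for a in expr[1:]]
    if op == "add":
        return args[0] + args[1]
    if op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: {}".format(op))


theano_graph = evaluate(sexp)  # an ordinary Theano expression: 2.0 * x + y
```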

We've chosen to implement our current abstract math somewhat differently, taking a hybrid approach of doing larger abstract linear operations over multiple terms symbolically and hiding the details of individual linear or nonlinear operations. The result looks more like automatic differentiation than fully symbolic computation - for example, our exact Gaussian variable elimination code currently depends on black-box calls to pairwise Gaussian integrals rather than rewriting the full expression to the corresponding symbolic expression for the posterior.

I'm afraid I don't understand what you're getting at here.

The nice thing about this design pattern is that it allows us to use a lot of existing code designed with performance and numerical stability in mind, like the PyTorch Distributions library we use in Pyro or GPyTorch's highly optimized LazyTensor library for high-dimensional Gaussians, while preserving algebraic properties like closure under marginalization that are useful for working with larger probabilistic models. It would also be possible to implement fully symbolic and transparent versions of these pairwise/unary operations for symbolic integration.

If you're implying that Funsor's expressions mix backend-specifics and low-level computational details with the mechanics of symbolic computation and the high-level mathematics it models, then that doesn't sound particularly good!

Rewriting expressions/terms is basically the heart of symbolic computing for mathematics (in this context, at least), so that type of work should be cheap, easy, generalizable, and very well supported. Otherwise, if you're reasoning "symbolically" through class/type abstractions, their respective methods (or the directly related generic functions you mentioned earlier), sequences of object/state mutations, and so on, then consider that the only fundamental difference might be the "platform"/machinery with which you perform standard term-rewriting operations and logic (e.g. Python's OOP model and implementation). There's nothing stopping one from using those same basic host-language offerings within miniKanren. Actually, it's quite simple to do, even from within miniKanren's less standard features that are most relevant to term rewriting (e.g. relational reasoning and highly configurable, orchestrated unification).

However, it should also be possible to bypass all of that and implement something very similar to your KanrenRelationSub for Kanren-based term rewriting as a nonstandard interpretation of the term language, since expressions play nicely with unification. I'll try to do this soon and replicate a simple rewrite I found in your code (scale_loc_transform) to help myself understand the issues involved.

Speaking of unification, it looks like you're already adopting some of the fundamental design principles of this project—and very quickly at that! I imagine that you're learning miniKanren, too. After you do, you might want to look at https://github.com/pymc-devs/symbolic-pymc/issues/6 more closely.

The recommendations and examples—including the linked ones—in https://github.com/pymc-devs/symbolic-pymc/issues/6 are a working foundation for completely relational and less naive implementations of the graph rewriting needed for PP math. That thread also demonstrates how simple and straightforward this approach can be. The only reason this project isn't already teeming with more model-rewrite examples, algebraic relations, and so on, is that it's a side project written by a single person whose day job isn't this. If that isn't a good example of how useful and appropriate miniKanren is for this type of work, I don't know what is!

eb8680 commented 5 years ago

Speaking of unification, it looks like you're already adopting some of the fundamental design principles of this project—and very quickly at that! I imagine that you're learning miniKanren, too. After you do, you might want to look at #6 more closely.

I only meant that I wanted to write a throwaway example that would help me understand your ideas better (since I haven't used Kanren for several years), and I apologize if I've offended you somehow. In case it's not clear, I'm very impressed with what you've done so far and your vision for where it's going, and it's given me a lot to think about.

Taking a step back, there are basically three components in each library:

  1. The declarative term graph language
  2. Declarative program transformations
  3. The interpreter or solver actually applying a transformation to a program

My current, possibly flawed understanding is that this project and Funsor have made different but ultimately equivalent design choices (in that term rewriting systems are equivalent to weighted automata) on 2 and 3, but it seems too early in both projects to say anything more useful than that.

However, it does seem like we basically agree about the design of the term graph languages, which are very similar as of #4, and there are a couple of smaller, more specific ideas that are perhaps worth discussing further.

One is representing tensors in the language as open terms by default rather than literals, with one free variable per dimension, as described in detail in the Funsor design doc. This makes it possible to nicely isolate a lot of the indexing/broadcasting API differences between backends, and to have backend-independent representations of tensor indexing as substitution, tensor broadcasting as propagation of index terms and linear operations like .sum, dot or mm as np.einsum-style unary or binary tensor contractions along subsets of named index variables.
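A toy version of the idea (not Funsor's actual Tensor type) might look like the following, with indexing implemented as substitution of named free variables:

```python
import numpy as np


class NamedTensor:
    """A tensor as an open term: one named free variable per dimension."""

    def __init__(self, data, dims):
        self.data = np.asarray(data)
        self.dims = list(dims)

    def __call__(self, **subs):
        """Substitute concrete indices for some of the free variables."""
        index = tuple(subs.get(d, slice(None)) for d in self.dims)
        remaining = [d for d in self.dims if d not in subs]
        return NamedTensor(self.data[index], remaining)


t = NamedTensor(np.arange(6).reshape(2, 3), dims=["i", "j"])
row = t(i=1)                # substituting i=1 leaves a term open in j
print(row.dims, row.data)   # ['j'] [3 4 5]
```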

The other is your use of unification.unify and unification.reify for term unification and substitution. Funsor currently has its own version of reify (the eager_subs methods of the various terms) that also applies some simple always-on optimizations, but perhaps it would make sense to factor those out and use reify for substitution in Funsor instead.
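For reference, those two calls are tiny; the particular pattern below is just an illustration:

```python
from unification import var, unify, reify

x = var("x")
pattern = ("mul", x, x)                      # a repeated logic variable
term = ("mul", ("exp", "a"), ("exp", "a"))

s = unify(pattern, term, {})                 # {~x: ('exp', 'a')}
print(reify(("square", x), s))               # ('square', ('exp', 'a'))
```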

Perhaps the first would help you maintain multiple backends more easily, as it has for us? Certainly, if we adopted the second in Funsor, our term-language code would be simpler.

brandonwillard commented 5 years ago

Taking a step back, there are basically three components in each library:

  1. The declarative term graph language
  2. Declarative program transformations

This library satisfies both through its inherent use of relational/logic programming. The ways in which Funsor is declarative aren't yet clear to me—you'll have to point me in the right direction. As I read it now, the examples look like they're constructing models explicitly, then performing very specific symbolic optimizations at predetermined points/locations/steps.

In symbolic-pymc's case, it collects relations in a sort of mix-and-match "knowledge database" of generic patterns. Those collections of relations form relations themselves. For example, you can see here how I've sketched out relations between common distributions and their stable-distribution forms. With that, one can simply state stable_dist(a, b): when a is a distribution with a stable-distribution form, b is unified with that form, and when b is a stable-distribution form that's only partially specified (e.g. some parameters are unknown), a is unified with all the possible common/non-stable distributions it could be.

Likewise, the scale_loc_transform relation you referred to isn't a simple matter of pattern matching and replacement. Through the design principles I've been describing, scale_loc_transform is able to easily and naturally employ the necessary constraints (e.g. don't perform naive replacements on identity elements), and it can be used for multiple purposes under scale_loc_transform(a, b): either argument can be the known term, and the other is unified with its transformed counterpart, subject to those constraints.
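A stripped-down sketch of that kind of relation (using the Python kanren package, not the project's actual scale_loc_transform, and without the constraints just mentioned) might look like this:

```python
from kanren import run, eq, var


def scale_loc_o(dist, reparam):
    """Relate N(mu, sd) to mu + sd * N(0, 1), in either direction."""
    mu, sd = var(), var()
    return eq(
        (dist, reparam),
        (("normal", mu, sd), ("add", mu, ("mul", sd, ("normal", 0, 1)))),
    )


q = var()
# Forward: reparameterize a non-standard normal.
print(run(1, q, scale_loc_o(("normal", "m", "s"), q)))
# Backward: recover the original distribution from a reparameterized form.
print(run(1, q, scale_loc_o(q, ("add", "m", ("mul", "s", ("normal", 0, 1))))))
# The real relation would additionally carry constraints (e.g. don't rewrite
# when mu == 0 and sd == 1) that one-way pattern matching can't express as
# cleanly.
```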

This type of truly declarative composition is a big part of how we're addressing the necessarily patchwork-like nature of the math used within applied statistics (and mathematics at large). As well, it plays an extremely important part in our desire to produce—in a directed fashion—numerous different and coherent graph re-formulations with better sampling properties, fewer costly operations, known (optimal) sampling methods, etc., etc.

In the bigger picture, we want the ability to move from naive model specifications to efficiently sampled ones and back. This functionality, especially when used in a Bayesian setting, is how I believe we can arrive at truly expressive yet estimable models. By the way, I mean "expressive" in the sense of language (i.e. the math/symbols specifying the model), concept (i.e. the terms being modeled/estimated), and dynamics (i.e. the functional relationships between terms).

Otherwise, in this setting, one doesn't need to construct a specific model/graph and tell it exactly how and where the optimizations/simplifications/replacements should occur. Instead, it finds/fulfills these relations—constraints and all—in an extremely flexible way all throughout a graph. As a matter of fact, that "way" can also be expressed as a (declarative) relation!

What Funsor appears to be doing looks much more similar to SymPy and its Wild, Basic.match and Basic.replace. Actually, I wrote about using those a few years ago for similar purposes. In the end, I started to require constraints/conditional matching and better search/manipulation strategies, all things that cumulatively led to this project's choices.
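For comparison, the SymPy machinery referred to above does one-way matching with Wild symbols rather than expressing a relation:

```python
import sympy as sp

x = sp.Symbol("x")
a, b = sp.Wild("a"), sp.Wild("b")

expr = 3 * sp.exp(2 * x)
print(expr.match(a * sp.exp(b * x)))                 # {a_: 3, b_: 2}
print(expr.replace(sp.exp(b * x), sp.cosh(b * x)))   # 3*cosh(2*x)
```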

However, it does seem like we basically agree about the design of the term graph languages, which are very similar as of #4, and there are a couple of smaller, more specific ideas that are perhaps worth discussing further.

The use of term graphs expressed through both language-primitive OOP types/classes and S-expressions has been there since the initial commit (e.g. the latter and the former)—by necessity.