hakaru-dev / hakaru

A probabilistic programming language
BSD 3-Clause "New" or "Revised" License
311 stars 30 forks source link

Add Partition to Hakaru #85

Closed ccshan closed 7 years ago

ccshan commented 7 years ago

@JacquesCarette wrote (https://github.com/hakaru-dev/hakaru/issues/81#issuecomment-308192545):

Right now, the communication between Haskell and Maple goes through piecewise (and If?). We should definitely try to deprecate that, and communicate more directly through Partition. But that would require adding a Partition concept to hakaru; are you ok with that?

I'm ok with adding a Partition concept to Hakaru but I have two desires about it. First, it should mesh with the existing match construct (which the macro if expands to). Second, the other program transformations (such as disintegration) should be updated to work with it. So the first step seems to be designing a construct that harmonizes Partition with match. (By the way, match is written case on the Maple side.)

JacquesCarette commented 7 years ago

Semantically, I agree: match (even when spelled case) is a kind of partition. The biggest difference is that match knows it can rely on the syntax of values to be able to tell which case it is in, while Partition is based on "representations of predicates" [ x < y with x and y symbolic isn't really a predicate!].

The "problem" is that a Hakaru programmer may well write a model, using if, which isn't naturally a partition [because they use the linear semantics of if evaluation]. But since it goes through match anyways, maybe that's fine. Maple will have to 'unwind' this, and figure out what the 'disjoint' predicates ought to be.

What I don't know is how to translate things 'back'. Even though match is n-ary (right?), there is no pre-defined type in Hakaru with exactly n choices. That would be one way to go. Another way would be to somehow be able to tell apart a "pattern match" and a "partition match", because the mechanism of 'matching' is different.

yuriy0 commented 7 years ago

After more consideration, I'm not convinced this is necessary at this point. The original problem which led to the suggestion was actually due to an already-implemented simplification not being applied (i.e. a bug).

A Partition is very much a 'semantic' representation (i.e. "Partition is based on "representations of predicates"") and a piecewise is very much 'syntactic' (i.e. "it can rely on the syntax of values to be able to tell which case it is in"). Maple has a big toolbox for dealing with 'semantic' representations (basically, solve, the big hammer) and by extension (since syntax carries a semantics) has as much power when working with a syntactic representation. On the other hand, it is much easier dealing with syntactic forms in Haskell.

Turning a Partition into the 'smallest' piecewise is something towards which Maple is very amenable. If we wanted to get a more reliable, syntactic, and not-too-complicated equality check, we can try very hard to create such a minimal piecewise before sending back to Haskell (currently, we just create a piecewise).

Aside from that, I don't see what having a more "direct" form of communication would give us - if we decide on a consistent definition of 'smallest' (i.e., one which is a well-order) we can already have a 1-1 conversions from Partition to piecewise. The Partition->piecewise direction would potentially be very expensive, but we would only need to use it once per Partition, right before sending back to Haskell (it would essentially try precisely the current PartitionToPW all orderings of the Partition, and return whichever is best).

The biggest hurdle towards such a function is deciding exactly what is 'smallest'. The current understanding is the smallest piecewise has the smallest sum of condition sizes, and the current heuristic for condition sizes is quite rudimentary:

PartitionCond = '{relation, boolean, `::`, specfunc({`And`,`Or`,`Not`}), `and`, `or`, `not`}')
...
condition_complexity := proc(x) nops(indets(x,PartitionCond)) end proc;

I think if I had a better idea of the design goal I could speak more to potential implementations. Basically, the fact that I can't really tell what the design goal of something like Partition on the Haskell side would be (in Maple, it was twofold; to avoid uncontrollable 'piecewise' semantics, and to hit our Partitions with big Maple hammers without worrying about pesky ordering; neither seems to apply to Haskell), combined with the knowledge that this would be an entirely non-trivial feat of engineering, no matter how it's done, makes it an unattractive option all around.

JacquesCarette commented 7 years ago

I am fine with not doing this now (or ever).

I think it is still very important to avoid piecewise as much as possible. Whenever we (statically) know that the cases we have are disjoint (and cover the whole space), we should be using some kind of Partition (or match or case).

For example, it would be quite sad if simplify were not idempotent because of translating back-and-forth between Partition and piecewise.

yuriy0 commented 7 years ago

I think it is still very important to avoid piecewise as much as possible.

Unfortunately, large parts of the simplifier (e.g. eval_piecewise) still depend very much on using piecewise. When the simplifier reaches a point of maximum stability would probably be the most opportune moment to look at that.

For example, it would be quite sad if simplify were not idempotent because of translating back-and-forth between Partition and piecewise.

Indeed, and this is almost certainly something for which there should be Partition unit tests, and Maple unit tests. Some of the NewSLO tests would act to ensure this is the case (i.e. when the input and expected result are the same).

But this issue is probably something that should be dealt with on the Maple side - we need that property in Maple as well as during the parsing stage from Maple to Haskell. Without it, we can't rely on conversions between Partition and piecewise when they do happen to be needed. This is especially relevant since we sometimes recurse (with reduce) if expr_new <> expr_old; a lack of idempotence could lead to an infinite loop in those cases.

I believe that PWToPartition @ PartitionToPW and PartitionToPW @ PWToPartition are the identity. They should be, and if these are the identity, they must be idempotent. Perhaps testing the latter (piecewise -> piecewise) is not as important as testing the former (Partition -> Partition).

However, (x->PWToPartition(x,do_solve)) @ PartitionToPW and PartitionToPW @ (x->PWToPartition(x,do_solve)) are almost certainly not, and I wouldn't expect them to be; but they should be idempotent.

JacquesCarette commented 7 years ago

Right, calling do_solve can, and should, do rather non-trivial work. I will close this issue as something we will not in fact do.