probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0
28 stars 6 forks source link

The "second package" bug #447

Open axch opened 8 years ago

axch commented 8 years ago

Venture has somewhat kooky handling of the creation of SPs. To wit, when applying a (P)SP A returns to Venture an SP object B (which actually has to be returned as a VentureSPRecord object), Venture doesn't just put it in its node, but intercepts it and does some stuff (processMadeSP in regen.py). To wit, Venture stores the VentureSPRecord in a trace-local table indexed by node, and stores in the node an SPRef object. Procedure application then follows the node pointer in the SPRef object and fetches the VentureSPRecord object.

Why do it this way? So that the SPRef objects do not change in the event that A is resampled and produces a different VentureSPRecord. This way, changing an SP only triggers recomputation at the point where that SP was used, and not in any intervening path of variables or data structures that may have been carrying it around.

What is the problem with this? Consider trying to write a primitive (P)SP (such as gpmem) that needs to return two SPs to its caller. The natural thing to do is to return a list of two VentureSPRecord objects, inviting the following usage pattern:

assume package = gpmem(...)
assume f_compute = first(package)
assume f_emu = second(package)

What actually happens in this case? The call to gpmem will execute, but will not register as creating an SP and will not trigger processMadeSP. Then each of first and second will separately find themselves returning VentureSPRecords (instead of the SPRefs they would have returned if it were just a list of procedures created otherwise), and processMadeSP will be triggered at f_compute and f_emu. This is now somewhat unfortunate, because the resulting SPRef objects will point to the nodes where the accessors were applied rather than to the original application of gpmem as their sources. However, this in itself does not seem to be fatal.

The real trouble comes when gpmem declares that its children absorb at applications. That instructs Venture, when resampling a call to it, to update the result (the list of VentureSPRecords, in this case) but not to propagate those downstream, on the logic that said returned value is an SP that already has the sufficient statistics of all of its calls (that's what AAA means). This is OK if the returned value was actually an SP, but the list becomes stale and causes problems.

I am embarrassed to say that I don't remember the mechanism of the problems. The manifestation was that covariance kernel parameter changes were not being propagated to some uses of the GP through gpmem, and for some reason inlining second(package) instead of referring to the variable f_emu fixed it. Hence the name of the bug. If memory serves, the likelihood effects of observations were being propagated to inference on the covariance correctly.

Could it be that the issue was that new samples were being drawn from the original GP covariance, rather than one affected by inference, because f_emu had become stale, but writing second(package) in the sample expression fixed it? I think that must have been it, because I don't see how the second(package) expression would be updated either, except by being force-reevaluated by virtue of being directly sampled.

Possible solutions:

axch commented 8 years ago

Fixing this does not constitute, in itself, an implementation of multiple value returns for primitives (#314), but seems to make it much easier (see comment on that issue).