Open axch opened 7 years ago
Candidate plan: Implement Option 2, and then fix #470 by making Gibbs detach first.
Sub-issue: As described, this plan will arguably make #462 worse, in that it will cause to entirely miss some configurations (whereas now it merely overcounts some). One way to mitigate that problem is to scan the whole per-application cache and include all the atoms therein as options. This will pretty much maintain the behavior that #462 currently exhibits. Doing this creates concerns about whether the cache is stale (e.g., by applications disappearing into the brush), but since this is a band-aid over not fixing #462 yet, we can choose to leave it be.
Hmm. I'm not sure a weak table will suffice to fix the distribution of the posterior even for the single-site case. In particular, space usage aside, a semantically correct implementation of a weak table (e.g., Python's WeakKeyDictionary) need not ever remove any entries. In that case, spurious additional possibilities may accumulate in the image of this weak table as applications are discarded from consideration by inference but remain in the weak table.
Which, space usage aside, is only a problem while we are trying to traverse the keys in order to make multi-site Gibbs fail conservatively. When #462 is fixed, we can change the behavior of enumerateValues
to only query its own application, and then it won't matter whether there are spurious entries in the table or not.
Right, I forgot about that.
Should be implemented in Lite by #647.
Consider what happens if the last CRP application that was assigned to a particular atom is detached and later regenerated, or its support enumerated. For example,
[Right now, due to #470, the single-site gibbs will consider 1, 2, 3, which is completely wrong. But suppose for the sake of argument that it detached first and then asked the CRP to enumerate candidate values.]
Under the heading "propose a fresh cluster", there are actually three distinct things that the CRP implementation can do:
Why does this matter? One might think that this would lead to different behavior if the atoms are used downstream as mem keys, but it shouldn't, because detach will eliminate an unreferenced mem application. However, the choice between 3 or either of 1 or 2 will affect whether the newValues==currentValues check in enumerative Gibbs triggers. If it incorrectly fails to trigger, it will not be executing Algorithm 8 (Neal 2000), and potentially have the wrong stationary distribution.
What is the status quo? Right now the CRP in Puma does (1) and the CRP in Lite does any one of the three arbitrarily, depending on details of the past history of incorporations and unincorporations.
How should we handle this? The essence of the problem is that CRP expects to operate under the invariant that it is free to permute the atom numbers at will, and nothing downstream will care. This is successfully true within a single fully-instantiated trace (as long as atoms from distinct CRPs are never mixed, which is a separate problem), but the newValues==currentValues check expects any such permutation to be consistent across the pre-proposal and post-proposal traces as well. In addition, if we allow funny games with partially-regenerated traces (such as in particle gibbs), there can be a problem with permuting the atoms in a (partially regenerated) trace but not in the corresponding OmegaDB.
Option 1: Write client code not to depend on atom identities. I am not quite sure how to do that.
Option 2: Make CRP retain a map of application ids that have been unincorporated to the atoms they had been assigned to. Then, when re-proposing an existing application, we can scan that map to ensure the desired behavior (namely, ensuring 3 and avoiding 2).
Option 3: Obtain the same effect by giving the CRP access to the entries in the OmegaDB at all its call sites
Note: To avoid leaking, this map needs to be weak in application ids. I don't know whether it also needs to be proactively cleaned whenever a transition completes, because I'm not sure whether there would be any harm in a binding still being present even though that application's removal from the trace had been accepted in the past.