Open axch opened 8 years ago
The operation we actually want for dirichlet multinomial accepts two such trees of total size O(n), (the counts and the pseudo-counts) with potentially repeated elements, and samples an element proportional to its total weight in both trees, in O(log n).
Why not just sample one of the two trees proportional to their weight, then sample within the chosen tree in the straightforward way?
Oh. Duh. That's even better.
Our dirichlet-categorical woes, namely #407, #451, and #452, are ultimately due to representing the probabilities in terms of the indexes of the options rather than in terms of their actual values. Changing the representation to values is slightly tricky, but doable.
The plan:
SPAux
for dirichlet categorical to store a map from the VentureValues that were actually returned to the counts of how many times.self.index
with constructing a tree for the pseudo-counts of the given objects. This tree also represents the set of options that are possible. Don't forget to sum if equal objects appear multiple times (fixing #407).simulate
calls the "sample proportional to total weight" operation.logDensity
finds the value's total weight by looking the value up in the aux and the pseudo-count tree.incorporate
andunincorporate
invoke "increment, inserting if absent" and "decrement, deleting if zero", respectively.enumerateValues
should not produce duplicates.logDensityOfData
should apply the same formula to traversing both trees.float('-inf')
(I think that's what the formula would do anyway if one of the alpha terms is zero).gradientOfLogDensityOfData
accordingly.Doing all this for Puma is #340.
[Edited: Replaced "multinomial" with "categorical" in names, pursuant to #450.]