probcomp / hierarchical-irm

Probabilistic structure discovery for rich relational systems
Apache License 2.0

Should Distribution::incorporate() take a weight? #59

Open ThomasColthurst opened 2 weeks ago

ThomasColthurst commented 2 weeks ago

(This is a discussion issue rather than an action issue! If I've assigned this issue to you, it is because I want your opinion.)

The weight would default to 1.0, so existing callers wouldn't have to change anything.

But this would be useful in at least three different ways:

1) Sometimes we get fractional evidence that a data point belongs to a distribution, and this lets us update the distribution on that. For example, in the SimpleStringEmission class, we track the probabilities of insertions, deletions and substitutions via BetaBernoulli models. But we can only guess at those given our actual inputs, which are clean and dirty strings -- is (clean="hello", dirty="hellp") a substitution or an insertion plus a deletion? The "right" thing to do would be to assign probabilities to both of those possibilities, and to call the corresponding BetaBernoulli with fractional weights.

2) Sometimes we get bulk evidence, and it is slightly inefficient to have to call .incorporate() in a for loop rather than once with an integer weight > 1. To use SimpleStringEmission as an example again, (clean="hello", dirty="") is evidence of five deletions, and it would be nice to just call deletion.incorporate(true, 5.0);

3) We could implement .unincorporate(x) in base.hh::Distribution as incorporate(x, -1.0) and then child Distributions wouldn't have to worry about it.
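
To make the three uses above concrete, here is a minimal sketch of what a weighted incorporate() could look like. It is not the repository's actual code: the Distribution and BetaBernoulli classes below are simplified stand-ins, and the member names (total_weight, prob_true, the alpha/beta constructor) are assumptions made for illustration. The sufficient statistics are stored as doubles so that fractional and negative weights work.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for base.hh::Distribution; the real class may differ.
template <typename T>
class Distribution {
 public:
  virtual ~Distribution() = default;

  // Weighted update. The default of 1.0 keeps existing one-argument calls valid.
  virtual void incorporate(const T& x, double weight = 1.0) = 0;

  // Point 3: removal is just incorporation with weight -1, so child
  // classes do not need their own unincorporate().
  void unincorporate(const T& x) { incorporate(x, -1.0); }
};

// Simplified Beta-Bernoulli model (an assumption, not the repo's class).
class BetaBernoulli : public Distribution<bool> {
 public:
  BetaBernoulli(double alpha, double beta) : alpha_(alpha), beta_(beta) {}

  // Default arguments on virtual functions are chosen by the static type,
  // so the override repeats the same default as the base class.
  void incorporate(const bool& x, double weight = 1.0) override {
    total_ += weight;          // total (possibly fractional) evidence
    if (x) trues_ += weight;   // evidence for the "true" outcome
  }

  double total_weight() const { return total_; }

  // Posterior predictive probability of observing "true".
  double prob_true() const {
    return (alpha_ + trues_) / (alpha_ + beta_ + total_);
  }

 private:
  double alpha_;
  double beta_;
  double total_ = 0.0;
  double trues_ = 0.0;
};
```

With this sketch, point 1 would be a call like substitution.incorporate(true, p) with a fractional p, point 2 would be deletion.incorporate(true, 5.0), and point 3 comes for free from the base class. One thing to watch for: a distribution that keeps an integer count of observations would need that count to become a double (or a separately tracked weight).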

The biggest downside is:

4) It might be hard or impossible to implement this for some distributions. I don't think this is true for any of the existing distributions, or any others that I can think of off the top of my head, but please correct me if I'm wrong!