wolfe-pack / wolfe

Wolfe Language and Engine
https://wolfe-pack.github.io/wolfe
Apache License 2.0

Belief Propagation is incorrect when there are multiple solutions #81

Closed insperatum closed 10 years ago

insperatum commented 10 years ago

Example:

case class BoolPair(x: Boolean, y: Boolean)
def model(b: BoolPair) = I(b.x ^ b.y)  // scores 1.0 iff exactly one of x, y is true
val best = argmax(all(BoolPair)) (model)
// best: BoolPair = BoolPair(false,false)
// wrong: this state scores 0.0; the MAP states are BoolPair(true,false) and BoolPair(false,true)

After belief propagation, setToArgmax is called for each variable independently. It will choose the first possible setting consistent with some global argmax, but not necessarily one consistent with the choices made for other variables.
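To make the failure mode concrete, here is a minimal sketch in plain Scala (not the Wolfe API) of what independent per-variable decoding does on this model; `score`, `maxMarginalX`, and `maxMarginalY` are hypothetical names for illustration:

object IndependentDecode extends App {
  val domain = Seq(false, true)
  def score(x: Boolean, y: Boolean): Double = if (x ^ y) 1.0 else 0.0

  // Max-marginal of each variable: the best global score achievable per value.
  def maxMarginalX(x: Boolean): Double = domain.map(y => score(x, y)).max
  def maxMarginalY(y: Boolean): Double = domain.map(x => score(x, y)).max

  // Both values of x tie at 1.0, and likewise for y, so picking the first
  // maximizer for each variable independently mixes two different argmaxes.
  val bestX = domain.maxBy(maxMarginalX)  // false
  val bestY = domain.maxBy(maxMarginalY)  // false
  println(s"($bestX, $bestY) scores ${score(bestX, bestY)}")  // (false, false) scores 0.0
}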

sameersingh commented 10 years ago

Good catch! To address this in general, you have to do another round of message passing where each message is "deterministic", i.e. only has the argmax-ed value in it.
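A sketch of that idea on the toy model above, assuming variables are committed one at a time in some order (a deterministic message carries exactly one value):

object SequentialDecode extends App {
  val domain = Seq(false, true)
  def score(x: Boolean, y: Boolean): Double = if (x ^ y) 1.0 else 0.0

  // First variable: decoded from its max-marginal, ties broken by domain order.
  val bestX = domain.maxBy(x => domain.map(y => score(x, y)).max)  // false
  // Second variable: decoded conditioned on the committed value of x.
  val bestY = domain.maxBy(y => score(bestX, y))                   // true
  println(s"($bestX, $bestY) scores ${score(bestX, bestY)}")       // (false, true) scores 1.0
}

The second argmax only ever sees a deterministic message from x, so the result is a single coherent MAP state rather than a mixture.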

riedelcastro commented 10 years ago

Does this work for loopy graphs as well?

Generally I would make this "clever assignment" optional. During training this state isn't used (just the gradient, which I think doesn't have this problem), and after training ties are quite unlikely.

sameersingh commented 10 years ago

Yes, works for loopy graphs too. It will give an approximate solution of course, but it won't be a weird mix of two different solutions.

I think it is used during training: isn't the update the difference between the gold features and the features of the argmax state? If the argmax state is not one of the MAP states but a mixture, it might cause weird problems. But yes, ties after training are quite unlikely.
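In symbols (my notation, not from the thread), the update being described is the standard structured-perceptron step:

\[
  w \;\leftarrow\; w + \phi(x, y_{\text{gold}}) - \phi(x, \hat{y}),
  \qquad \hat{y} \in \arg\max_y \, w^\top \phi(x, y)
\]

and the worry is that if \(\hat{y}\) is stitched together from two different MAP states, \(\phi(x, \hat{y})\) need not be the feature vector of any achievable assignment.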

It is also possible to implement this with no loss of efficiency: e.g. during Viterbi you can have the backward pass consist only of deterministic messages, but this comes at the cost of not being able to compute max-marginals for non-MAP values.
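A hedged sketch of that Viterbi variant on a toy chain (plain Scala, not the Wolfe API; `unary` and `pairwise` are made-up scores): the forward pass keeps full max-product messages, while the backward pass carries a single committed value per step.

import scala.collection.mutable

object ViterbiChain extends App {
  val domain = Seq(false, true)
  val n = 4
  def unary(i: Int, v: Boolean): Double = if (v) 0.1 else 0.0            // toy local scores
  def pairwise(a: Boolean, b: Boolean): Double = if (a ^ b) 1.0 else 0.0 // favours alternation

  // Forward pass: full max-product messages (best prefix score per value).
  val viterbi = Array.fill(n)(mutable.Map.empty[Boolean, Double])
  val backptr = Array.fill(n)(mutable.Map.empty[Boolean, Boolean])
  for (v <- domain) viterbi(0)(v) = unary(0, v)
  for (i <- 1 until n; v <- domain) {
    val (bestPrev, bestScore) =
      domain.map(u => u -> (viterbi(i - 1)(u) + pairwise(u, v))).maxBy(_._2)
    viterbi(i)(v) = bestScore + unary(i, v)
    backptr(i)(v) = bestPrev
  }

  // Backward pass: deterministic messages, i.e. one committed value per step.
  val best = new Array[Boolean](n)
  best(n - 1) = domain.maxBy(v => viterbi(n - 1)(v))
  for (i <- n - 2 to 0 by -1) best(i) = backptr(i + 1)(best(i + 1))
  println(best.mkString(", "))  // one consistent MAP chain (alternating values)
}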


riedelcastro commented 10 years ago

No, we use a subgradient that is a linear combination of the feature vectors of the possible guess solutions. I think this is a legal subgradient as well, and it's more stable. It also means there is no randomness in the updates of the first iterations (otherwise these would depend on your seed). Finally, this gradient is consistent with the MAP case as a zero-temperature limit of the marginal case. That's important also because it may be more correct to think of perceptron training as approximating the gradient of the logZ objective with a MAP state gradient. This is based on Luke's concern that 0 is always a solution of the current perceptron loss, and in the non-separable case it is the only solution.
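A hedged formal reading of why this is legal (my notation, not from the thread): the perceptron loss is a max of linear functions of w, so its subdifferential at w is the convex hull of the gradients of the active (tied) terms:

\[
  L(w) = \max_y w^\top \phi(x, y) - w^\top \phi(x, y_{\text{gold}}),
  \qquad
  \partial L(w) = \mathrm{conv}\big\{ \phi(x, \hat{y}) : \hat{y} \in \arg\max_y w^\top \phi(x, y) \big\} - \phi(x, y_{\text{gold}})
\]

so any convex combination \(\sum_i \lambda_i \, \phi(x, \hat{y}_i)\) of the tied solutions' features yields a valid subgradient.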

insperatum commented 10 years ago

Sameer, can you clarify this a little for me, please? I've thought about it for a while and I just want to check that I'm on the right track.

Plan: Warm-start another round of sort-of-max-product with the old messages, but with the difference that:

- every message sent in this round is deterministic, i.e. it contains only the argmax-ed value, so each variable's setting is conditioned on the choices already committed to.

Is this what you meant? And it works on loopy graphs?

sameersingh commented 10 years ago

Yes, that's what I mean, and it will work on loopy graphs (depending on what you mean by "working on loopy graphs"). It should give one or the other MAP state, according to the messages, instead of a partial mixture of several.

Note that it just needs the "last" round of message passing to be this way, so for chains you're still doing only one pass forward, and one pass backward, and just in the backward pass you carry single values instead of full messages.

Also, this assumes you only need the max state, and not max-marginal scores.