Free energy computation around deterministic nodes

wmkouw commented 5 years ago

Currently, free energy is computed by the edges who query the connected nodes. However, deterministic nodes have no free energy.

This becomes a problem in equality nodes. Suppose we have 3 edges x,y and z connected via and equality node. They each are also connected to other nodes, f,g and h respectively.

[f] -----x-----[ = ]-----y-------[g]
                 |
                 z
                 |
                [h]

To get the free energy for edge y, we would need to query the energies of f and h. Edge y cannot access these nodes, because the equality is in between. We need some way of passing energies through equality / deterministic nodes or some way for edges to reach beyond just their connected nodes.

Edit by Joris if you add three backticks around a graph it will be rendered as-is. This is typically just used for code, but helps in this case as well.

bauglir commented 5 years ago

I probably don't know enough about factor graphs and free energy, but... Hear me out.

I have been thinking about the equality node from an implementation point of view lately. More precisely, the implementation as I have it in mind. Now in that view, I actually don't think you need an equality node and your problem may go away completely.

In my understanding, the only reason to have an equality node is to 'copy' messages, so they are independent for the different attached nodes. In an observable reactive paradigm, I think you could capture this in the concept of subscriptions. This would mean that in the implementation the x and y messages would actually collapse into a single 'observable' that'd have 2 subscriptions to it in the form of f, g and z. Depending on how you set this up (there are different ways in, for example, the ReactiveX world), these 'subscriptions' would be independent (if that's what you want). I'm not sure if that helps, but I'm thinking it may as you now no longer have a deterministic node, but can calculate the free energy based on the subscriptions.

Also, these types of graphs are why you want things like PlantUML or something similar integration in your issue tracking :D

Edit added z in my description. I was thrown off track by it being rendered attached to the [f] node instead of the [=] node.

wmkouw commented 4 years ago

I would much prefer to get rid of the equality node. It feels like a nuisance that arises from the desire to stick to Forney-style graphs. I actually think that this problem disappears if you work with standard factor graphs (https://en.wikipedia.org/wiki/Factor_graph).

Yes. It is my understanding that it serves to copy messages as well. I'm glad to hear that this is possible with reactive programming. I'll give it a try! I had planned to start working on an actual reactive programming implementation the week after I'm done moving (next week). This week I've still got some other stuff that I need to get done.

What, you don't like my hand-crafted, artisanal, ASCII art? ;) No, seriously, thanks. I'll look into PlantUML.

Thanks for the edit as well! I wrote the issue on shitty airport wifi and couldn't get back to fix it.

bertdv commented 4 years ago

Equality nodes don't copy messages. Equality nodes are branching nodes in generative direction and fusing nodes (by bayes rule) in "recognition" direction. Let's talk:)

wmkouw commented 4 years ago

This is my understanding of the equality node:

For an equality node with 2 edges,

x ----[=]---- z

the factor node function is f₌(x,z) = δ(x-z) and the outgoing messages are: 1) ν₌(z) = ∫ q_x(x) log f₌(x,z) dx = q_x(z) , 2) ν₌(x) = ∫ q_z(z) log f₌(x,z) dz = q_z(x) .

So, the incoming message q_x(x) becomes the outgoing message q_x(z). Since q_x indicates that the same distribution is used, I would call the outgoing message a copy of the incoming message.

For an equality node with 3 edges,

x ----[=]---- z
       |
       y

the factor node function is f₌(x,y,z) = δ(x-z)δ(x-y) and the outgoing messages are: 1) ν₌(z) = ∫ q_x(x) q_y(y) log f₌(x,y,z) dy dx = q_x(z) q_y(z) , 2) ν₌(y) = ∫ q_x(x) q_z(z) log f₌(x,y,z) dz dx = q_x(y) q_z(y) , 3) ν₌(x) = ∫ q_y(y) q_z(z) log f₌(x,y,z) dy dz = q_y(x) q_z(x) .

In other words, the outgoing message is a product of the copies of the incoming messages.

So, what am I missing? Why would it be inappropriate to call this copying? And what do you mean exactly with "branching in the generative direction" and "fusing in the recognition direction"?

bertdv commented 4 years ago

In short, passing on the product of two messages is not copying but rather a very specific update rule. Longer version below.

First, the proper update rules are just sum-product rules rather than variational update rules, see bullet 4 p.6 in long paper by Dauwels on VMP.

Regarding the 3-edge equality node. Let's assume the equality node is positioned as we usually do in state-space models, ie, the forward message on x carries a prior, the upward message on y carries a likelihood msg and the forward msg on z carries a posterior.

Assume the forward message for x is ready and the upward msg on y is 1 and the backward msg on z is also 1. This is the "generative (or "predictive") direction", since there is no likelihood msg and no backward msg from the future. Applying the update rules will lead to copying the forward msg of x onto both the forward msg on y and on the forward msg on z, iow, the msg x gets "branched" (copied).

Now assume that y also carries a backward (likelihood) msg, then the message onto z is the product of the forward msg on x and the backward msg on y, ie, we execute bayes rule (prior times likelihood). This is not simply copying. Passing on the product of two messages is not the same as copying. (For instance, why not pass on the sum of two incoming messages).

wmkouw / schedulefree-mp

Free energy computation around deterministic nodes #7