peterwittek / qml-rg

Quantum Machine Learning Reading Group @ ICFO

Training Boltzmann machines in Wiebe et al. #14

Open apozas opened 7 years ago

apozas commented 7 years ago

In appendix E, section 5 of Quantum Deep Learning, the authors write:

[...] our quantum algorithms [...] can therefore efficiently train full Boltzmann machines given that the mean-field approximation to the Gibbs state has only polynomially small overlap with the true Gibbs state.

However, reading the rest of the article leads one to believe that the mean-field state is a good enough approximation (better than a uniform state) to be used as a prior in the network.

Does this sentence therefore contain a typo, or am I missing something?

peterwittek commented 7 years ago

Yes, this looks like a typo. At the top of p. 12 (v2 of the manuscript) and in appendix E.3, they argue that the mean-field approximation prepares a state with a large overlap with the true Gibbs state.

peterwittek commented 7 years ago

This might not be so simple after all. We are trying to understand the conditions under which the mean-field approximation is efficient, and it is not quite clear. In E.5 they say that κ is larger in the fully connected case, that is, the overlap with the true Gibbs state is smaller.
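
Schematically, and with the caveat that this is my own notation and a back-of-the-envelope bound rather than the paper's exact definitions: if the true Gibbs distribution P and the mean-field distribution Q satisfy P(x) ≤ κ Q(x) for every configuration x, then the overlap of the corresponding coherent state encodings is bounded from below by 1/√κ, so a larger κ does indeed mean a weaker guaranteed overlap:

```latex
\[
  |\psi_P\rangle = \sum_x \sqrt{P(x)}\,|x\rangle , \qquad
  |\psi_Q\rangle = \sum_x \sqrt{Q(x)}\,|x\rangle ,
\]
\[
  \langle \psi_Q | \psi_P \rangle
    = \sum_x \sqrt{P(x)\,Q(x)}
    \;\ge\; \sum_x \sqrt{P(x)\,\frac{P(x)}{\kappa}}
    = \frac{1}{\sqrt{\kappa}} .
\]
```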

apozas commented 7 years ago

This appendix E.5 is quite interesting indeed, so let me lay out how I understand what is going on. First, the authors state their motivation explicitly:

Our purpose in showing results for the fully connected case, then, is to demonstrate how approximations based on sparse graphs, such as those of this paper, may behave poorly for very densely connected problems

(Nevertheless, I am not entirely sure whether this quote actually addresses what we are interested in.)

Second, the fact that the mean-field approximation does not work well for fully connected machines also makes sense, in a way, from a physical point of view. In terms of Ising models, a fully connected network represents a system in which every other node is a nearest neighbor of any given one. Mean-field theory was devised to study systems with only a few nearest neighbors, such as the crystalline lattices of magnetic materials or, in our case, restricted Boltzmann machines. With this in mind, it makes sense that mean-field priors are closer to solutions of RBMs than to solutions of BMs, which is what they observe. A quick numerical check of this intuition is sketched at the end of this comment.

Therefore, it is possible that there is no typo and that what they say in the appendix is correct: given that the mean-field approximation is not a good one for fully connected networks (BMs), it may be wiser to use priors other than mean-field states.
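
To check the intuition numerically, here is a quick toy sketch. This is entirely my own construction, not the paper's algorithm: the coupling strengths, the Bhattacharyya coefficient as a figure of merit, and the naive damped mean-field iteration are all my choices.

```python
# Toy numerical check: brute-force the Gibbs distribution of a small {-1,+1}
# Boltzmann machine, fit a naive mean-field product distribution, and compare
# the classical (Bhattacharyya) overlap for a sparse chain versus a fully
# connected coupling graph.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def gibbs(J, h):
    """Exact Gibbs distribution over {-1,+1}^n by enumeration."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    energies = -0.5 * np.einsum('si,ij,sj->s', states, J, states) - states @ h
    weights = np.exp(-energies)
    return states, weights / weights.sum()

def mean_field(J, h, iters=2000, damping=0.5):
    """Damped fixed-point iteration of m_i = tanh(h_i + sum_j J_ij m_j)."""
    m = np.tanh(h)
    for _ in range(iters):
        m = (1 - damping) * m + damping * np.tanh(h + J @ m)
    return m

def overlap(J, h):
    """Bhattacharyya coefficient sum_x sqrt(P(x) Q(x)), P = Gibbs, Q = mean field."""
    states, p = gibbs(J, h)
    m = mean_field(J, h)
    q = np.prod((1 + states * m) / 2, axis=1)  # product distribution with <x_i> = m_i
    return float(np.sum(np.sqrt(p * q)))

n = 8
h = 0.2 * rng.standard_normal(n)

# Sparse graph: nearest-neighbour chain.
J_chain = np.zeros((n, n))
for i in range(n - 1):
    J_chain[i, i + 1] = J_chain[i + 1, i] = 0.5

# Dense graph: symmetric all-to-all couplings, zero diagonal.
J_full = rng.standard_normal((n, n)) * 0.5
J_full = (J_full + J_full.T) / 2
np.fill_diagonal(J_full, 0)

print("overlap, chain          :", overlap(J_chain, h))
print("overlap, fully connected:", overlap(J_full, h))
```

Whether the fully connected case actually comes out worse depends on how the couplings are scaled, so treat this as a playground for the intuition, not as evidence either way.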

peterwittek commented 7 years ago

@cgogolin, you read this paper too; do you have any insight into the relationship between the graph topology and the mean-field approximation? I remember you dismissed using this approximation for the local Hamiltonian of the QMLN paper.

cgogolin commented 7 years ago

I don't remember the details, so what I am going to say might be total bullshit, but:

I think what they meant to say with this sentence is that it is enough for efficient training if the mean-field approximation has an overlap with the true Gibbs state that is no smaller than polynomially small.
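
In symbols (my paraphrase of what I think the intended condition is, with n the number of units):

```latex
\[
  \big|\langle \psi_{\mathrm{MF}} \,|\, \psi_{\mathrm{Gibbs}} \rangle\big|
    \;\ge\; \frac{1}{\mathrm{poly}(n)} ,
\]
```

that is, an overlap that is at least inverse-polynomially large, rather than at most polynomially small.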

apozas commented 7 years ago

Don't get me wrong, I'd be really happy if this were just a semantic problem and it were indeed the case that the greater the overlap with the Gibbs state, the better you perform; but there are much better ways of saying what @cgogolin is saying. Coming back to the phrase:

[...] given that the mean-field approximation to the Gibbs state has (only) polynomially small overlap with the true Gibbs state.

I would totally believe that they are defending what Christian is saying if, instead of the expression in parentheses, they had used something like (no less than), (at least), or (an overlap greater than)...

Do you think this question is relevant enough to email the authors?

peterwittek commented 7 years ago

I already tried emailing them with no success.

peterwittek commented 7 years ago

There is a lot more on this topic in a follow-up paper, which in turn cites Chowdhury and Somma (2016) and other efficient general protocols for preparing thermal states.