netket / netket

Machine learning algorithms for many-body quantum systems
https://www.netket.org
Apache License 2.0

Taking Quantum Geometric Tensor seriously #873

Open KaelanDt opened 3 years ago

KaelanDt commented 3 years ago

Hi,

I noticed that when using an ARNN machine with complex parameters, computing the quantum geometric tensor using netket.optimizer.qgt.QGTOnTheFly() would yield a non-Hermitian matrix. Here is a minimal working example:

import netket as nk
import numpy as np
hi = nk.hilbert.Spin(s=1 / 2, N=8)
ma = nk.models.ARNNConv1D(hilbert=hi, layers=2, features=5, kernel_size = 3, dtype = complex)
sa = nk.sampler.ARDirectSampler(hilbert=hi)
vs = nk.vqs.MCState(sa, ma, n_samples=100)
q = vs.quantum_geometric_tensor(nk.optimizer.qgt.QGTOnTheFly())
q_dense = q.to_dense()
a = np.linalg.eigvals(q_dense)

This outputs complex eigenvalues (which can be quite large). This does not happen if we replace the third line from the end with q = vs.quantum_geometric_tensor(nk.optimizer.qgt.QGTJacobianDense()) or q = vs.quantum_geometric_tensor(nk.optimizer.qgt.QGTJacobianPyTree()). Using an RBM with complex parameters does not lead to the same issue, nor does an ARNN with real parameters, so maybe it is due to jax badly handling complex parameters?
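
For completeness, a minimal check of the symptom (a sketch reusing q_dense from the snippet above; a proper QGT should be Hermitian, with approximately real, non-negative eigenvalues):

print(np.allclose(q_dense, q_dense.conj().T))          # True for a Hermitian matrix
print(np.abs(np.linalg.eigvals(q_dense).imag).max())   # ~0 for a Hermitian matrix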

gcarleo commented 3 years ago

It's worth checking if the ARNN you are using is holomorphic; if it is not, then the QGT is not well defined and different methods might give inconsistent results.

PhilipVinc commented 3 years ago

Ciao Kaelan, nice to see you here!

The output of QGTJacobian is by construction PD (positive definite), but it might be wrong if your model is not holomorphic.

If your model is not holomorphic, those techniques have ways to split real and imaginary part so that the QGT can still be applied to the vector (though the matrix representation will make little sense). Check the docs for the relevant options to do that.

Instead, for QGTOnTheFly (cc @inailuig) I think that the QGT should be correct regardless, but it has never been checked. Maybe it's a good time to check. jax.vjp is OK with non-holomorphic functions, so we should only check what happens with jax.jvp.

-

Regardless, the ARNN should be holomorphic if the activation function is. The default is selu, so check whether that one is holomorphic.

Also what version are you on? And does this also happen on master?

KaelanDt commented 3 years ago

Thanks! Indeed selu is not holomorphic at 0...

It works fine with QGTJacobianDense and QGTJacobianPyTree since they default to holomorphic=False.

I was using the latest version, 3.0b4.

Then my question is: do you know if there are any normalizing activation functions that are also holomorphic?

PhilipVinc commented 3 years ago

Thanks! Indeed selu is not holomorphic at 0...

If it is not holomorphic at one point it should not matter in general: the chance that the input is exactly zero is vanishingly small, unless you're really sitting at this pathological point.

It works fine with QGTJacobianDense and QGTJacobianPyTree since they default to holomorphic=False.

Yes, though you should keep in mind that the dense representation in this case (and therefore the eigenvalues too) is not really directly linked to the real QGT.

For example, if you have N parameters, with a C->C ansatz and holomorphic=False, the QGT is computed on the Re[C]->C and Im[C]->C parts of the ansatz (if I'm not mistaken), and then combined again. In the end, the QGT you get is a 2N×2N real-valued matrix because it splits the real and imaginary parts.

This way of computing it is correct as long as you are doing QGT@vector which is what we need for SR. However, if you want to look at the QGT itself you need a lot of care to unpack it and interpret it.

(@femtobit, @attila-i-szabo, maybe it's worth warning, failing to give the full QGT, or returning the two parts in this case?)
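
For readers following along, a rough sketch (not NetKet's actual code) of the real/imaginary splitting described above: a complex parameter vector of size N is viewed as a real vector of size 2N, a real-valued 2N×2N matrix acts there, and the result is repacked into N complex entries.

import numpy as np

def split(v):
    # C^N -> R^2N: stack real and imaginary parts
    return np.concatenate([v.real, v.imag])

def repack(w):
    # R^2N -> C^N: undo the split
    n = w.size // 2
    return w[:n] + 1j * w[n:]

def apply_split_qgt(S_2n, v):
    # S_2n is a real 2N x 2N matrix, v a complex vector of size N
    return repack(S_2n @ split(v))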

gcarleo commented 3 years ago

For non-holomorphic complex functions the QGT of size N×N is just not well defined. I think that if the QGT method of the variational state is called in that case we should always return an error, not the doubled QGT, which is another object.

PhilipVinc commented 3 years ago

Our QGT is a lazy object primarily computing the application to a vector. I believe that the application is well defined even in the non-holomorphic case. We cannot make the construction of the QGT lazy object fail, otherwise we would not be able to use SR.

The issue stems from calling the to_dense function. That operation is not well defined and this one should either fail, warn, or return a different object.

By the way, this is also relevant for the QGT of holomorphic functions when it splits real and imaginary parts. In that case our dense matrix representation is not the QGT itself.

gcarleo commented 3 years ago

I believe that the application is well defined even in the non-holomorphic case.

I am not sure this is a well-defined operation, though. The well-defined thing is unpacking the matrix into a 2N×2N matrix and applying it to a vector of size 2N, getting out a (complex) vector of size 2N. If I understand this case as it is implemented, we can now apply the QGT to a complex vector of size N and get out another complex vector of size N. Aren't we losing half of the entries? (Maybe it's well defined when doing SR, as a gradient preconditioner, but in general I suspect that the matrix-vector product will not be correct anyway!)

attila-i-szabo commented 3 years ago

@PhilipVinc is right, the "matrix"-vector products returned by QGTOnTheFly are correct for everything (the JAX autodiff docs explain that JVP and VJP effectively split every complex number into real and imaginary parts, so it is always correct). Of course, to_dense returns an N×N complex matrix, which is too little information unless the function is holomorphic.

unpacking the matrix into a 2N×2N matrix and applying it to a vector of size 2N, getting out a (complex) vector of size 2N

The "unpacking" turns both the matrix and the vector into real ones, so the 2N numbers you get at the end are also real. That's how JVP and VJP effectively work

gcarleo commented 3 years ago

Sorry, I am confused: if the application to an arbitrary vector is exact, how come the matrix elements are wrong? (I can imagine computing <V'|M|V> for basis vectors V and V' and getting the matrix element.)

attila-i-szabo commented 3 years ago

Yeah, but the right basis for that would be (1,0,0,...), (i,0,0,...), etc., unless the function is holomorphic (which enforces some equalities between the matrix elements you get this way)

gcarleo commented 3 years ago

If I apply it to (1,0,...,0) or to (i,0,...,0), in the second case I expect the result to be i times the first output, otherwise the operator is not even linear?

gcarleo commented 3 years ago

I think the point here is that the matrix-vector operation is not well defined (at least not in the usual sense of a linear operator acting on a vector space of size N). It is well defined only in the sense that if you apply it to a vector with complex entries, it effectively interprets it as a real vector living in a larger (2N) space, applies the operator in this enlarged space, and then re-packs the result back to size N?

PhilipVinc commented 3 years ago

Yes, that’s exactly what happens, which I believe is well defined.

The point is that in this case the lazy QGT object does not correspond to a matrix anymore, but to something different. Effectively it corresponds to 4 matrices.

When we call to_dense we apply it to all basis vectors of R^N (or R^2N when splitting real and imaginary parts) and we assemble the resulting matrix.

This is plain wrong for non-holomorphic models.

gcarleo commented 3 years ago

Great, do we also agree that the matrix-vector product in the non-holomorphic case is then not a standard matrix-vector product? :) (Again, the case given above is clear: M V = b with V = (1,0,...), but with V' = (i,0,...) we get b' ≠ i·b.) If so, I think that the notion of having a (regular) linear operator in the non-holomorphic case is incorrect and we should not return a linear operator but an error!

gcarleo commented 3 years ago

You copied the same comment?

It seems to me that the conclusion is that in the non-holo case the object returned is not a linear operator, since what the matrix-vector operation does is not what a linear operator does. So in this sense we should always error when trying to convert this lazy QGT object to a linear operator... (even when trying to multiply it by a vector, not only when calling to_dense).

gcarleo commented 3 years ago

If you want, I would just not support non-holomorphic complex parameters, since it is just too contrived.

PhilipVinc commented 3 years ago

Yes, sorry, I don't know what happened…

I do agree that this is not a matrix-vector product in C^N anymore. It's a matrix-vector product in R^2N.

So the non-holomorphic QGT is not a linear operator over C, but over R^2N.

So we could argue "we cannot call it a linear operator" in this case. But (naming aside) it works correctly, so why should we prevent users from building it?

I think the only tricky part of the API is when we convert it to a matrix. The to_dense representation, however, returns a 2N×2N matrix, which is correct if we think of the QGT as a linear operator in this space…

I don't understand what part of the API you think should error. (And in general I prefer having an API that always works and might return different objects.)

PhilipVinc commented 3 years ago

if you want, I would just not support non-holomorphic complex parameters, since it is just too contrived

I am against this. We do support this case in SR through the current API. I don’t see why we should break something that works well.

Also, we have no way to know if a model is holomorphic or not. Right now, we assume that C->C models are not and always split real and imaginary parts. This behavior was discussed in a previous PR and chosen because it's always correct as far as SR is concerned.

, so in this sense we should always error when trying to convert this Lazy QGT object to a linear operator...

The lazy QGT classes are all subclasses of LinearOperator. In a sense, here LinearOperator only means that you can apply it to a vector. Again, I stress that this operation is always well defined.

The conversion to a matrix is less well defined. Though returning a 2N×2N real matrix obliges the user to split the complex vector into a real vector of size 2N, so it prevents errors.

I think the issue stems from the fact that only QGTJacobian does this conversion to a 2N×2N matrix when converting to dense. QGTOnTheFly does not and returns a complex matrix.

I think that the issue only exists for QGTOnTheFly. But we do not currently have a way to tell this implementation whether the ansatz is holomorphic or not, because it does not need to know it when applying it to a vector…

gcarleo commented 3 years ago

LinearOperator only means that you can apply it to a vector

I think this is the main source of confusion. To me, linear operator means linear operator, otherwise, according to your definition, anything that can be applied to a vector is a linear operator?

I don’t understand what part of the api you think should error.

I think that we all agree that what you are doing is well defined in the context of preconditioners, i.e. when minimizing a real-valued quantity. I would not remove that functionality (even if this is a contrived case), since as you say it is working and it is documented in the APIs. However, the problematic aspect of the APIs is that the quantum geometric tensor in this case should error somehow when called by the user, since what is returned is not the quantum geometric tensor, and since the quantum geometric tensor (again, as a linear operator built from <∂_k Ψ | ∂_k' Ψ>) is not well defined if Ψ is not holomorphic. So ideally, this thing should work only as a preconditioner but not as a stand-alone linear operator (which I understand is tricky to differentiate... but you see that there are inconsistencies, so this is a true problem!)
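
For reference, the object being discussed, written here for a normalized state $|\psi\rangle$ (the standard definition; conventions may differ slightly from NetKet's internals):

$$ S_{k k'} = \langle \partial_k \psi \,|\, \partial_{k'} \psi \rangle - \langle \partial_k \psi \,|\, \psi \rangle \langle \psi \,|\, \partial_{k'} \psi \rangle $$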

gcarleo commented 3 years ago

I think the issue stems from the fact that only QGTJacobian does this conversion to a 2Nx2N matrix when converting to dense. QGTOnTheFly does not and returns a complex matrix.

Yes, I agree! If you want, the conversion to a linear operator should happen in a space where the operator is actually linear, so in this case 2N (or 4N?). If we do that consistently then it would be OK-ish, I guess, to still allow calling the quantum_geometric_tensor method of the variational state even for non-holomorphic models.

attila-i-szabo commented 3 years ago

It might be good to add a holomorphic switch to QGTOnTheFly.to_dense() or wherever it fits, to control whether complex numbers should be split. Obvious performance issues aside, someone wanting to use the QGT directly might find it more helpful to have the complex entries, if they make sense. And this information needs to be specified externally, because we can't (and shouldn't) do enough introspection to tell whether a network is holomorphic.

gcarleo commented 3 years ago

@attila-i-szabo yes, that makes sense, but I'm still confused, sorry... Maybe you can help me understand :) Now I'm wondering whether it is actually safe to use iterative solvers (say, conjugate gradient) with these QGT objects that are nominally LinearOperators but actually are non-linear operators. I'm just trying to understand what happens at the solver level (i.e. when I call the solve method of the conjugate gradient using the QGT object). I am worried that the conjugate gradient will just use the matvec product method, and if that does not satisfy the properties of a linear operator I'm not sure the solver will do anything meaningful.

PhilipVinc commented 3 years ago

It might be good to add a holomorphic switch to QGTOnTheFly.to_dense() or wherever it fits to control whether complex numbers should be split.

Yes, I think so too, though we should think about a good interface.

Now I'm thinking if it is actually safe to use iterative solvers

It is, through the QGT.solve method, because internally what it does is split the real and imaginary parts of the C^N vector into an R^2N vector, and call solve on this one.

In this space the operator is linear.
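
A sketch (hypothetical helper, not NetKet's internal code) of this split-solve-repack idea:

import numpy as np

def solve_split(S_2n, b):
    # Solve S_2n w = [Re(b); Im(b)] with a real solver, then repack into C^N.
    rhs = np.concatenate([b.real, b.imag])
    w = np.linalg.solve(S_2n, rhs)   # any real-valued linear solver would do here
    n = b.size
    return w[:n] + 1j * w[n:]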

--

Also, unrelated, but the non-holo QGT operator respects the property

S@x + S@y = S@(x+y)
S@(ax) = a(S@x)   ∀ a∈R

So the operator is linear as long as you only multiply it by a real constant, which I believe is what will happen in most linear solvers. So it will probably be fine to use it in most linear solvers, too, without the need of calling the internal solve method.
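
A small helper one could use to check these two properties numerically (a sketch; qgt, v and w are assumed to be built as in the snippets elsewhere in this thread):

import numpy as np

def check_r_linearity(matvec, v, w, a=2.0, rtol=1e-6):
    # additivity: S@v + S@w == S@(v + w)
    np.testing.assert_allclose(matvec(v) + matvec(w), matvec(v + w), rtol=rtol)
    # homogeneity over the reals only: S@(a v) == a (S@v) for real a
    np.testing.assert_allclose(matvec(a * v), a * matvec(v), rtol=rtol)

# e.g. check_r_linearity(lambda x: qgt @ x, v, w)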

gcarleo commented 3 years ago

Ok I see, so it's really important to use the internal solve method and not the solve method of the scipy solver.

My suggestion would be to try to remove this ambiguity and make the matvec method work already in the enlarged space (and in the enlarged space only, since that's the only space in which the QGT is a LinearOperator). This would likely also allow removing the internal solve method (there might be other issues though...)

PhilipVinc commented 3 years ago

By the way, I don't want to mess up the discussion, but:

Every issue that we have with the non-holo QGT not being a linear operator is also present for our default implementation of QGTJacobian for any complex-valued ansatz.

Because by default we split the real and imaginary parts.

--

My suggestion would be to try to remove this ambiguity and make the matvec method work already in the enlarged space (and in the enlarged space only, since that's the only space in which the QGT is a LinearOperator).

I'm against it. The multiplication of QGT times vector is well defined in any space. Whether the QGT is a linear object or not is your problem as a user. We can document it. We can add an easy way to check for it. I'm all in for that.

But I'm against removing features when we could just document what QGT is.

This would likely allow also to remove the internal solve method (there might be other issues though...)

We already had this discussion. Those methods are a performance optimisation and are required. (I'll try to find the issue later) We could get rid of them if we could dispatch on the solver method, but we can't. So we need to wrap the solve call to include some setup before it.

And this preparation step is different depending on the QGT implementation.

gcarleo commented 3 years ago

The multiplication of QGT times vector is well defined in any space. If QGT is a linear object or not, it's your problem as a user.

I'm sorry, but it cannot be a problem of the user if you promise a LinearOperator but the operator you return is non-linear... So either you remove the inheritance from LinearOperator or you actually return a linear operator. I think the correct thing is to return a LinearOperator in the doubled space. This would also automatically make to_dense work correctly.

The issue is that right now we are instead returning a non-linear operator that inherits from LinearOperator, and this just does not make sense mathematically; there isn't much to add there...

Otherwise I could also define the RBM to be a LinearOperator, since its action on a vector is a well-defined object???

attila-i-szabo commented 3 years ago

I'm sorry but it cannot be a problem of the user if you promise a LinearOperator but the operator you return is non-linear...

(a) it is linear
(b) differentiating with respect to a complex number only really makes sense if the function is analytic (you can get around this by also adding derivatives w.r.t. conjugates, but at that point it's obvious that it's sugar for two real numbers), so it is the user's fault if they want to produce ill-defined derivatives, and they're lucky that JAX has a pretty good convention for what it might mean

Ok I see, so it's really important to use the internal solve method and not the solve method of the scipy solver.

Not in the sense of this issue. I'm not aware of any solver that would use complex scale factors or anything like that that would require linearity other than w.r.t. multiplying by real numbers, which we do have. But the internal solver improves performance because it can be jitted.

My suggestion would be to try to remove this ambiguity and make the matvec method work already in the enlarged space

This is (a) wasteful for holomorphic functions, and (b) wasteful if you only want to do SR with this (which is still the main use case of the QGT). I personally don't care what to_dense does (it would probably make sense to offer doing the split there, even as the default option, holomorphic=False), but SR is already time-consuming enough; I'm against slowing it down by a factor of 2 for no reason.

gcarleo commented 3 years ago

(a) it is linear

Why so? I thought we agreed on the fact that this is nonlinear in the non-holomorphic case, did I miss something?

(b) differentiating with respect to a complex number only really makes sense if the function is analytic (you can get around this by also adding derivatives w.r.t. conjugates, but at that point it's obvious that it's sugar for two real numbers), so it is the user's fault if they want to produce ill-defined derivatives, and they're lucky that JAX has a pretty good convention for what it might mean

On this we totally agree: this whole issue is about something that is just ill defined and that I think we should just not support, instead of complicating our lives!

Not in the sense of this issue. I'm not aware of any solver that would use complex scale factors or anything like that that would require linearity other than w.r.t multiplying with real numbers, which we do have. But the internal solver improves performance for it can be jitted

I understand, but in the general case we should guarantee linearity also w.r.t. complex factors.

(b) wasteful if you only want to do SR with this (which is still the main use case of QGT)

We have to think that QGT is used also for dynamics and potentially other applications, so its definition should be such that it is usable not only for SR

I'm against slowing it down by a factor of 2 for no reason

But why would my suggestion slow it down by a factor of 2? If I understand correctly, the solve method is already doing the doubling internally (as per @PhilipVinc's comment?); what I am proposing is just to make the doubling explicitly visible.

attila-i-szabo commented 3 years ago

(a) it is linear

Why so? I thought we agreed on the fact that this is nonlinear in the non-holomorphic case, did I miss something?

You seem to think that it's as nonlinear as an RBM, which is clearly not the case.

On this we totally agree: this whole issue is about something that is just ill defined and that I think we should just not support, instead of complicating our lives!

I mean, we can put a holomorphic trip switch on QGTOnTheFly in general, à la JAX, but I don't want to stop treating holomorphic functions as C→C.

(b) wasteful if you only want to do SR with this (which is still the main use case of QGT)

We have to think that QGT is used also for dynamics and potentially other applications, so its definition should be such that it is usable not only for SR

I agree that to_dense should be fixed (and splitting would also cause it to produce the same matrix as QGTJacobian, so doubly good). But you seem to want to introduce this split into matvec, which serves no purpose whatsoever, because you always get back the product that you would if you split and then reassembled. If you have problems with the reassembly, we'll have to stop supporting complex parameters altogether because there'd be no way to update them.

But why would my suggestion slow it down by a factor of 2? If I understand correctly, the solve method is already doing the doubling internally (as per @PhilipVinc's comment?); what I am proposing is just to make the doubling explicitly visible.

It doesn't explicitly do the doubling, because then every real and imaginary part of gradients of a holomorphic function would have to be calculated twice (because you replace the complex number with a 2×2 matrix constrained by Cauchy-Riemann), so autodiff doesn't split known holomorphic operations (like linear maps).

gcarleo commented 3 years ago

You seem to think that it's as nonlinear as an RBM, which is clearly not the case.

From my knowledge of mathematics, there exist only two types of functions: linear and non-linear ones; there is no such thing as one function being a bit less non-linear than another. The RBM is non-linear and the QGT we are talking about is also non-linear. The mathematical definition of linearity of a function is quite clear... :)

So, since the QGT function in this case is non-linear, we just cannot return a LinearOperator object. Can we agree on this fundamental point?

it doesn't explicitly do the doubling, because then every real and imaginary part of gradients of a holomorphic function would have to be calculated twice (because you replace the complex number with a 2×2 matrix constrained by Cauchy-Riemann), so autodiff doesn't split known holomorphic operations (like linear maps).

The holomorphic case already works; as far as I can tell there isn't much to fix there, since the returned operator already seems (?) to be linear...

import netket as nk
import jax
import numpy as np
from jax import random

ma = nk.models.RBM(alpha=1, dtype=complex)
sa = nk.sampler.MetropolisLocal(nk.hilbert.Spin(0.5, 16), n_chains=16)
vs = nk.vqs.MCState(sa, ma)

qgt = vs.quantum_geometric_tensor()

n_pars = vs.n_parameters

key = random.PRNGKey(0)
key1, key2 = random.split(key)
v = random.uniform(key1, shape=(n_pars,)) + 1.0j * random.uniform(key2, shape=(n_pars,))

np.testing.assert_allclose(qgt @ ((2. + 1.0j) * v), (2. + 1.0j) * (qgt @ v))

attila-i-szabo commented 3 years ago

The mathematical definition of linearity of a function is quite clear... :)

Not when we want to support too many things...

The problem is that the mathematical object you are looking for doesn't exist for non-holomorphic C→C functions, for the simple reason that they cannot be differentiated with respect to complex variables. (I mean, you can write down <dψ/dz_i | dψ/dz_j> even if it's not holomorphic, but it's not useful because you have gradients w.r.t. the conjugates and those are just as important.) Non-holomorphic functions are really just sugar for R→C functions, and in that sense, QGTOnTheFly is linear. In any case, matvec returns the QGT-vector product you'd get if you unravelled the sugar, without having to do so manually. (BTW, that's also the behaviour we have in QGTJacobian.matvec by ravelling complex parameters by hand; all I'm saying is that we shouldn't waste dev time on changing the code in QGTOnTheFly.matvec if it's not changing the behaviour at all and probably makes things slower.) All we can do about it not being a strictly linear object is requiring the user to only use holomorphic functions; that's kinda pointless given that matvec behaves as it should even when you violate this.

In to_dense we of course need it to be strictly linear, and for that we must ravel. That's just an omission, nobody cared enough to fix it up yet.

gcarleo commented 3 years ago

shouldn't waste dev time on changing the code in QGTOnTheFly.matvec if it's not changing the behaviour at all and probably makes things slower

I completely agree that this is a relatively minor branch of the code, and there are issues that are much more pressing (e.g. #830 and related). That's why I think that just dropping support for non-holomorphic C->C would be (have been?) the right thing to do, because it simplifies the code and avoids a couple of inconsistencies that I think are problematic as they are now...

In any case, matvec returns the QGT-vector product you'd get if you unravelled the sugar

I would like to understand this point better though... I am not convinced that the current behavior for the matmul is actually correct (even after unravelling) for the QGT.

If I take a non-holomorphic C->C function, this is (as has been pointed out many times) essentially just a lazy way of writing an R->C function with twice as many parameters. The QGT of the latter is a 2N×2N complex-valued (Hermitian) matrix. If you also unravel the vector you want to apply the QGT to, you get a real-valued vector of 2N entries. So, if I am not missing something crucial here, (complex-valued) QGT × (real) vector -> (complex) vector of size 2N. Now, if you ravel it back into a complex vector of size N, you must discard half of the vector; there is no way out as far as I can tell. My impression is that in the context of SR this is still fine, because somehow JAX does this product as Re(QGT) × vector, so in this case the ravel-unravel does not cause issues, in the sense that the output vector is just a real vector of size 2N and thus can be compressed into a complex vector of size N.

However:

  1. This is valid only for SR, thus the resulting QGT × vector product (even after unravelling) is wrong (imagine doing this for cases where I would need the imaginary part of the QGT, for example)

  2. The returned operator, as we have discussed at length, and at variance with all the other cases (R->R, R->C, C->C holo), while being promised to be a LinearOperator, is not a linear operator in the widely accepted notion of the term

So, while I understand that SR "still works" in this case (because taking the Re(QGT) is the right thing to do), we should at least forbid direct calls to quantum_geometric_tensor or @, since we are losing the imaginary part of the QGT. Please correct me if I got this point wrong!

PhilipVinc commented 3 years ago

Linearity

A map f: V → W is 𝕂-linear, or linear over the field 𝕂, if the following two conditions are met:

(i)   f(x) + f(y) = f(x + y)   ∀ x, y ∈ V
(ii)  f(ax) = a f(x)           ∀ a ∈ 𝕂, x ∈ V

which are (i) additivity and (ii) homogeneity over the field 𝕂.

It is often assumed that V, W must be vector spaces over the field 𝕂 in order for a function to be linear (which is why @gcarleo is claiming that our QGT can be non-linear), however that is just a particular case of linearity. An example of such an operation is complex conjugation: f(x) = x* is ℝ-linear but it is not ℂ-linear.
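
A quick numerical illustration of the complex-conjugation example (just to make the distinction concrete):

import numpy as np

f = np.conj                                   # f(x) = x*
x = np.array([1.0 + 2.0j])
print(np.allclose(f(2.0 * x), 2.0 * f(x)))    # True:  homogeneous over R
print(np.allclose(f(1j * x), 1j * f(x)))      # False: not homogeneous over C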

In the case of the QGT, we have 2 possible types of linearity to consider: ℝ-linearity and ℂ-linearity. It is also unfortunate that few distinguish between those two when talking about solvers for the linear system f(x⃗) = y⃗: if x, y ∈ ℂᴺ but f is only ℝ-linear, can this system be solved by the likes of GMRES or conjugate gradient?

I think it's a valid, well posed question worth an answer. I also suspect (though have not investigated) that they can indeed be used to solve it.

QGT implementation

We currently have 2 classes of QGT implementations: QGTOnTheFly and QGTJacobian.

I therefore believe that the QGT is ℂ-linear for holomorphic functions and maybe (to be checked) ℝ-linear for non-holomorphic ones. Because of this, I think that (1) we should actually check, and (2) it's sufficient to document this: we can guarantee that the QGT is ℝ-linear, and it's up to you to know whether it is also ℂ-linear.

Do we all agree on this? If we agree, we can stop discussing the QGT itself and only discuss how to represent the non-holomorphic QGT as a dense matrix.

Setup code for the snippets above:

import netket as nk
import jax
import jax.numpy as jnp

ma = nk.models.RBM(alpha=1, dtype=complex)
sa = nk.sampler.MetropolisLocal(nk.hilbert.Spin(0.5, 16), n_chains=16)
vs = nk.vqs.MCState(sa, ma)

v = jnp.ones((vs.n_parameters,))

gcarleo commented 3 years ago

Thanks for checking this @PhilipVinc, indeed I agree with those two points, even though I suspect that GMRES requires complex linearity whereas MINRES requires real linearity (just a guess, we need to check the implementation or the papers, unfortunately...)

I agree it's OK to document this, however I am still worried by the other issue (which seems deeper than linearity, in a sense): that in the non-holo case we are systematically dropping half of the QGT (its imaginary part), both when taking products with vectors and when solving systems. As I was writing above, this is not a problem for SR but it is for most of the other cases where you need the QGT (either as a linear operator or as a full matrix). I think this point should be investigated more; I wrote a few tests to check if the QGT * vector product is done correctly but I cannot do a PR right now.

gcarleo commented 3 years ago

By the way, I think that the fact that we drop the imaginary part of the QGT is also the reason why in the non-holo case it is only R-linear and not C-linear... In the holo case everything is fine.

I think this issue is deeper in the sense that it is just not possible to have the product QGT × complex vector return a single complex vector of size N in the non-holo case. Just from counting arguments it should return at least two complex vectors of size N (maybe I'm mistaken, but I think it's the only possibility in the non-holo case?)

attila-i-szabo commented 3 years ago

Here is a description of how I see it, I'll try to answer some specific points in the next comment.

Simplest case: R→C. QGTOnTheFly calculates the matrix of Re(<dψ/dx|dψ/dy>), which is the preconditioner used in SR and (if I understand the definition correctly) the real part of the QGT. This behaviour is due to the JAX autodifferentiation model, and matmul only works well with a real vector (otherwise the JVP either truncates it silently or errors out, I forgot which), so it's not C-linear, although clearly R-linear.

A generic C→C function is treated by JAX as if you split both the input and the output into real and imaginary parts. Thus it can now take and return complex vectors as far as the parameters are complex, but it's still only R-linear since differentiating a non-holomorphic function isn't C-linear either.

This is also what happens internally for a holomorphic C→C function. However, the Cauchy-Riemann equations ensure that the real 2×2 blocks in the enlarged Jacobian behave like multiplication by a complex number. That is, we could package it as a complex-valued QGT (now without real-part signs) of the original parameter count, which is now C-linear. This depends on dψ/dz* = 0, so the wave function and all its gradients can be specified in terms of genuinely complex parameters.
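
For reference, the standard fact behind this packaging (not NetKet-specific): for $f = u + iv$ holomorphic in $z = x + iy$, the Cauchy-Riemann equations $\partial_x u = \partial_y v$, $\partial_y u = -\partial_x v$ make each real $2\times2$ Jacobian block act as multiplication by a single complex number:

$$ \begin{pmatrix} \partial_x u & \partial_y u \\ \partial_x v & \partial_y v \end{pmatrix} = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad \begin{pmatrix} a & -b \\ b & a \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \;\leftrightarrow\; (a + i b)(x + i y). $$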

That is, QGTOnTheFly.matmul works as an SR preconditioner (and it is R-linear) for any kind of network. To represent it as a dense matrix for non-holo C→C functions, we need to split real and imaginary parts, but that is still the real part of the QGT (and so it is for R→C functions). Deviating from that would be difficult because both JVP and VJP stick to the original dtypes, and duplicate calls to both would slow things down by a factor of 4, for no gain on any problem we use QGTs for.

I think it's best to document this and say that it is only really a QGT for holomorphic functions. I do think everything is the same for QGTJacobian* (at least it was when I first wrote it). This also means we don't touch QGTOnTheFly.matmul.

As for representing the non-holo SR matrix (which is not the QGT), we basically need to push through 1-hot and i-hot vectors for every complex parameter and then separate real and imaginary parts in the output (see the sketch below). We just need to make sure we do this (and represent the result) consistently with the output of QGTJacobian* so that the behaviour doesn't depend on which implementation you use. Also, we don't want to do this for holomorphic functions, so there should be a holomorphic switch (which defaults to False for safety).
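
A sketch of the 1-hot / i-hot probing described above (hypothetical helper, not NetKet API): it builds the real 2N×2N representation from a matvec that only accepts complex vectors of size N.

import numpy as np

def dense_split_representation(matvec, n):
    # Probe the operator with e_k and i*e_k and split the outputs into
    # real and imaginary parts, yielding a real (2N, 2N) matrix.
    cols = []
    for k in range(n):
        for unit in (1.0, 1.0j):
            e = np.zeros(n, dtype=complex)
            e[k] = unit
            out = np.asarray(matvec(e))
            cols.append(np.concatenate([out.real, out.imag]))
    return np.stack(cols, axis=1)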

attila-i-szabo commented 3 years ago

both real and complex split explicitly the real and imaginary parts like QGTOnTheFly, so I expected it to be R-linear, but I found out it is C-linear.

Have you tested this on a complex RBM @PhilipVinc? Whether the function is holomorphic or not doesn't depend on how you store the QGT; complex and holomorphic ought to return the same result for a holomorphic function, it's just that complex is less efficient in this case. It should not be C-linear for a non-holomorphic function. As for real, it can be C-linear, and it seems that having the Jacobians explicitly allows sidestepping the issue I flagged with QGTOnTheFly for R→C functions. However, it still encodes Re(QGT).

By the way, I think that the fact that we drop the imaginary part of the QGT is also the reason why in the non-holo case it is only R-linear and not C-linear... In the holo case everything is fine.

I think this issue is deeper in the sense that it is just not possible to have the product QGT × complex vector return a single complex vector of size N in the non-holo case. Just from counting arguments it should return at least two complex vectors of size N (maybe I'm mistaken, but I think it's the only possibility in the non-holo case?)

You're right, this basically boils down to dψ/dz* != 0 for non-holomorphic functions, so you have two complex-valued gradients to keep track of for each parameter.

As I was writing above, this is not a problem for SR but it is for most of the other cases where you need the QGT (either as a linear operator or as a full matrix). I think this point should be investigated more; I wrote a few tests to check if the QGT * vector product is done correctly but I cannot do a PR right now.

Could you give a concrete example where the full QGT is needed for R→C functions? (This is easier to think of than non-holo functions and it's largely equivalent.) It certainly cannot be used in any kind of preconditioner (for dynamics or anything) as the updates to parameters must stay real... I guess that's related, but also complex entries in the QGT as a "metric tensor" don't make much sense (imaginary parts are antisymmetric, so they disappear in quadratic forms with real dx), so I'm puzzled as to what usefulness they could have.

even though I suspect that GMRES requires complex linearity whereas MINRES requires real linearity

Would that mean that GMRES behaves differently if you supply complex vectors split into real and imaginary parts??

PhilipVinc commented 3 years ago

Have you tested this on a complex RBM @PhilipVinc?

Yes, I was testing this on a complex RBM (you can see the code I was using). I did not want to sidetrack the discussion, but looking at the source code it seems to me that mode=real is not checked anywhere, and that we only check whether mode==holomorphic or mode!=holomorphic.

matmul only works well with a real vector (otherwise the JVP either truncates it silently or errors out, I forgot which), so it's not C-linear, although clearly R-linear.

Yes, but the truncation is a jax-specific thing that we can avoid by wrapping operations à la nk.jax.vjp. The question is: does it make sense? Can we have something that always works?

One thing we might consider is to have qgt be the full QGT, and internally in SR call qgt.real, which just turns on a flag to do operations using only the real part...

attila-i-szabo commented 3 years ago

Yes, but the truncation is a jax-specific thing that we can avoid by wrapping operations à la nk.jax.vjp.

Could you please give an example where this is desirable? In all the contexts that I'm aware of (i.e. you use the QGT as a sort of metric on the space of parametrised wave functions, which is more general than just SR), the imaginary parts are irrelevant for real parameters, so all you achieve by working them out is (I think) 4× the runtime for things that you then throw away.

The question is does it make sense?

This really depends on whether there's a use case for it. You don't seem to suggest there is one.

Can we have something that always work?

It would certainly be quite ugly for non-holo C→C because you can't use C-R relations to package the gradients, so you need to split and store 4 complex numbers (it's not clear to me whether there are any relations between them that would reduce this)

gcarleo commented 3 years ago

@attila-i-szabo this summarises the situation; the imaginary part might be needed for the TDVP

[image: table summarising the situation, taken from the paper linked below]

but actually now after having discussed with @PhilipVinc I think we agree to keep the real part only, provided it is documented!

Now we are actually more worried that the tests in #880 seem to fail for the complex case and we cannot understand why yet...

attila-i-szabo commented 3 years ago

@attila-i-szabo this summarises the situation; the imaginary part might be needed for the TDVP

Where is this from? I'm interested in what all the different method names mean.

but actually now after having discussed with @PhilipVinc I think we agree to keep the real part only, provided it is documented!

Yeah, we should definitely document it, but I think it would take an inordinate amount of time to come up with a solution that makes every possible user happy

gcarleo commented 3 years ago

Yes, this is the paper: https://quantum-journal.org/papers/q-2019-10-07-191/