jcmgray / quimb

A python library for quantum information and many-body calculations including tensor networks.
http://quimb.readthedocs.io

Best way to use the TN Circuit class with TNOptimizer #81

Open chris-n-self opened 3 years ago

chris-n-self commented 3 years ago

I would like to use the approach in example '10. Bayesian Optimizing QAOA Circuit Energy' (using cotengra contraction orders and the circuit local_expectation functions that exploit light-cone cancellations), while also taking advantage of all the auto-differentiation integration in the TNOptimizer class.

I had a look at the code in tensor/optimize.py & tensor/circuit.py, and I am not sure how to proceed. My first thoughts would be to do one of:

Would you advise trying any of those? Or is there a better approach you can see? Alternatively, should I forget about trying to use the circuit local_expectation functions and just calculate the energy expectation of the TN?

jcmgray commented 3 years ago

Hey Chris! So there is a quick example of how to work around this currently here:

https://github.com/jcmgray/quimb/issues/77#issuecomment-767077816

Ideally, yes, the TNOptimizer would take a Circuit as a possible input. I think this should be pretty easy to implement but haven't got round to it. It would need:

  1. For the Circuit to be copied on input and the _psi parsed.
  2. For the gradient optimizers to inject their vector into a TN, then that TN into the Circuit, before supplying it to loss_fn etc.
  3. For the Circuit to have its parameters updated from the tensor network and returned at the end of the optimisation.

I don't see any obstacles to these, and maybe a CircuitOptimizer using a TNOptimizer internally is the cleanest outward facing API.
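Roughly, I imagine such a wrapper looking something like the sketch below. None of this exists yet, and update_params_from is just a stand-in name for whatever mechanism step 3 ends up using:

import quimb.tensor as qtn

def optimize_circuit(circ, loss_fn, steps=1000, **tnopt_opts):
    """Very rough sketch of a 'CircuitOptimizer' wrapping TNOptimizer."""
    circ = circ.copy()  # 1. don't mutate the supplied circuit
    # 2. optimize the circuit's underlying tensor network as usual
    tnopt = qtn.TNOptimizer(circ.psi, loss_fn=loss_fn, **tnopt_opts)
    psi_opt = tnopt.optimize(steps)
    # 3. write the optimized parameters back into the circuit's gates
    #    (update_params_from is a hypothetical helper here)
    circ.update_params_from(psi_opt)
    return circ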

Note that when it comes to autodiff, not all of the simplification methods that Circuit has turned on by default are compatible (since they depend on the actual tensor values rather than just the shapes). This might have a big impact on how efficient the contractions are in comparison to 'fully simplified' versions.
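For example, when computing local expectations inside the differentiated loss you would probably want to restrict to shape-only simplifications, something like the following (illustrative only, the exact letters to keep may differ):

# G = some local operator, where = the sites it acts on
expec = circ.local_expectation(
    G, where,
    optimize='auto-hq',
    simplify_sequence='R',  # keep only rank simplification, which doesn't
                            # inspect tensor values; the default sequence
                            # includes value-dependent steps
)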

chris-n-self commented 3 years ago

Thanks Johnnie! That's really helpful.

However, I am finding the optimisation fails when I use that code with my circuit and Hamiltonian. The progress bar displays larger and larger negative numbers, and tnopt.res has fun: nan and message: b'ABNORMAL_TERMINATION_IN_LNSRCH'.

My edited version of your example is:

import quimb as qu
import quimb.tensor as qtn
from autoray import do
import itertools
import functools
from scipy.stats import uniform

def circ_qaoa_zzxz(
    terms,
    depth,
    lambdas,
    gammas,
    betas,
    **circuit_opts,
):
    """ edited from quimb """
    circuit_opts.setdefault('gate_opts', {})
    circuit_opts['gate_opts'].setdefault('contract', False)

    n = max(itertools.chain.from_iterable(terms)) + 1

    gates = []

    # layer of hadamards to get into plus state
    for i in range(n):
        gates.append((0, 'h', i))

    for d in range(depth):
        for (i, j) in terms:
            gates.append((d, 'rzz', -lambdas[d], i, j))

        for i in range(n):
            gates.append((d, 'rx', gammas[d] * 2, i))

        for i in range(n):
            gates.append((d, 'rz', betas[d] * 2, i))

    circ = qtn.Circuit(n, **circuit_opts)
    circ.apply_gates(gates)

    return circ

# setup a circuit
n = 4
p = int(n/2)
lambdas = qu.randn(p)
gammas = qu.randn(p)
betas = qu.randn(p)
nn_edges = [ (i,(i+1)%n) for i in range(n) ]
circ = circ_qaoa_zzxz(nn_edges,p,lambdas,gammas,betas,)

# the local operators
J_rands = uniform.rvs(loc=-1,scale=2,size=len(nn_edges))
hX_rands = uniform.rvs(loc=-1,scale=2,size=n)
ops = { nn: Jr * qu.pauli('Z') & qu.pauli('Z') for Jr,nn in zip(J_rands,nn_edges) }
ops.update({ i: -1*hXr * qu.pauli('X') for i,hXr in enumerate(hX_rands) })

# tags describing all the different reverse lightcones for each operators
lightcone_tags = {where: circ.get_reverse_lightcone_tags(where) for where in ops}

# function that takes the input TN and computes the ith loss
def loss_i(psi, where, ops):
    tags = lightcone_tags[where]
    ket = psi.select(tags, 'any')
    bra = ket.H
    expec = ket.gate(ops[where], where) | bra
    return do('real', expec.contract(all, optimize='auto-hq'))

# since they are a sum we can evaluate them separately
loss_fns = [
    functools.partial(loss_i, where=where)
    for where in ops
]

tnopt = qtn.TNOptimizer(
    circ.psi,
    loss_fn=loss_fns,
    tags=['RZZ','RX','RZ'],
    loss_constants={'ops': ops},
    autodiff_backend='jax',
)

tnopt.optimize_basinhopping(n=10, nhop=10)
# -2.577747688523476e+33 [best: -2.577747688523476e+33] :   9%|▉         | 9/100 [01:09<09:22,  6.18s/it]/home/cns08/anaconda3/envs/qaoadf-cotengra/lib/python3.8/site-packages/quimb/tensor/optimize.py:386: RuntimeWarning: invalid value encountered in add
#  grads[i] += g_i
# -2.577747688523476e+33 [best: -2.577747688523476e+33] :   9%|▉         | 9/100 [01:12<12:13,  8.06s/it]
# <TensorNetwork1DVector(tensors=32, indices=40, L=4, max_bond=2)>

jcmgray commented 3 years ago

I think you just need to specify the gates as parametrized:

circ.rx(0.123, 5, parametrize=True)

Currently the optimizer takes the raw (and unbounded) entries of the gate tensors (leading to the blowup), whereas when they are parametrized the tensor is defined through the function that generates the unitary gate, so the state should stay normalized.
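So in your circ_qaoa_zzxz above, instead of collecting gate tuples and calling apply_gates, you could apply each parametrized gate directly, roughly along these lines (untested sketch):

circ = qtn.Circuit(n, **circuit_opts)

# layer of hadamards to get into plus state (no parameters to optimize)
for i in range(n):
    circ.apply_gate('h', i, gate_round=0)

for d in range(depth):
    for (i, j) in terms:
        circ.apply_gate('rzz', -lambdas[d], i, j,
                        gate_round=d, parametrize=True)
    for i in range(n):
        circ.apply_gate('rx', gammas[d] * 2, i,
                        gate_round=d, parametrize=True)
    for i in range(n):
        circ.apply_gate('rz', betas[d] * 2, i,
                        gate_round=d, parametrize=True)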

chris-n-self commented 3 years ago

That's great, thank you! That fixed that problem. Does parameterising the gates have any other effects?

jcmgray commented 3 years ago

Not really; rather than storing an array, a parametrized tensor just calls fn(params) when you access its .data attribute. I suppose repeatedly retrieving the data might be slightly slower... but it may even be cached, I can't remember!
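E.g. just to illustrate the idea, constructing a PTensor by hand (a rough sketch rather than how the Circuit builds them internally):

import numpy as np
import quimb.tensor as qtn

def rx_gate(params):
    # a single angle generates the full 2x2 unitary
    theta, = params
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s],
                     [-1j * s, c]])

t = qtn.PTensor(rx_gate, params=[0.123], inds=('k0', 'b0'))
t.params = [0.456]  # update the single underlying parameter
t.data              # regenerated on access, i.e. rx_gate([0.456])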

chris-n-self commented 3 years ago

Thanks, that's great. The reason I thought it might work to leave the tensors unparameterised was the else branch of the if isinstance(t, PTensor): check in the inject_ function in optimize.py:

def inject_(arrays, tn):
    for t in tn:
        for tag in t.tags:
            match = variable_finder.match(tag)
            if match is not None:
                i = int(match.groups(1)[0])

                if isinstance(t, PTensor):
                    t.params = arrays[i]
                else:
                    t.modify(data=arrays[i])

                break

Purely out of interest, would it have also worked to bound the optimisation parameters in some way so that t.modify(data=arrays[i]) stays valid?

chris-n-self commented 3 years ago

Sort of related to this (though this is maybe getting off track from the original issue), I have been wondering how to hack the TNOptimizer class to allow two different parameterised tensors to share an optimisation parameter. Would it work to edit the parse_network_to_backend function to assign the same variable_tag to tensors that share one of the "optimise over" tags passed?

def parse_network_to_backend(tn, tags, constant_tags, to_constant):
    tn_ag = tn.copy()
    variables = []

    variable_tag = "__VARIABLE{}__"

    for t in tn_ag:
        # check if tensor has any of the constant tags
        if t.tags & constant_tags:
            t.modify(apply=to_constant)
            continue

        # if tags are specified only optimize those tagged
        if tags and not (t.tags & tags):
            t.modify(apply=to_constant)
            continue

        if isinstance(t, PTensor):
            data = t.params
        else:
            data = t.data

        # jax doesn't like numpy.ndarray subclasses...
        if isinstance(data, qarray):
            data = data.A

        # append the raw data but mark the corresponding tensor for reinsertion
        variables.append(data)
        t.add_tag(variable_tag.format(len(variables) - 1))

    return tn_ag, variables
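For illustration, the kind of change I mean would be roughly the following (shared_tags is just a placeholder name, completely untested; PTensor and qarray are the same module-level imports optimize.py already uses):

def parse_network_to_backend(tn, tags, constant_tags, to_constant,
                             shared_tags=()):
    tn_ag = tn.copy()
    variables = []

    variable_tag = "__VARIABLE{}__"

    # remember which variable index each 'shared' tag was assigned
    shared_ix = {}

    for t in tn_ag:
        # check if tensor has any of the constant tags
        if t.tags & constant_tags:
            t.modify(apply=to_constant)
            continue

        # if tags are specified only optimize those tagged
        if tags and not (t.tags & tags):
            t.modify(apply=to_constant)
            continue

        # if this tensor carries a shared tag already seen, point it at
        # the existing variable rather than registering a new one
        shared = t.tags & set(shared_tags)
        if shared:
            stag = next(iter(shared))
            if stag in shared_ix:
                t.add_tag(variable_tag.format(shared_ix[stag]))
                continue

        if isinstance(t, PTensor):
            data = t.params
        else:
            data = t.data

        # jax doesn't like numpy.ndarray subclasses...
        if isinstance(data, qarray):
            data = data.A

        # append the raw data but mark the corresponding tensor for reinsertion
        variables.append(data)
        i = len(variables) - 1
        if shared:
            shared_ix[stag] = i
        t.add_tag(variable_tag.format(i))

    return tn_ag, variables

I think inject_ would then already do the right thing, since every tensor marked with the same __VARIABLE{}__ tag would just have the same arrays[i] injected.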

jcmgray commented 3 years ago

Regarding the PTensor, I'm not sure I totally understand the question, but maybe let me elaborate on what happens: setting parametrize=True puts PTensor instances in the TN. When the optimizer is working out how large the total vector space is, it takes either the raw array size or the params size, so e.g. for the same gate the unparametrized and parametrized arrays[i] will generally be different sizes.

Purely out of interest, would it have also worked to bound the optimisation parameters in some way so that t.modify(data=arrays[i]) stays valid?

I think not generally: if you consider e.g. a rotation, the param space is smaller (a single angle) but generates a 2x2 complex matrix, and there's at least no simple way (i.e. x in [-b, b]) to bound the entries of the matrix directly.

jcmgray commented 3 years ago

Regarding linking tensor optimizations, I think it would be pretty much that simple. One just needs a way to cleanly specify that all tags in e.g. optimize_together=[...] should be... optimized together. That would be a neat addition.

chris-n-self commented 3 years ago

I think not generally: if you consider e.g. a rotation, the param space is smaller (a single angle) but generates a 2x2 complex matrix, and there's at least no simple way (i.e. x in [-b, b]) to bound the entries of the matrix directly.

Ah, I think I see the difference now, thanks.

chris-n-self commented 3 years ago

Regarding linking tensor optimizations, I think it would be pretty much that simple. One just needs a way to cleanly specify that all tags in e.g. optimize_together=[...] should be... optimized together. That would be a neat addition.

If I had a go at implementing this, do you have any preference about what the new kwarg should be called? I was thinking potentially shared_tags to keep it consistent with the tags, constant_tags names. Also, would there then be any tests that need to be updated for this addition?
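From the user side I was picturing something like this (again, purely hypothetical):

tnopt = qtn.TNOptimizer(
    circ.psi,
    loss_fn=loss_fns,
    tags=['RZZ', 'RX', 'RZ'],
    # hypothetical new kwarg: tensors sharing one of these tags would be
    # mapped to a single optimisation variable
    shared_tags=['ROUND_0'],
    loss_constants={'ops': ops},
    autodiff_backend='jax',
)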