Quimb tensor contraction is significantly slower than Cirq's state vector simulation

quantumlib / Qualtran

Qᴜᴀʟᴛʀᴀɴ is a Python library for expressing and analyzing Fault Tolerant Quantum algorithms.

https://qualtran.readthedocs.io/en/latest/

Apache License 2.0

Quimb tensor contraction is significantly slower than Cirq's state vector simulation #793

Closed tanujkhattar closed 3 months ago

tanujkhattar commented 6 months ago

See https://github.com/quantumlib/Qualtran/pull/788#issuecomment-1996292451 for more details

I'm not sure what the fix is right now, but this is definitely a blocker for transitioning away from Cirq and using Qualtran-native interfaces when writing Bloqs.

mpharrigan commented 6 months ago

I'll take a look

mpharrigan commented 6 months ago

lp = LPResourceState(10)

%%time
lp.tensor_contract()
CPU times: user 24.6 s, sys: 3.82 s, total: 28.5 s
Wall time: 4.18 s

from qualtran.cirq_interop.testing import GateHelper
circuit = GateHelper(lp).circuit

%%time
circuit.final_state_vector()
CPU times: user 59.3 ms, sys: 9.04 ms, total: 68.3 ms
Wall time: 70.1 ms

%%time
circuit.unitary()
CPU times: user 4.17 s, sys: 1.86 s, total: 6.03 s
Wall time: 3.24 s

So right off the top, a large proportion of the wall time difference is due to comparing apples and oranges. Asking Cirq for the unitary gives a wall time of the same order of magnitude (3.24 s vs. 4.18 s), although quimb is still slower on wall time and the CPU time is much worse for tensor_contract() (28.5 s vs. 6.03 s).
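For a rough sense of the scale behind the apples-and-oranges point, here is a minimal sketch of how the two outputs grow with qubit count: a state vector on n qubits has 2^n amplitudes, while a full unitary has 4^n entries. The function names are hypothetical, and the qubit counts are purely illustrative (they are not the actual size of LPResourceState(10)).

```python
def state_vector_entries(n_qubits: int) -> int:
    """Number of complex amplitudes in an n-qubit state vector."""
    return 2 ** n_qubits


def unitary_entries(n_qubits: int) -> int:
    """Number of complex entries in a dense n-qubit unitary matrix."""
    return 4 ** n_qubits


def complex128_bytes(n_entries: int) -> int:
    """Memory needed to store the entries as 16-byte complex numbers."""
    return 16 * n_entries


# Illustrative qubit counts only; not taken from the benchmark above.
for n in (4, 8, 12):
    sv = state_vector_entries(n)
    u = unitary_entries(n)
    print(
        f"n={n}: state vector {sv} amplitudes ({complex128_bytes(sv)} B), "
        f"unitary {u} entries ({complex128_bytes(u)} B)"
    )
```

The unitary always has the square of the state vector's entry count, so any simulator asked for the full matrix has strictly more output to produce.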

mpharrigan commented 6 months ago

It's worth pointing out that the Cirq simulation of LPResourceState still makes calls to Hadamard().tensor_contract() and XGate().tensor_contract(), which may be slowing down the Cirq simulation compared with a pure Cirq implementation. Is this because of the recent change to proactively change cirq gates into bloqs if the bloqs exist in qualtran? Whether this slows down the Cirq simulation is worth investigating later.

tanujkhattar commented 6 months ago

> large proportion of the wall time difference is due to comparing apples and oranges

From a user perspective, I don't think we are comparing apples and oranges. LPResourceState is a Bloq with only a RIGHT register, so bloq.tensor_contract() is expected to return a state vector and not a unitary.

The equivalent query to Cirq is circuit.final_state_vector(), so I think the comparison makes sense from an API perspective.

> Is this because of the recent change to proactively change cirq gates into bloqs if the bloqs exist in qualtran?

No, in this specific case it's because I intentionally used Hadamard().on(*q) instead of cirq.H when specifying the cirq-style decomposition of LPResourceState. If I had used cirq.H, then the Cirq simulation would have used the Cirq gate directly, while bloq.decompose_bloq() would have used Qualtran's Hadamard(). This behavior is enabled by the recent updates that convert Cirq gates to Bloqs when using the Bloq API.

mpharrigan commented 6 months ago

Oh, I see. I don't know why quimb is so slow then. The investigation continues.

mpharrigan commented 6 months ago

So there's from qualtran.simulation.tensor import flatten_for_tensor_contraction, but the result has a lot of Join, Split combos that blow up the quimb network because of their naive tensor implementation.
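To illustrate why such tensors get expensive, here is a minimal sketch that assumes a Join of n one-bit wires into one n-bit register is stored as a dense one-hot tensor of shape (2, ..., 2, 2^n). That layout and the function names are assumptions for illustration, not a copy of Qualtran's code. Under that assumption the dense tensor holds 4^n entries, of which only 2^n are nonzero.

```python
from itertools import product


def join_tensor_nonzeros(n_bits: int) -> dict:
    """Nonzero entries of a one-hot Join tensor (assumed layout).

    Keys are index tuples (b_0, ..., b_{n-1}, joined), where `joined` is the
    big-endian integer encoded by the bits; every nonzero value is 1.
    """
    entries = {}
    for bits in product((0, 1), repeat=n_bits):
        joined = int("".join(str(b) for b in bits), 2)
        entries[bits + (joined,)] = 1
    return entries


def dense_entries(n_bits: int) -> int:
    """Total entries if the same tensor is stored densely."""
    return (2 ** n_bits) * (2 ** n_bits)


for n in (2, 5, 10):
    nonzero = len(join_tensor_nonzeros(n))
    print(f"n={n}: {nonzero} nonzero out of {dense_entries(n)} dense entries")
```

Each Join, Split pair adds two such mostly-zero tensors to the network even though together they act as an identity on the wires.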

It would be nice if quimb had a generalized identity tensor that it could contract efficiently, but I didn't find one the first time I ran into this.
Otherwise, we'll need a function that destroys Join, Split pairs or keeps everything maximally split.
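A function that destroys Join, Split pairs could look roughly like the sketch below. It works on a toy representation, a flat list of (name, register) tuples, which is a hypothetical stand-in and not Qualtran's actual bloq graph. The function name `cancel_join_split` is likewise made up for this example. It only cancels pairs that are directly adjacent on the same register, so it is conservative.

```python
def cancel_join_split(ops):
    """Remove adjacent Join/Split (or Split/Join) pairs on the same register.

    `ops` is a list of (name, register) tuples in execution order. A stack is
    used so that nested pairs, exposed after an inner pair is removed, are
    cancelled too.
    """
    reshaping = {"Join", "Split"}
    stack = []
    for name, reg in ops:
        if stack:
            prev_name, prev_reg = stack[-1]
            is_inverse_pair = (
                name in reshaping
                and prev_name in reshaping
                and name != prev_name
                and reg == prev_reg
            )
            if is_inverse_pair:
                stack.pop()  # the pair acts as an identity, drop both
                continue
        stack.append((name, reg))
    return stack


ops = [("H", "q"), ("Join", "r"), ("Split", "r"), ("X", "q")]
print(cancel_join_split(ops))
```

A real implementation would have to work on the graph of connections rather than a linear list, and would need to check that the Split produces wires in the same order the Join consumed them.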

mpharrigan commented 3 months ago

LPResourceState now has a thru register. Do you have an example of a complicated state prep bloq? Also, why did that turn into a thru register?