Open tanujkhattar opened 5 months ago
On current main, by making a series of trivial fixes to the cirq interop (like replacing `CNOT().on(a, b)` in the cirq-style decomposition with `cirq.CNOT(a, b)`, since the extra `.on()` call is not needed), adding a cache to `total_qubits`, and caching `extract_bloq_from_cirq`, we get
The cirq interop is still contributing about 50% to the total time, so if we rewrite the bloq natively without any cirq interop, we should be able to get another 2x speedup.
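The caching half of those fixes can be sketched with `functools.cached_property`, the standard way to memoize an expensive derived attribute like `total_qubits` (the class and attribute names below are toy stand-ins, not Qualtran's actual API):

```python
from functools import cached_property

class ToyBloq:
    """Toy stand-in for a bloq whose qubit count is expensive to compute."""

    def __init__(self, sub_counts):
        self.sub_counts = sub_counts
        self.calls = 0  # track how often the expensive path actually runs

    @cached_property
    def total_qubits(self):
        # Expensive aggregation; with cached_property it runs only once
        # per instance, then reads back from the instance __dict__.
        self.calls += 1
        return sum(self.sub_counts)

b = ToyBloq([2, 3, 5])
assert b.total_qubits == 10
assert b.total_qubits == 10  # second access hits the cache
assert b.calls == 1
```

The same pattern applies to any per-instance derived quantity that gets queried repeatedly during decomposition.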
However, it is still going to be fairly slow (a few minutes at least for the 2**12 example) for any reasonable system size; this is not due to the recursive implementation of QROM, but rather to the fact that the size of the composite graph (product of gate count x width) grows large quickly.
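A back-of-the-envelope estimate shows why gates x width blows up (all constants here are assumptions for illustration, not measured Qualtran numbers):

```python
import math

def graph_size_estimate(n_entries, target_bits, gates_per_entry=4):
    """Rough estimate of composite-graph size as gate count x register width.

    Assumes QROM gate count scales ~O(N) with an assumed constant factor,
    and width = selection register (log2 N) plus the target register.
    """
    width = math.ceil(math.log2(n_entries)) + target_bits
    gates = gates_per_entry * n_entries
    return gates * width

# For the 2**12 example with 32-bit targets:
# 16384 gates * 44 wires = 720896 nodes/soquets to track.
print(graph_size_estimate(2**12, 32))
```

Even with generous assumptions the graph is in the hundreds of thousands of soquets before any recursion, which is why per-node bookkeeping costs dominate.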
Ideally, for common cost metrics like `QubitCount`, we should try to avoid having to decompose the entire bloq at all (xref https://github.com/quantumlib/Qualtran/issues/957).
Ideally, we'd be able to do one level of decomposition of each bloq with relevant parameters in a small amount of time. So I think it's worthwhile seeing exactly what the limiting factor is for decomposing an N=4000 QROM.
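A minimal timing harness for that experiment could look like the following, where `decompose_fn` is a stand-in for whatever single-level decomposition call we end up measuring (not a real Qualtran function):

```python
import time

def time_decomposition(decompose_fn, *args):
    """Time a single call to a (stand-in) one-level decomposition."""
    t0 = time.perf_counter()
    result = decompose_fn(*args)
    elapsed = time.perf_counter() - t0
    return result, elapsed

# Toy stand-in: "decomposing" a size-N qrom into N sub-ops.
ops, elapsed = time_decomposition(lambda n: list(range(n)), 4000)
print(f"{len(ops)} sub-ops in {elapsed:.4f}s")
```

Sweeping N and plotting `elapsed` would make the scaling of one-level decomposition obvious.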
One double win (better pedagogy/abstraction) would be to group the CNOTs into a ctrled-XOR or ctrled-SET operation that can operate directly on a single `bitsize=n_target` register.
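Classically, that grouping is just the identity "n CNOTs sharing one control = one controlled XOR onto the whole n-bit register", which a quick bitwise check confirms:

```python
def apply_cnots(ctrl, target_bits, data_bits):
    """One CNOT per data bit: classically, each target bit gets t ^= (ctrl & d)."""
    return [t ^ (ctrl & d) for t, d in zip(target_bits, data_bits)]

def ctrl_xor(ctrl, target, data):
    """Single controlled-XOR acting on the whole register as one integer."""
    return target ^ (data if ctrl else 0)

n = 8
data, target = 0b10110101, 0b01100011
bitwise = apply_cnots(1, [(target >> i) & 1 for i in range(n)],
                         [(data >> i) & 1 for i in range(n)])
as_int = ctrl_xor(1, target, data)
assert sum(b << i for i, b in enumerate(bitwise)) == as_int
```

So the n-CNOT fan-out collapses into one graph node per target register instead of one per target bit, which shrinks the composite graph by roughly the target bitsize.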
Otherwise, I can try profiling a decomposition with roughly 10,000 subbloqs. Last time I did this, there was a huge contribution from tracking whether each soquet has been used exactly once. That's something we can futz around with easily.
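A profiling sketch for that experiment, using `cProfile` and a toy `Counter`-based stand-in for the soquet linearity check (the real check lives inside Qualtran's composite-bloq builder, not here):

```python
import cProfile
import io
import pstats
from collections import Counter

def check_soquets_used_once(soquets):
    """Toy linear-typing check: every soquet id appears exactly once."""
    counts = Counter(soquets)
    return all(c == 1 for c in counts.values())

# Profile the check over ~10,000 fake soquet ids, roughly matching the
# subbloq count mentioned above, to see where the time actually goes.
pr = cProfile.Profile()
pr.enable()
ok = check_soquets_used_once(range(10_000))
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue().splitlines()[0])
```

Swapping the toy check for the real one and sorting by cumulative time should make it obvious whether soquet tracking still dominates.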
I need the following code to run in 3-4 seconds; not 10+ minutes on a Google colab instance.
Here is the timing from my current run -
I want to generate a plot like this