Question about GPU allocation for nqubits

visagim commented 1 year ago

Is there any way to a priori calculate GPU memory needed to simulate a circuit with a specific backend (eg. cupy)? For example to simulate a simple QAOA circuit from the examples:

import numpy as np
from qibo import models, hamiltonians
nqubits = 28
hamiltonian = hamiltonians.XXZ(nqubits, 0.5, False)
qaoa = models.QAOA(hamiltonian)
initial_parameters = 0.01 * np.random.uniform(0,1,4)
best_energy, final_parameters, extra = qaoa.minimize(initial_parameters, method="BFGS")

With double precision I would expect this to need around 128*2^28 / 1024^2 = 32,768 MiB of memory, however it allocates 16,688 MiB instead, which is about a factor of 2 less. The GPU I have available has at most 24,576 MiB, so it couldn't possibly allocate more than that.

I'm using qibojit with cupy backend. qibo 0.2.0.dev0 qibojit 0.0.7.dev0 cupy 11.4.0 CUDA 11.8

I'm guessing there is some optimization going on, I would like to find the documentation for it if you can point me to it.

scarrazza commented 1 year ago

@DiegoGM91 could you please check? In principle you need 128 * 2^28 bits ~ 4.3gb for a state with 28 qubits.

visagim commented 1 year ago

Sorry I missed a factor of 8, it should indeed be ~4.3gb as @scarrazza says (but not ~16gb)

visagim commented 1 year ago

It also does not seem necessarily related to QAOA:

from qibo import hamiltonians
nqubits = 28
# Create hamiltonian
hamiltonian = hamiltonians.XXZ(nqubits, dense=False)
state = hamiltonian.backend.plus_state(nqubits)
exp = hamiltonian.expectation(state)
print(exp)

Also allocates 16688MiB to GPU.

scarrazza commented 1 year ago

If we check the expectation code, there are at least 4 objects with state size 2**28 (see code here): state, statec, hstate, matrix @ state (this last one requires at least 1 copy when performing the multiplication), so I believe the 16gb is justified.

visagim commented 1 year ago

That makes sense, thanks!

qiboteam / qibo

Question about GPU allocation for nqubits #762