quantumlib / qsim

Schrödinger and Schrödinger-Feynman simulators for quantum circuits.
Apache License 2.0
454 stars 151 forks source link

Avoid statevector copy in `qsim.simulate(qc, initial_state=sv)` #617

Open hthayko opened 1 year ago

hthayko commented 1 year ago

I have a use case where I am simulating my circuit piece by piece so using the ability to pass an initial state to qsim is very handy. However I have noticed this makes a copy of the initial_state here:

Is there a way to avoid the expensive copy? If there is a little patch that can be applied (I assume couple of lines in void init_state() should do it) I would be happy to apply it locally and re-build qsim.

Thanks!

95-martin-orion commented 1 year ago

Hello Hayk! As I recall, that copy step exists to pass ownership of the state from Python to C++. Copyless transfer of ownership in the opposite direction is possible with pybind (and we in fact do so in this method), but it's less clear if Python-to-C++ passing can be done without copying. There's some discussion of it in this pybind issue, but it remains unresolved.

I'm no longer actively working on qsim, but @pavoljuhas or @sergeisakov might be available to look into this. Otherwise, if you're interested in pursuing this yourself you could look into the pybind issue above - I'd be more than happy to review a PR if you discover a way to avoid this copy :slightly_smiling_face:

sergeisakov commented 1 year ago

In principle, we don't have to transfer ownership if there is a guarantee that the state doesn't get garbage collected by Python during simulation. Though there might be another problem. State memory must be aligned on a 32-byte or 64-byte boundary when using AVX or AVX512.

hthayko commented 1 year ago

I tried this little fix, it works fine sometimes but more often than not it gives a segmentation fault:

   void init_state(const py::array_t<float> &input_vector) {
     StateSpace state_space = factory.CreateStateSpace();
     py::buffer_info buf_info = input_vector.request();
     float* ptr = static_cast<float*>(buf_info.ptr);
     state = state_space.Create(ptr, state.num_qubits());
     state_space.NormalToInternalOrder(state);
   }

Seems like the main issue is the python garbage collector. This is needed for a one-off job I have so I would be happy with a hacky solution as well if you can think of one (disabling gc temporarily, etc.)

pavoljuhas commented 1 year ago

Seems like the main issue is the python garbage collector.

The input_vector array will not get garbage collected if it it is referenced by some variable in a Python caller of init_state.

hthayko commented 1 year ago

I have all the variables in global scope, and the reference is kept - but still getting the segmentation fault.

import cirq
import qsimcirq

def get_all_H(N):
    circuit = cirq.Circuit()
    qubits = cirq.LineQubit.range(N)
    circuit.append([cirq.H(qubits[i]) for i in range(N)])
    return circuit

qc1 = get_all_H(10)

simulator = qsimcirq.QSimSimulator({'t': 8, 'f': 3, })
results0 = simulator.simulate(qc1)
results1 = simulator.simulate(qc1, initial_state=results0.final_state_vector)

Here the last line sometimes succeeds and sometimes segfaults.

sergeisakov commented 1 year ago

There are two issues here. First, [](void *data) { detail::free(data); } shouldn't be used in line 788 (pybind_main.cpp) in order to avoid double free. Second, the state vector allocated in Python must be aligned. Typically this is not the case.

hthayko commented 1 year ago

hmmm - the free at pybind_main.cpp:788 releases the state back to python. Where is the second (double) free?

sergeisakov commented 1 year ago

The free at pybind_main.cpp:788 deallocates the memory allocated on the C++ side. If the memory is allocated by Python then this free tries to perform additional deallocation.

hthayko commented 1 year ago

In that case there should be an issue on the second deallocation, e.g. when python gc kicks in after global scope variables are cleaned up. However I see seg fault before that - if I print results1.final_state_vector in my python code above - it will work half the time and segfault the other half.

sergeisakov commented 1 year ago

There is also a memory alignment issue. You see seg fault before printing when the state vector is not properly aligned.

hthayko commented 1 year ago

that issue should be deterministic though right? However I see the same code working ~20% of the time and seg faulting 80% of the time.

sergeisakov commented 1 year ago

I think it shouldn't be deterministic. You can try to add printf("%lu\n", (uint64_t) ptr % 32) or printf("%lu\n", (uint64_t) ptr % 64) (depending on what you have: AVX or AVX512) to your little fix. If it prints 0 then the memory alignment is okay and there shouldn't be a seg fault before memory deallocation.