quantumlib / OpenFermion-FQE

The Fermionic Quantum Emulator (FQE) is a fermionic simulation research tool specializing in quantum circuits emulating fermion dynamics.
Apache License 2.0
61 stars 26 forks source link

deepcopy is killing performance #14

Closed ncrubin closed 4 years ago

ncrubin commented 4 years ago
    import cProfile
    import numpy as np
    import cirq
    from openfermion.circuits.primitives import optimal_givens_decomposition
    from openfermion import givens_decomposition_square
    from openfermion import random_quadratic_hamiltonian
    import openfermion as of
    from scipy.linalg import expm
    import fqe
    from fqe.algorithm.low_rank import evolve_fqe_givens
    import time
    norbs = 10
    sz = 0
    nelec = norbs
    start_time = time.time()
    initial_wfn = fqe.Wavefunction([[nelec, sz, norbs]])
    print("Wavefunction Initialization ", time.time() - start_time)
    graph = initial_wfn.sector((nelec, sz)).get_fcigraph()
    hf_wf = np.zeros((graph.lena(), graph.lenb()), dtype=np.complex128)
    hf_wf[0, 0] = 1
    start_time = time.time()
    cirq_wf = of.jw_hartree_fock_state(nelec, 2 * norbs)
    print("Cirq wf initialization time ", time.time() - start_time)
    initial_wfn.set_wfn(strategy='from_data',
                        raw_data={(nelec, sz): hf_wf})

    # set up Hamiltonian
    ikappa = random_quadratic_hamiltonian(norbs, conserves_particle_number=True,
                                          real=True, expand_spin=False, seed=5)
    ikappa_matrix = ikappa.n_body_tensors[1, 0]

    # Evolution time and unitaries
    dt = 0.275
    u = expm(-1j * dt * ikappa_matrix)
    fqe_ham = fqe.restricted_hamiltonian.RestrictedHamiltonian(
        (ikappa_matrix * dt,))

    cProfile.run('initial_wfn.time_evolve(1., fqe_ham)', 'fqe_givens_profile')

    import pstats
    profile = pstats.Stats('fqe_givens_profile')
    profile.sort_stats('cumtime')
    #
    # profile.print_stats()
    profile.print_stats('deepcopy')

Output:

Wavefunction Initialization  0.07503366470336914
Cirq wf initialization time  0.003200054168701172
Sun Aug 16 18:48:22 2020    fqe_givens_profile

         30394280 function calls (25113404 primitive calls) in 12.598 seconds

   Ordered by: cumulative time
   List reduced from 172 to 6 due to restriction <'deepcopy'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
5279822/86    3.769    0.000    8.435    0.098 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:132(deepcopy)
   924/84    0.033    0.000    8.432    0.100 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:236(_deepcopy_dict)
    16968    0.378    0.000    8.163    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:210(_deepcopy_list)
  1286964    1.377    0.000    6.299    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:219(_deepcopy_tuple)
  3974628    0.293    0.000    0.293    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:190(_deepcopy_atomic)
       86    0.012    0.000    0.012    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}

Everywhere in the code copy.deepcopy is used to get a new wavefunction. Interestingly, __deepcopy__ of numpy.ndarray` is a tiny fraction of this time (am I reading the cProfiler right here?). This tells me we should invest effort into removing all the copy.deepcopy everywhere or implement deepcopy for FqeData and FqeGraph in a smarter way. Maybe we can just copy the coeff data and use that?

Anyways, I'm curious your thoughts @shiozaki about using shallow copy on fqeGraph and everything in FQEData except the coeffs. My concern is that the garbage collector wouldn't delete old wavefunctions and thus we would just memory leak crash. Really we shouldn't need to keep having to have the garbage collector come and blow away the state (which I think happens because a new copy is being returned each call of time_evolve. Even the "inplace" routines (apply_cosine_sine) make copies multiple copies of the data. We should be able to just copy the coefficients as needed and leave the graph and everything else alone.

ncrubin commented 4 years ago

As an experiment I implemented my own copy methods for FqeData

    def __copy__(self):
        new_data = FqeData(nalpha=self._core.nalpha(),
                           nbeta=self._core.nbeta(),
                           norb=self._core.norb(),
                           fcigraph=self._core,
                           dtype=self._dtype)
        new_data._low_thresh = self._low_thresh
        new_data.coeff = numpy.copy(self.coeff)
        return new_data

    def __deepcopy__(self, memodict={}):
        new_data = FqeData(nalpha=self._core.nalpha(),
                           nbeta=self._core.nbeta(),
                           norb=self._core.norb(),
                           fcigraph=self._core,
                           dtype=self._dtype)
        new_data._low_thresh = self._low_thresh
        # TODO: Check if this is necessary for numeric types
        new_data.coeff = copy.deepcopy(self.coeff)
        return new_data

Then reran the profiling on the same Hamiltonian and system above. The output is below

Wavefunction Initialization  0.07455205917358398
Cirq wf initialization time  0.0033540725708007812
Sun Aug 16 20:31:17 2020    fqe_givens_profile

         47090 function calls (44774 primitive calls) in 4.629 seconds

   Ordered by: cumulative time
   List reduced from 176 to 6 due to restriction <'deepcopy'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2018/86    0.006    0.000    0.041    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:132(deepcopy)
   336/84    0.001    0.000    0.036    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:236(_deepcopy_dict)
       84    0.001    0.000    0.028    0.000 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:1693(__deepcopy__)
       86    0.020    0.000    0.020    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
       84    0.001    0.000    0.001    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:219(_deepcopy_tuple)
     1344    0.000    0.000    0.000    0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:190(_deepcopy_atomic)

If we can get away with not calling a deepcopy on the coeffs because we know they are all numeric types then we can shave down the time a little bit. ~10% more. So, it seems like the savings is coming from not copying the FciGraph. This makes sense since FciGraph is also holding an exponential amount of information.

If you are curious what becomes the most expensive call (since the above clearly indicates copying is now negligible) here's the ordered profiler output

Wavefunction Initialization  0.07718181610107422
Cirq wf initialization time  0.003431081771850586
Sun Aug 16 20:39:47 2020    fqe_givens_profile

         46670 function calls (44438 primitive calls) in 4.371 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.371    4.371 {built-in method builtins.exec}
        1    0.000    0.000    4.371    4.371 <string>:1(<module>)
        1    0.000    0.000    4.371    4.371 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_decorators.py:348(convert)
        1    0.000    0.000    4.371    4.371 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:929(time_evolve)
        2    0.001    0.000    4.363    2.181 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:767(transform)
       40    0.001    0.000    4.349    0.109 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_decorators.py:333(convert)
       40    0.000    0.000    4.347    0.109 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:343(apply)
       40    0.001    0.000    4.331    0.108 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:384(_apply_array)
       40    0.266    0.007    4.316    0.108 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:211(apply_inplace)
       40    0.000    0.000    4.050    0.101 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:252(_apply_array_spatial1)
       40    0.000    0.000    3.347    0.084 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:1215(calculate_dvec_spatial)
       40    2.927    0.073    3.347    0.084 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:1236(_calculate_dvec_spatial_with_coeff)
  192/147    0.001    0.000    0.721    0.005 {built-in method numpy.core._multiarray_umath.implement_array_function}

Now we are properly spending all our time on the dvec calculation and the apply_inplace from the LU transform.

shiozaki commented 4 years ago

Hi Nick - fcifraph can be shallow copied (it's a const pointer in the original C++ implementation in BAGEL). The reason why we return a new copy is that, whenever the operator is not diagonal, we need to create a new copy; so to have a unified function API, we had to do that. But obviously it is an option to provide such in-place routine (and copy back to the original when in-place is not possible). Btw it was my understanding that the garbage collector will delete the old wavefunction regardless of whether you shallow copy the graph (I need to test though to make sure if it is the case).

Btw the amount of data in fcigraph is exponential, but more like sqrt(size of coeff) norb norb. In the limit of large wave function, the size becomes negligible.