Closed ncrubin closed 4 years ago
As an experiment I implemented my own copy methods for FqeData
def __copy__(self):
new_data = FqeData(nalpha=self._core.nalpha(),
nbeta=self._core.nbeta(),
norb=self._core.norb(),
fcigraph=self._core,
dtype=self._dtype)
new_data._low_thresh = self._low_thresh
new_data.coeff = numpy.copy(self.coeff)
return new_data
def __deepcopy__(self, memodict={}):
new_data = FqeData(nalpha=self._core.nalpha(),
nbeta=self._core.nbeta(),
norb=self._core.norb(),
fcigraph=self._core,
dtype=self._dtype)
new_data._low_thresh = self._low_thresh
# TODO: Check if this is necessary for numeric types
new_data.coeff = copy.deepcopy(self.coeff)
return new_data
Then reran the profiling on the same Hamiltonian and system above. The output is below
Wavefunction Initialization 0.07455205917358398
Cirq wf initialization time 0.0033540725708007812
Sun Aug 16 20:31:17 2020 fqe_givens_profile
47090 function calls (44774 primitive calls) in 4.629 seconds
Ordered by: cumulative time
List reduced from 176 to 6 due to restriction <'deepcopy'>
ncalls tottime percall cumtime percall filename:lineno(function)
2018/86 0.006 0.000 0.041 0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:132(deepcopy)
336/84 0.001 0.000 0.036 0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:236(_deepcopy_dict)
84 0.001 0.000 0.028 0.000 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:1693(__deepcopy__)
86 0.020 0.000 0.020 0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
84 0.001 0.000 0.001 0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:219(_deepcopy_tuple)
1344 0.000 0.000 0.000 0.000 /Users/nickrubin/anaconda3/envs/fqe-dev/lib/python3.6/copy.py:190(_deepcopy_atomic)
If we can get away with not calling a deepcopy on the coeffs because we know they are all numeric types then we can shave down the time a little bit. ~10% more. So, it seems like the savings is coming from not copying the FciGraph. This makes sense since FciGraph is also holding an exponential amount of information.
If you are curious what becomes the most expensive call (since the above clearly indicates copying is now negligible) here's the ordered profiler output
Wavefunction Initialization 0.07718181610107422
Cirq wf initialization time 0.003431081771850586
Sun Aug 16 20:39:47 2020 fqe_givens_profile
46670 function calls (44438 primitive calls) in 4.371 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.371 4.371 {built-in method builtins.exec}
1 0.000 0.000 4.371 4.371 <string>:1(<module>)
1 0.000 0.000 4.371 4.371 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_decorators.py:348(convert)
1 0.000 0.000 4.371 4.371 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:929(time_evolve)
2 0.001 0.000 4.363 2.181 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:767(transform)
40 0.001 0.000 4.349 0.109 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_decorators.py:333(convert)
40 0.000 0.000 4.347 0.109 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:343(apply)
40 0.001 0.000 4.331 0.108 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/wavefunction.py:384(_apply_array)
40 0.266 0.007 4.316 0.108 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:211(apply_inplace)
40 0.000 0.000 4.050 0.101 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:252(_apply_array_spatial1)
40 0.000 0.000 3.347 0.084 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:1215(calculate_dvec_spatial)
40 2.927 0.073 3.347 0.084 /Users/nickrubin/opt/OpenFermion-FQE/src/fqe/fqe_data.py:1236(_calculate_dvec_spatial_with_coeff)
192/147 0.001 0.000 0.721 0.005 {built-in method numpy.core._multiarray_umath.implement_array_function}
Now we are properly spending all our time on the dvec calculation and the apply_inplace from the LU transform.
Hi Nick - fcifraph can be shallow copied (it's a const pointer in the original C++ implementation in BAGEL). The reason why we return a new copy is that, whenever the operator is not diagonal, we need to create a new copy; so to have a unified function API, we had to do that. But obviously it is an option to provide such in-place routine (and copy back to the original when in-place is not possible). Btw it was my understanding that the garbage collector will delete the old wavefunction regardless of whether you shallow copy the graph (I need to test though to make sure if it is the case).
Btw the amount of data in fcigraph is exponential, but more like sqrt(size of coeff) norb norb. In the limit of large wave function, the size becomes negligible.
Output:
Everywhere in the code
copy.deepcopy
is used to get a new wavefunction. Interestingly,__deepcopy__
of numpy.ndarray` is a tiny fraction of this time (am I reading the cProfiler right here?). This tells me we should invest effort into removing all the copy.deepcopy everywhere or implement deepcopy for FqeData and FqeGraph in a smarter way. Maybe we can just copy the coeff data and use that?Anyways, I'm curious your thoughts @shiozaki about using shallow copy on fqeGraph and everything in FQEData except the coeffs. My concern is that the garbage collector wouldn't delete old wavefunctions and thus we would just memory leak crash. Really we shouldn't need to keep having to have the garbage collector come and blow away the state (which I think happens because a new copy is being returned each call of
time_evolve
. Even the "inplace" routines (apply_cosine_sine) make copies multiple copies of the data. We should be able to just copy the coefficients as needed and leave the graph and everything else alone.