Tensor Network Python (TeNPy)
https://github.com/tenpy/tenpy

Alternative energy calculation: quantum phase transition of transverse field Ising model example #317

Open DilhanM opened 7 months ago

DilhanM commented 7 months ago

In this advanced example, the energy of the DMRG ground state is not smooth across the critical point, even though it should be.

However, by calculating the energy as E0 = np.mean(M.bond_energies(psi)), I am able to obtain a smooth energy curve across the critical point.

Is this calculating an energy different from the DMRG ground state energy, and if not, what is the reason behind this discrepancy?
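
For concreteness, a minimal sketch of the comparison (assuming a setup along the lines of the tfi_phase_transition.py example discussed below: TFIChain, infinite MPS, two-site DMRG; the parameter values here are illustrative, not the example's exact ones):

```python
import numpy as np

from tenpy.networks.mps import MPS
from tenpy.models.tf_ising import TFIChain
from tenpy.algorithms import dmrg

g = 1.2  # transverse field; the example sweeps this across the critical point g=1
M = TFIChain({'L': 2, 'J': 1., 'g': g, 'bc_MPS': 'infinite', 'conserve': None})
psi = MPS.from_product_state(M.lat.mps_sites(), ['up'] * 2, bc='infinite')
eng = dmrg.TwoSiteDMRGEngine(psi, M, {'mixer': True,
                                      'trunc_params': {'chi_max': 100, 'svd_min': 1.e-10}})
E_dmrg, psi = eng.run()                  # energy (per site) reported by DMRG
E_bonds = np.mean(M.bond_energies(psi))  # alternative: mean nearest-neighbor bond energy
print(g, E_dmrg, E_bonds)
```

Up to convergence and canonicalization effects, the two numbers should agree; the discrepancy across the g sweep is what the rest of this thread tracks down.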

Jakob-Unfried commented 4 months ago

This is the result with the current implementation in tenpy v0.11:

[screenshot: results with tenpy v0.11]

Jakob-Unfried commented 4 months ago

Similar for v0.10

[screenshot: results with tenpy v0.10]

Jakob-Unfried commented 4 months ago

But it gives smooth results with v0.9...

[screenshot: results with tenpy v0.9]

Jakob-Unfried commented 4 months ago

So I guess some commit between v0.9 and v0.10 introduced a problem.

Today I learned about git bisect, highly recommend.
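
For reference, such a bisection can be automated with git bisect run and a small check script, e.g. git bisect start; git bisect bad main; git bisect good v0.9.0; git bisect run python check_smooth.py. A hypothetical sketch of such a script (the name check_smooth.py, the g values, and the threshold are made up for illustration; it reuses the same minimal iDMRG setup as the sketch further up):

```python
"""check_smooth.py -- hypothetical helper for `git bisect run`.

Exits 0 ("good commit") if the energies near g=1 vary smoothly, 1 ("bad") otherwise.
"""
import sys

from tenpy.networks.mps import MPS
from tenpy.models.tf_ising import TFIChain
from tenpy.algorithms import dmrg


def energy(g):
    M = TFIChain({'L': 2, 'J': 1., 'g': g, 'bc_MPS': 'infinite', 'conserve': None})
    psi = MPS.from_product_state(M.lat.mps_sites(), ['up'] * 2, bc='infinite')
    eng = dmrg.TwoSiteDMRGEngine(psi, M, {'mixer': True,
                                          'trunc_params': {'chi_max': 100, 'svd_min': 1.e-10}})
    E, _ = eng.run()
    return E


E = [energy(g) for g in (0.98, 1.0, 1.02)]
# a spike in an otherwise smooth E(g) shows up as a large second difference
sys.exit(0 if abs(E[0] - 2 * E[1] + E[2]) < 0.05 else 1)
```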

Jakob-Unfried commented 4 months ago

Will update this message while I track down the bug...

There seem to be two different problems as I go through the history:

  1. In https://github.com/tenpy/tenpy/commit/1dd62aa8eb5fe31d741cf7fa1114bb4803d0d7ed, the default tolerance for MPS.canonical_form_infinite2 was made more strict. That causes the loop in _canonical_form_right_orthogonalize to never terminate in the example close to criticality. That is not the cause of this bug, though, since at some later point DMRG switched to using MPS.canonical_form_infinite1, which is now the default in MPS.canonical_form.

    • [ ] However, we should consider terminating the loop after some large number of iterations (say 10000?) and issuing a warning that the tolerance might be too strict; then the bug would surface as a warning instead of a hang, I suppose (see the sketch after this list).
  2. Will now look into how/why MPS.canonical_form_infinite1 fails.
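
A minimal sketch of the kind of guard meant by the checkbox above (converge_with_guard, step, and max_iter are illustrative names, not tenpy API):

```python
import warnings

def converge_with_guard(step, tol=1.e-15, max_iter=10000):
    """Call step() repeatedly until its returned error drops below tol, but give up
    after max_iter iterations with a warning instead of looping forever."""
    for _ in range(max_iter):
        err = step()
        if err < tol:
            return err
    warnings.warn(f"no convergence after {max_iter} iterations; "
                  f"tol={tol:.0e} might be too strict")
    return err
```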

Jakob-Unfried commented 4 months ago

@DilhanM You are correct, by the way, that the example is not behaving as expected. Thanks for reporting it!

Jakob-Unfried commented 4 months ago

I get ok results if I modify the current main branch to use canonical_form_infinite2 and set its tol=1e-12:

[screenshot: results with canonical_form_infinite2 and tol=1e-12]

But there still seems to be some bad convergence in the Sz expval

This gets better by allowing more DMRG sweeps:

[screenshot: results with more DMRG sweeps]

Jakob-Unfried commented 4 months ago

Did another experiment to find which change caused the expectation value Sz and the correlation function SxSx to develop these spikes. Setup:

Result: The spikes are introduced by 43b2d2c19fcfb5ec7a1ca3e73503dc8c8bf1ebee

It seems that after that commit, we also get the warning "SVD with lapack_driver 'gesdd' failed. Use backup 'gesvd'." at the outlier g values.

Jakob-Unfried commented 4 months ago

Going back through the history, there are multiple problems that surface in this example, which I have finally untangled. The main challenge in identifying what's going on is that the default method for canonicalizing infinite MPS, and its default arguments, have changed multiple times throughout the history. For each of the following three canonicalization methods, I bisected the history between the current main and v0.9.0 while keeping the canonicalization method fixed throughout, by changing MPS.canonical_form to call that method explicitly (roughly as sketched after the list):

  1. For canonical_form_infinite1, 09dfb47c introduces a bug. It does not break canonical_form_infinite1 itself; it just causes it to be called.
  2. For canonical_form_infinite2 with default args, 1dd62aa tightens the default tol from 1e-12 to 1e-15, which results in an infinite loop.
  3. For canonical_form_infinite2(tol=1e-12), 43b2d2c introduces a bug.
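
Roughly the kind of pinning meant above, written here as a monkey-patch from a driver script rather than an edit of the source (a debugging hack for the bisection, not tenpy API; shown for the third variant):

```python
from tenpy.networks.mps import MPS

# debugging hack: make every psi.canonical_form() call use the second variant with
# the looser tolerance, independent of the checked-out commit's default dispatch
MPS.canonical_form = lambda self, **kwargs: self.canonical_form_infinite2(tol=1.e-12)
```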

Left to do:

jhauschild commented 4 months ago

I'm no longer convinced the issue is (only) in psi.canonical_form/Arnoldi. There's also an issue in how we re-use the environment in engine.init_env():
https://github.com/tenpy/tenpy/blob/f3d59c6287f6b76dc0b191112064ee4529adfc40/tenpy/algorithms/mps_common.py#L256
This re-uses environments, assuming that they fit together with self.psi. However, in dmrg._canonicalize(), we set self._resume_psi = self.psi.copy() to have a version compatible with the existing environments, while modifying self.psi such that it no longer fits with the environments - see 33237b75a. That commit claims to still fix #99 and #133, and indeed it makes sure that incompatible charges in the MPO environment are handled - but in the tfi_phase_transition.py example discussed in this thread, we don't use charge conservation at all, so there are no incompatible charges, while psi.canonical_form() still applies non-trivial unitaries on the virtual legs.

Really, we should push to get proper MPO environment (re)initialization from #295

jhauschild commented 4 months ago

I think we've been a bit misguided in assuming that the issue is in psi.canonical_form.

Digging further, I tried a bunch of different combinations of parameters. I looked at the update stats of the energy (error compared to the final value) during DMRG, and often found that it doesn't go down smoothly. Example: example_E_going_up.pdf. In the table below, the second column indicates whether I found such behavior or not. To be clear, I ran the tfi_phase_transition.py example with:

```python
resume_data = None if reuse_env else {'init_env_data': {}}
engine.init_env(model=M, resume_data=resume_data)
```

form_{1,2} in the filenames indicates whether I used psi.canonical_form_infinite1 or psi.canonical_form_infinite2 - the latter with tol=1.e-12.

| smooth exp vals | E goes down | filename |
| --- | --- | --- |
| N | N | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| Y | N | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl |
| Y | mostly | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_None.pkl |
| Y | Y | data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-07_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| g=1 not | just g=1 not | data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl |
| Y | Y | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| Y | Y | data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl |
| Y | just g=0.99 not | data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_None.pkl |
| almost | mostly | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl |
| N | N | data_tfi_form_2_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_2_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl |

Looking at the different parameter choices, whether it works seems mostly correlated with svd_min above/below half machine precision, and not so much with reuse_env. So the easy fix seems to be to set svd_min <= 1.e-7.
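
For concreteness, a sketch of where that fix would go in the example's DMRG options (chi_max is illustrative; the relevant key is svd_min under trunc_params):

```python
dmrg_params = {
    'mixer': True,
    'trunc_params': {
        'chi_max': 100,    # illustrative bond-dimension cap
        'svd_min': 1.e-7,  # the "easy fix": discard singular values below 1e-7
    },
}
```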

#292 might help to fix this.