Tensor Network Python (TeNPy)
https://github.com/tenpy/tenpy

Alternative energy calculation: quantum phase transition of transverse field Ising model example #317

Open DilhanM opened 7 months ago

DilhanM commented 7 months ago

In this advanced example, the energy of the DMRG ground state is not smooth across the critical point, even though it should be.

However, by calculating the energy as E0 = np.mean(M.bond_energies(psi)), I am able to obtain a smooth energy curve across the critical point.

Is this calculating an energy different from the DMRG ground state energy, and if not, what is the reason behind this discrepancy?
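
For concreteness, a minimal sketch of the comparison (assuming a setup along the lines of the tfi_phase_transition.py example discussed below: TFIChain, infinite MPS, two-site DMRG; the parameter values here are illustrative, not the example's exact ones):

```python
import numpy as np

from tenpy.networks.mps import MPS
from tenpy.models.tf_ising import TFIChain
from tenpy.algorithms import dmrg

g = 1.2  # transverse field; the example sweeps this across the critical point g=1
M = TFIChain({'L': 2, 'J': 1., 'g': g, 'bc_MPS': 'infinite', 'conserve': None})
psi = MPS.from_product_state(M.lat.mps_sites(), ['up'] * 2, bc='infinite')
eng = dmrg.TwoSiteDMRGEngine(psi, M, {'mixer': True,
                                      'trunc_params': {'chi_max': 100, 'svd_min': 1.e-10}})
E_dmrg, psi = eng.run()                  # energy (per site) reported by DMRG
E_bonds = np.mean(M.bond_energies(psi))  # alternative: mean nearest-neighbor bond energy
print(g, E_dmrg, E_bonds)
```

Up to convergence and canonicalization effects, the two numbers should agree; the discrepancy across the g sweep is what the rest of this thread tracks down.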

Jakob-Unfried commented 4 months ago

This is the result with the current implementation in tenpy v0.11:

[screenshot: results with tenpy v0.11]

Jakob-Unfried commented 4 months ago

Similar for v0.10

[screenshot: results with tenpy v0.10]

Jakob-Unfried commented 4 months ago

But it gives smooth results with v0.9...

[screenshot: results with tenpy v0.9]

Jakob-Unfried commented 4 months ago

So I guess some commit between v0.9 and v0.10 introduced a problem.

Today I learned about git bisect, highly recommend.
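
For reference, such a bisection can be automated with git bisect run and a small check script, e.g. git bisect start; git bisect bad main; git bisect good v0.9.0; git bisect run python check_smooth.py. A hypothetical sketch of such a script (the name check_smooth.py, the g values, and the threshold are made up for illustration; it reuses the same minimal iDMRG setup as the sketch further up):

```python
"""check_smooth.py -- hypothetical helper for `git bisect run`.

Exits 0 ("good commit") if the energies near g=1 vary smoothly, 1 ("bad") otherwise.
"""
import sys

from tenpy.networks.mps import MPS
from tenpy.models.tf_ising import TFIChain
from tenpy.algorithms import dmrg


def energy(g):
    M = TFIChain({'L': 2, 'J': 1., 'g': g, 'bc_MPS': 'infinite', 'conserve': None})
    psi = MPS.from_product_state(M.lat.mps_sites(), ['up'] * 2, bc='infinite')
    eng = dmrg.TwoSiteDMRGEngine(psi, M, {'mixer': True,
                                          'trunc_params': {'chi_max': 100, 'svd_min': 1.e-10}})
    E, _ = eng.run()
    return E


E = [energy(g) for g in (0.98, 1.0, 1.02)]
# a spike in an otherwise smooth E(g) shows up as a large second difference
sys.exit(0 if abs(E[0] - 2 * E[1] + E[2]) < 0.05 else 1)
```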

Jakob-Unfried commented 4 months ago

Will update this message while I track down the bug...

There seem to be two different problems as I go through the history:

  1. In https://github.com/tenpy/tenpy/commit/1dd62aa8eb5fe31d741cf7fa1114bb4803d0d7ed, the default tolerance for MPS.canonical_form_infinite2 was made more strict. That causes the loop in _canonical_form_right_orthogonalize to never terminate in the example close to criticality. That is not the cause of this bug, though, since at some later point DMRG switched to using MPS.canonical_form_infinite1, which is now the default in MPS.canonical_form.

    • [ ] However, we should consider terminating the loop after some large number of iterations (say 10000?) and issuing a warning that the tolerance might be too strict; then the bug would surface as a warning instead of a hang, I suppose (see the sketch after this list).
  2. Will now look into how/why MPS.canonical_form_infinite1 fails.
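
A minimal sketch of the kind of guard meant by the checkbox above (converge_with_guard, step, and max_iter are illustrative names, not tenpy API):

```python
import warnings

def converge_with_guard(step, tol=1.e-15, max_iter=10000):
    """Call step() repeatedly until its returned error drops below tol, but give up
    after max_iter iterations with a warning instead of looping forever."""
    for _ in range(max_iter):
        err = step()
        if err < tol:
            return err
    warnings.warn(f"no convergence after {max_iter} iterations; "
                  f"tol={tol:.0e} might be too strict")
    return err
```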

Jakob-Unfried commented 4 months ago

@DilhanM You are correct, by the way, that the example is not behaving as expected. Thanks for reporting it!

Jakob-Unfried commented 4 months ago

I get ok results if I modify the current main branch to use canonical_form_infinite2 and set its tol=1e-12:

[screenshot: results with canonical_form_infinite2 and tol=1e-12]

But there still seems to be some bad convergence in the Sz expval

This gets better by allowing more DMRG sweeps:

[screenshot: results with more DMRG sweeps]

Jakob-Unfried commented 4 months ago

Did another experiment to find which change caused the expectation value Sz and the correlation function SxSx to develop these spikes. Setup:

Result: The spikes are introduced by 43b2d2c19fcfb5ec7a1ca3e73503dc8c8bf1ebee

It seems that after that commit, we also get the warning "SVD with lapack_driver 'gesdd' failed. Use backup 'gesvd'." at the outlier g values.

Jakob-Unfried commented 4 months ago

Going back through the history, there are multiple problems that surface in this example, which I have finally untangled. The main challenge in identifying what's going on is that the default method for canonicalizing infinite MPS, and its default arguments, have changed multiple times throughout the history. For each of the following three canonicalization methods, I bisected the history between the current main and v0.9.0 while keeping the canonicalization method fixed throughout, by changing MPS.canonical_form to call that method explicitly (roughly as sketched after the list):

  1. For canonical_form_infinite1, 09dfb47c introduces a bug. It does not break canonical_form_infinite1 itself; it just causes it to be called.
  2. For canonical_form_infinite2 with default args, 1dd62aa tightens the default tol from 1e-12 to 1e-15, which results in an infinite loop.
  3. For canonical_form_infinite2(tol=1e-12), 43b2d2c introduces a bug.
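
Roughly the kind of pinning meant above, written here as a monkey-patch from a driver script rather than an edit of the source (a debugging hack for the bisection, not tenpy API; shown for the third variant):

```python
from tenpy.networks.mps import MPS

# debugging hack: make every psi.canonical_form() call use the second variant with
# the looser tolerance, independent of the checked-out commit's default dispatch
MPS.canonical_form = lambda self, **kwargs: self.canonical_form_infinite2(tol=1.e-12)
```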

Left to do:

jhauschild commented 4 months ago

I'm no longer convinced the issue is (only) in psi.canonical_form/Arnoldi. There's also an issue in how we re-use the environment in engine.init_env():
https://github.com/tenpy/tenpy/blob/f3d59c6287f6b76dc0b191112064ee4529adfc40/tenpy/algorithms/mps_common.py#L256
This re-uses environments, assuming that they fit together with self.psi. However, in dmrg._canonicalize(), we set self._resume_psi = self.psi.copy() to have a version compatible with the existing environments, while modifying self.psi such that it no longer fits with the environments - see 33237b75a. That commit claims to still fix #99 and #133, and indeed it makes sure that incompatible charges in the MPO environment are handled - but in the tfi_phase_transition.py example discussed in this thread, we don't use charge conservation at all, so there are no incompatible charges, while psi.canonical_form() still applies non-trivial unitaries on the virtual legs.

Really, we should push to get proper MPO environment (re)initialization from #295

jhauschild commented 4 months ago

I think we've been a bit misguided in assuming that the issue is in psi.canonical_form.

Digging further, I tried a bunch of different combinations of parameters. I looked at the update stats of the energy (error compared to the final value) during DMRG, and often found that it doesn't go down smoothly. Example: example_E_going_up.pdf. In the table below, the second column indicates whether I found such behavior or not. To be clear, I ran the tfi_phase_transition.py example with:

```python
resume_data = None if reuse_env else {'init_env_data': {}}
engine.init_env(model=M, resume_data=resume_data)
```

form_{1,2} in the filenames indicates whether I used psi.canonical_form_infinite1 or psi.canonical_form_infinite2 - the latter with tol=1.e-12.

| smooth exp vals | E goes down | filename |
| --- | --- | --- |
| N | N | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| Y | N | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl |
| Y | mostly | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_None.pkl |
| Y | Y | data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-07_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| g=1 not | just g=1 not | data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl |
| Y | Y | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| Y | Y | data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl |
| Y | just g=0.99 not | data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_None.pkl |
| almost | mostly | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl |
| N | N | data_tfi_form_2_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
| N | N | data_tfi_form_2_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl |

Looking at the different parameter choices, whether it works seems mostly correlated with svd_min above/below half machine precision, and not so much with reuse_env. So the easy fix seems to be to set svd_min <= 1.e-7.
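
For concreteness, a sketch of where that fix would go in the example's DMRG options (chi_max is illustrative; the relevant key is svd_min under trunc_params):

```python
dmrg_params = {
    'mixer': True,
    'trunc_params': {
        'chi_max': 100,    # illustrative bond-dimension cap
        'svd_min': 1.e-7,  # the "easy fix": discard singular values below 1e-7
    },
}
```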

#292 might help to fix this.