Open DilhanM opened 7 months ago
This is the result of the current implementation with v0.11 tenpy:
Similar for v0.10
But it gives smooth results at v0.9...
So I guess some commit between 0.9 and 0.10 introduced a problem.
Today I learned about git bisect, highly recommend.
Will update this message while I track down the bug...
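For readers who have not used git bisect: it is a binary search over the commit history for the first commit at which a check fails. The following is only a minimal sketch of that idea in plain Python. The commit names and the `spikes_appear` check are hypothetical placeholders, not real tenpy commits or tests.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad is True.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history, oldest first (placeholder names only).
history = ["v0.9.0", "c1", "c2", "c3", "c4", "c5", "main"]
tested = []


def spikes_appear(commit: str) -> bool:
    """Stand-in for 'run the example at this commit and look at the plot'."""
    tested.append(commit)
    return history.index(commit) >= 4  # pretend "c4" introduced the problem


culprit = first_bad_commit(history, spikes_appear)
print(culprit, "found after testing", tested)
```

The number of commits to test grows only logarithmically with the length of the history, which is why bisecting between two releases is feasible by hand.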
There seem to be two different problems as I go through the history:
In https://github.com/tenpy/tenpy/commit/1dd62aa8eb5fe31d741cf7fa1114bb4803d0d7ed, the default tolerance for MPS.canonical_form_infinite2 was made more strict. That causes the loop in _canonical_form_right_orthogonalize to never terminate in the example, close to criticality. That is not the cause of this bug though, as at some point later, DMRG switched to using MPS.canonical_form_infinite1, since that is now the default in MPS.canonical_form.
Will now look into how/why MPS.canonical_form_infinite1 fails.
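To illustrate how a stricter tolerance can turn a converging loop into a non-terminating one, here is a toy sketch. It is not the tenpy implementation: `iterate_until_converged` and `noisy_contraction` are hypothetical names, and the alternating perturbation of size 1e-13 is an assumed stand-in for numerical noise near criticality. The iteration cap with a warning is one possible guard.

```python
import warnings


def iterate_until_converged(step, x0, tol, max_iter=10_000):
    """Apply x -> step(x, n) until successive iterates differ by less than tol.

    Returns (x, n_iter, converged). Instead of looping forever, this gives up
    after max_iter steps and warns that the tolerance may be too strict.
    """
    x = x0
    for n in range(max_iter):
        x_new = step(x, n)
        if abs(x_new - x) < tol:
            return x_new, n + 1, True
        x = x_new
    warnings.warn(
        f"no convergence to tol={tol:g} after {max_iter} iterations; "
        "consider relaxing the tolerance, but proceed with care",
        stacklevel=2,
    )
    return x, max_iter, False


def noisy_contraction(x, n):
    """Toy map that contracts toward 0 but never settles below a noise floor."""
    noise = 1e-13 if n % 2 == 0 else -1e-13  # assumed stand-in for rounding noise
    return 0.5 * x + noise


# A tolerance above the noise floor is reached after a few dozen steps.
_, n_loose, ok_loose = iterate_until_converged(noisy_contraction, 1.0, tol=1e-12)

# A tolerance below the noise floor is never reached; only the cap stops the loop.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    _, n_strict, ok_strict = iterate_until_converged(noisy_contraction, 1.0, tol=1e-15)

print(ok_loose, n_loose, ok_strict, n_strict)
```

Without the `max_iter` cap, the second call would spin forever, which matches the symptom described above for the tightened default.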
[ ] There is currently no way to control these parameters (which implementation, and e.g. the tolerance for MPS.canonical_form_infinite2). Should we support config options for that in the sweep?
[ ] The example is missing the understood_infinite=True flag for the overlap
[ ] The iMPS warning should use stacklevel. Check others too.
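On the stacklevel item: with the standard library `warnings.warn`, the default `stacklevel=1` attributes the warning to the line inside the library, while `stacklevel=2` attributes it to the caller, which is usually what a user needs to see. A small sketch follows; the function names and the warning text are made up for illustration and are not tenpy's.

```python
import warnings


def warn_default():
    # Default stacklevel=1: the warning is attributed to this line.
    warnings.warn("psi is not in canonical form", UserWarning)


def warn_at_caller():
    # stacklevel=2: the warning is attributed to the line that called this function.
    warnings.warn("psi is not in canonical form", UserWarning, stacklevel=2)


def reported_line(func):
    """Return the line number Python attaches to the warning raised by func."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()  # with stacklevel=2, this is the line that gets reported
    return caught[0].lineno


line_default = reported_line(warn_default)
line_caller = reported_line(warn_at_caller)
print("default points at line", line_default, "and stacklevel=2 at line", line_caller)
```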
@DilhanM You are correct, by the way, that the example is not behaving as expected. Thanks for reporting it!
I get OK results if I modify the current main branch to use canonical_form_infinite2 and set its tol=1e-12.
But there still seems to be some bad convergence in the Sz expval.
This gets better by allowing more DMRG sweeps.
Did another experiment to find what caused the expectation value Sz and the correlation function SxSx to develop these spikes.
Setup: git bisect. At each commit, adjust dmrg to do self.psi.canonical_form_infinite2(tol=1e-12) instead of self.psi.canonical_form().
Result: The spikes are introduced by 43b2d2c19fcfb5ec7a1ca3e73503dc8c8bf1ebee.
It seems that after that we also get UserWarning: SVD with lapack_driver 'gesdd' failed. Use backup 'gesvd' at the outlier g values.
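The setup above edits the source at every bisect step. An alternative that keeps the checkout untouched would be to override the dispatcher at runtime. The sketch below shows that pattern on a stand-in class: `ToyMPS` and `pin_canonical_form` are hypothetical, and only the method names are taken from this thread. Whether the same override works unchanged on the real MPS class at every commit is an assumption that would need checking.

```python
class ToyMPS:
    """Stand-in for an MPS class; it only records which method was called."""

    def __init__(self):
        self.calls = []

    def canonical_form_infinite1(self):
        self.calls.append(("infinite1", None))

    def canonical_form_infinite2(self, tol=1e-15):
        self.calls.append(("infinite2", tol))

    def canonical_form(self):
        # Dispatcher whose default choice may differ from commit to commit.
        self.canonical_form_infinite1()


def pin_canonical_form(cls, tol=1e-12):
    """Make cls.canonical_form always call canonical_form_infinite2(tol=tol)."""

    def pinned(self):
        return self.canonical_form_infinite2(tol=tol)

    cls.canonical_form = pinned


psi = ToyMPS()
psi.canonical_form()        # uses whatever the class default is
pin_canonical_form(ToyMPS)
psi.canonical_form()        # now always infinite2 with tol=1e-12
print(psi.calls)
```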
Going back through the history there are multiple problems that surface in this example, which I have finally untangled.
The main challenge in identifying what's going on is that the default method for canonicalizing infinite MPS, and its default arguments, have changed multiple times throughout the history.
For the following three canonicalization methods, I bisected the history between the current main and v0.9.0 and adjusted the canonicalization method, effectively keeping it constant throughout the whole history, by changing MPS.canonical_form to call one of the following methods explicitly:
- canonical_form_infinite1: 09dfb47c introduces a bug. It does not break canonical_form_infinite1 itself, though; it just causes it to be called.
- canonical_form_infinite2 with default args: 1dd62aa tightens the default tol from 1e-12 to 1e-15, which results in an infinite loop.
- canonical_form_infinite2(tol=1e-12): 43b2d2c introduces a bug.
Left to do:
[ ] Stop iterating in _canonical_form_left_orthogonalize and _canonical_form_right_orthogonalize after some large number of iterations (say 10000?), maybe configurable through an option? And issue an informative warning to maybe relax the tolerance but proceed with care. (moved to #370)
[ ] Config options in cfg:Algorithm? Or a new cfg:CanonicalForm as subconfig for cfg:Algorithm? Or rather an MPS config, containing truncation and these things, that algorithms can, but don't have to, override? (moved to #371)
[ ] understood_infinite=True flag for the overlap
I'm no longer convinced the issue is (only) in psi.canonical_form/Arnoldi.
There's also an issue in how we re-use the environment in engine.init_env():
https://github.com/tenpy/tenpy/blob/f3d59c6287f6b76dc0b191112064ee4529adfc40/tenpy/algorithms/mps_common.py#L256
This re-uses environments, assuming that they fit together with self.psi.
However, in dmrg._canonicalize(), we've set self._resume_psi = self.psi.copy() to have a version compatible with the existing environments, while modifying self.psi such that it no longer fits with the environments - see 33237b75a.
This commit claims to still have fixed #99 and #133, and indeed it makes sure that incompatible charges in the MPO environment are handled - but in the tfi_phase_transition.py example discussed in this thread, we don't use charge conservation at all, so there are no incompatible charges, while psi.canonical_form() still applies non-trivial unitaries on the virtual legs.
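As a toy analogy for why a cached environment stops fitting once a unitary has been applied to the state, consider a two-component real vector and a 2x2 "environment" matrix. This is only a sketch of the gauge argument in plain Python and has nothing to do with the actual tenpy data structures; all names are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]


def expectation(v, env):
    """Contract v^T env v, the toy version of 'state with environment'."""
    return sum(v[i] * env[i][j] * v[j] for i in range(2) for j in range(2))


theta = 0.3  # arbitrary rotation angle for the toy gauge transformation
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

state = [1.0, 0.0]
env = [[1.0, 0.0], [0.0, -1.0]]      # environment computed for the original state
reference = expectation(state, env)

new_state = matvec(U, state)          # "canonicalization" applies a unitary
stale = expectation(new_state, env)   # re-using the old environment: inconsistent

updated_env = matmul(matmul(U, env), transpose(U))  # transform the environment too
consistent = expectation(new_state, updated_env)

print(reference, stale, consistent)
```

The stale combination gives a different number even though no symmetry or charge information is involved, which mirrors the point that missing charge conservation does not make the re-use safe.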
Really, we should push to get proper MPO environment (re)initialization from #295
I think we've been a bit misguided assuming that the issue is in psi.canonical_form.
Digging further, I tried a bunch of different combinations for a number of parameters. I looked at the update stats of the energy (error compared to the final value) during DMRG, and often found that it doesn't go down smoothly. Example: example_E_going_up.pdf
In the table below, the second column indicates whether the energy went down smoothly (Y) or showed such non-smooth behavior (N). To be clear, I ran the tfi_phase_transition.py example, with
resume_data = None if reuse_env else {'init_env_data': {}}
engine.init_env(model=M, resume_data=resume_data)
form_{} indicates whether I used psi.canonical_form_infinite1 or psi.canonical_form_infinite2 (the latter with tol=1.e-12).
smooth exp vals | E goes down | filename |
---|---|---|
N | N | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
Y | N | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl |
Y | mostly | data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_None.pkl |
Y | Y | data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-07_norm_tol_1e-05.pkl |
N | N | data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
g=1 not | just g=1 not | data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl |
N | N | data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl |
Y | Y | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_1e-05.pkl |
N | N | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
Y | Y | data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl |
Y | just g=0.99 not | data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_None.pkl |
almost | mostly | data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl |
N | N | data_tfi_form_2_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl |
N | N | data_tfi_form_2_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl |
Looking at the different parameter choices, whether it works seems mostly correlated with svd_min being above or below half machine precision (roughly 1.e-8), and not so much with reuse_env.
So the easy fix seems to be to set svd_min >= 1.e-7.
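The correlation can be checked mechanically by parsing the parameters out of the file names in the table and grouping the "smooth exp vals" column by one parameter at a time. The rows below are copied from the table; the helper names are hypothetical, and reading "half machine precision" as the square root of the double-precision epsilon is an assumption.

```python
import math
import re
import sys
from collections import defaultdict

# (smooth exp vals, filename) pairs copied from the table above.
ROWS = [
    ("N", "data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl"),
    ("Y", "data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl"),
    ("Y", "data_tfi_form_1_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_None.pkl"),
    ("Y", "data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-07_norm_tol_1e-05.pkl"),
    ("N", "data_tfi_form_1_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl"),
    ("g=1 not", "data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl"),
    ("N", "data_tfi_form_1_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl"),
    ("Y", "data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-06_norm_tol_1e-05.pkl"),
    ("N", "data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl"),
    ("Y", "data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_1e-05.pkl"),
    ("Y", "data_tfi_form_2_update_env_0_reuse_env_1_svd_min_1e-07_norm_tol_None.pkl"),
    ("almost", "data_tfi_form_2_update_env_0_reuse_env_0_svd_min_1e-10_norm_tol_None.pkl"),
    ("N", "data_tfi_form_2_update_env_5_reuse_env_0_svd_min_1e-10_norm_tol_1e-05.pkl"),
    ("N", "data_tfi_form_2_update_env_5_reuse_env_1_svd_min_1e-10_norm_tol_1e-05.pkl"),
]

PATTERN = re.compile(
    r"form_(?P<form>\d)_update_env_(?P<update_env>\d+)_reuse_env_(?P<reuse_env>\d)"
    r"_svd_min_(?P<svd_min>.+?)_norm_tol_(?P<norm_tol>.+)\.pkl"
)


def group_by(field):
    """Collect the 'smooth exp vals' entries for each value of one parameter."""
    groups = defaultdict(list)
    for smooth, filename in ROWS:
        groups[PATTERN.search(filename).group(field)].append(smooth)
    return dict(groups)


# Assumed meaning of "half machine precision": sqrt of the double epsilon.
half_precision = math.sqrt(sys.float_info.epsilon)

by_svd_min = group_by("svd_min")
by_reuse_env = group_by("reuse_env")

for svd_min, outcomes in sorted(by_svd_min.items(), key=lambda kv: float(kv[0])):
    side = "above" if float(svd_min) > half_precision else "below"
    print(f"svd_min={svd_min} ({side} sqrt(eps)): {outcomes}")
for reuse_env, outcomes in sorted(by_reuse_env.items()):
    print(f"reuse_env={reuse_env}: {outcomes}")
```

In this grouping every plain "N" falls into the svd_min=1e-10 group, while both reuse_env groups contain a mix of outcomes.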
On this advanced example, the energy of the DMRG ground state is not smooth across the critical point, although one would expect it to be.
However, by calculating the energy as
E0 = np.mean(M.bond_energies(psi))
I am able to reproduce a smooth energy surface across the critical point.
Is this calculating an energy different from the DMRG ground state energy, and if not, what is the reason behind this discrepancy?