issp-center-dev / DCore

DMFT software for CORrelated Electrons
https://issp-center-dev.github.io/DCore/index.html
Other
44 stars 14 forks source link

Problem on running dcore DMFT loop #128

Open shunhongzhang opened 2 years ago

shunhongzhang commented 2 years ago

Dear developers,

I am a beginner of dcore and I came across the following problem. First I install dcore, triqs, triqs_dft_tools properly. (No error prompt and all modules can be imported) Then I copy the example of 2D Hubbard model from the documentation web (dmft_square_pomerol.ini) No problem in running dcore_pre dmft_square_pomerol.ini But when I run the True DMFT loop with dcore dmft_square_pomerol.ini --np 1 it stops with the prompt


raise RuntimeError("Error occurred while executing MPI program! Output messages may be found in {}!".format(os.path.abspath(output_file.name)))

RuntimeError: Error occurred while executing MPI program! Output messages may be found in /public/home/workspace/many-body/dcore/work/sumkdft_Gloc/output!

I saw the following message in the output file


**File "/public/home/anaconda3/envs/wanbe/lib/python3.8/site-packages/triqs/gf/gf.py", line 135, in delegate assert isinstance(indices, (type(None), list, GfIndices)), "Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str" AssertionError: Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str Unexpected error: Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str**

Could you help me figure out what does this message mean and how can I solve it? Thank you very much in advance!

Best regards, SH

shinaoka commented 2 years ago

Hi, what is the version of TRIQS? Can I have a look at the full error message?

gpwahl commented 2 years ago

Hi, I encountered the same error message when trying to run the main loop of dcore. Is there a recommendation how to resolve this issue? This is using triqs-3.1.x and running the example of srvo3 (using the alps/ct-hyb solver). Happy to send log files if useful.

The error message is the same as above (only last python error quoted here): File "/.../triqs/lib/python3.6/site-packages/triqs/gf/gf.py", line 135, in delegate assert isinstance(indices, (type(None), list, GfIndices)), "Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str" AssertionError: Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str

shinaoka commented 2 years ago

Thank you for the report. I will work on it. the alps/ct-hyb solver means this one? https://github.com/ALPSCore/CT-HYB

gpwahl commented 2 years ago

yes, that's correct. It's initialised in the .ini file as follows: [impurity_solver] name = ALPS/cthyb timelimit{int} = 90 exec_path{str} = /.../ct-hyb/bin/hybmat compiling the impurity solver and running its tests worked without any problems.

shinaoka commented 2 years ago

I somehow failed to reproduce the issue:

>>> from triqs.version import *
>>> show_version()

You are using the TRIQS library version 3.1.1

>>> show_git_hash()

You are using the git hash 1da1cb4ed217dea552a9b4147e4ecdc67d0fd94c

>>> import dcore
>>> dcore.__version__
'v3.3.1'

I'd appreciate if you describe how you installed triqs and DCore. There may be something to do with Anaconda, I guess.

Could you install a development version and set the following environment variable? This forces DCore to use the internally bundled version of TRIQS.

> pip install git+https://github.com/issp-center-dev/DCore.git@develop
> export DCORE_TRIQS_COMPAT=1
> dcore...
gpwahl commented 2 years ago

Thanks a lot, installing the development version with the environment variable worked fine. Is there any disadvantage in using it like that?

In case it is of any use, if I don't set DCORE_TRIQS_COMPAT to 1, I get the following error messages in the output file (replicated once for each MPI task), which is the same as above: Traceback (most recent call last): File "/home/.../.local/lib/python3.6/site-packages/dcore/sumkdft_workers/mpi_main.py", line 20, in runner.run() File "/home/.../.local/lib/python3.6/site-packages/dcore/sumkdft_workers/gloc_worker.py", line 38, in run Gloc = sk.extract_G_loc(with_dc=with_dc) File "/home/.../.local/lib/python3.6/site-packages/dcore/sumkdft_opt.py", line 269, in extract_G_loc make_copies=False) for ish in range(self.n_inequiv_shells)] File "/home/.../.local/lib/python3.6/site-packages/dcore/sumkdft_opt.py", line 269, in make_copies=False) for ish in range(self.n_inequiv_shells)] File "/home/.../.local/lib/python3.6/site-packages/dcore/sumkdft_opt.py", line 268, in G_loc_inequiv = [BlockGf(name_block_generator=[(block, GfImFreq(indices=inner, mesh=G_loc[0].mesh)) for block, inner in self.gf_struct_solver[ish].items()], File "/.../triqs/lib/python3.6/site-packages/triqs/gf/backwd_compat/gf_imfreq.py", line 73, in init delegate(self, kw) File "/.../triqs/lib/python3.6/site-packages/triqs/gf/backwd_compat/gf_imfreq.py", line 71, in delegate name = name) File "/.../triqs/lib/python3.6/site-packages/triqs/gf/gf.py", line 192, in init delegate(self, kw) File "/.../triqs/lib/python3.6/site-packages/triqs/gf/gf.py", line 135, in delegate assert isinstance(indices, (type(None), list, GfIndices)), "Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str" AssertionError: Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str

For the triqs installation, this is pulling it from the GitHub repository and compiling it myself. The compilation goes through fine and passes all tests for triqs (i.e. make test shows no failed tests).

shinaoka commented 2 years ago

Is there any disadvantage in using it like that?

As far as I tested myself, there is no disadvantage.

For the triqs installation, this is pulling it from the GitHub repository and compiling it myself. The compilation goes through fine and passes all tests for triqs (i.e. make test shows no failed tests).

This is weird... I have no idea.

an-karpov commented 1 year ago

Hi. I have same problem with examples here: https://issp-center-dev.github.io/DCore/master/impuritysolvers/triqs_cthyb/cthyb.html

but this doesn't help me

> pip install git+https://github.com/issp-center-dev/DCore.git@develop
> export DCORE_TRIQS_COMPAT=1
> dcore...

The text of error: Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/dcore/sumkdft_workers/mpi_main.py", line 20, in runner.run() File "/usr/local/lib/python3.8/dist-packages/dcore/sumkdft_workers/gloc_worker.py", line 38, in run Gloc = sk.extract_G_loc(with_dc=with_dc) File "/usr/local/lib/python3.8/dist-packages/dcore/sumkdft_opt.py", line 268, in extract_G_loc G_loc_inequiv = [BlockGf(name_block_generator=[(block, GfImFreq(indices=inner, mesh=G_loc[0].mesh)) for block, inner in self.gf_struct_solver[ish].items()], File "/usr/local/lib/python3.8/dist-packages/dcore/sumkdft_opt.py", line 268, in G_loc_inequiv = [BlockGf(name_block_generator=[(block, GfImFreq(indices=inner, mesh=G_loc[0].mesh)) for block, inner in self.gf_struct_solver[ish].items()], File "/usr/local/lib/python3.8/dist-packages/dcore/sumkdft_opt.py", line 268, in G_loc_inequiv = [BlockGf(name_block_generator=[(block, GfImFreq(indices=inner, mesh=G_loc[0].mesh)) for block, inner in self.gf_struct_solver[ish].items()], File "/usr/lib/python3.8/dist-packages/triqs/gf/backwd_compat/gf_imfreq.py", line 73, in init delegate(self, kw) File "/usr/lib/python3.8/dist-packages/triqs/gf/backwd_compat/gf_imfreq.py", line 66, in delegate super(GfImFreq, self).init( File "/usr/lib/python3.8/dist-packages/triqs/gf/gf.py", line 192, in init delegate(self, kw) File "/usr/lib/python3.8/dist-packages/triqs/gf/gf.py", line 135, in delegate assert isinstance(indices, (type(None), list, GfIndices)), "Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str" AssertionError: Type of indices incorrect : should be None, Gfindices, list of str, or list of list of str

My versions of triqs and dcore


>>> show_version()

You are using the TRIQS library version 3.1.1

>>> show_git_hash()

You are using the git hash a8924f99d1805de71e3d9b778f1b8aebba03f2d7

>>> import dcore
>>> dcore.__version__
'v3.4.0+1.gb1ea136'```
shinaoka commented 1 year ago

@an-karpov Hi, how did you build the triqs/cthyb solver? The solver must be built with -DLocal_hamiltonian_is_complex=ON -DHybridisation_is_complex=ON. https://issp-center-dev.github.io/DCore/master/impuritysolvers/triqs_cthyb/cthyb.html

an-karpov commented 1 year ago

@an-karpov Hi, how did you build the triqs/cthyb solver? The solver must be built with -DLocal_hamiltonian_is_complex=ON -DHybridisation_is_complex=ON. https://issp-center-dev.github.io/DCore/master/impuritysolvers/triqs_cthyb/cthyb.html

I used the following command: sudo apt-get install -y triqs_cthyb

does it have to be installed via the sources?

shinaoka commented 1 year ago

does it have to be installed via the sources?

I think so. I am using my own ALPS/CT-HYB regularly.

Hanxiang-XU commented 1 year ago

Dear developers,

I try to run a DCore calculation on ISSP supercomputer System B (ohtaka) by MPI job using following impurity solver 'ALPS/cthyb'

[impurity_solver] name = ALPS/cthyb exec_path{str} = /home/issp/materiapps/intel/alps_cthyb/alps_cthyb-51cf6b6715ced29cb1f9c1266556691490a035b7-gcc/bin/hybmat timelimit{int} = 300

For the example of 2D Hubbard model, it works well. But when I apply it on a real material, the calculation is stopped by an error 'Thrown exception Acceptance_rate_global_shift was measured on only some of the MPI processes.' in 2nd iteration, which can be checked in dcore.log or output file in work/ directory. And in dcore.err, there is an error

File "/home/issp/materiapps/intel/triqs/triqs-3.0.0-0/lib/python3.8/site-packages/h5/archive.py", line 211, in getitem1 raise KeyError("Key %s does not exist."%key) KeyError: 'Key Sign does not exist.'

Operators of ISSP said they found no problems with System B and suggested me to report the error in here. Is it possible to solve this problem? If you need more information or error messages, please let me know.

shinaoka commented 1 year ago

The error message indicates that the total simulation time is too short or one Monte Carlo step took loo long to accumulate Monte Carlo results. Do you have more log messages from the solver?

Hanxiang-XU commented 1 year ago

Yes, there is, but I could not find the difference of log messages comparing with successful calculation.

In work/imp_shell0_ite2/output :

[LOG_CAT_MCAST] VMC Failed to join multicast, is_root 1. Unexpected event was received: event=13, str=RDMA_CM_EVENT_MULTICAST_ERROR, status=-22 [LOG_CAT_MCAST] MCAST rank=7: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=2: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=4: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=5: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=3: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=6: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=1: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm [LOG_CAT_MCAST] MCAST rank=0: Error in initializing vmc communicator [LOG_CAT_P2P] Failed to create MCAST comm Reading ./Uijkl.txt... Number of non-zero elements in U tensor is 140 Opening ./basis.txt... dim of Hilbert space 1024

of blocks 36

of sectors 36

Sector 0 : dim = 1, min energy = -2.0418, max energy = -2.0418 ... Sector 35 : dim = 1, min energy = 0, max energy = 0 Max eigen energy = 0 Min eigen energy = -26.0209 Throwing away high energy states... Sector 0 : dim = 1, min energy = -2.0418, max energy = -2.0418 ... Sector 35 : dim = 1, min energy = 0, max energy = 0 Max eigen energy = 0 Min eigen energy = -26.0209 Reference energy -26.0209 Dim of ket: sector 0 inner 1 outer 1 ... Dim of ket: sector 35 inner 1 outer 1 The following swap updates will be performed. Update #0 generated from template #0 flavor 0 to flavor 1 flavor 1 to flavor 0 flavor 2 to flavor 3 flavor 3 to flavor 2 flavor 4 to flavor 5 flavor 5 to flavor 4 flavor 6 to flavor 7 flavor 7 to flavor 6 flavor 8 to flavor 9 flavor 9 to flavor 8 Checking if the simulation is finished: 2 sec passed. Checking if the simulation is finished: 11 sec passed. Checking if the simulation is finished: 23 sec passed. Checking if the simulation is finished: 43 sec passed. Thermalization process done after 40 steps. The number of segments for sliding window update is 1. Perturbation orders (averaged over processes) are the following: flavor 0 0 ... flavor 9 0 Checking if the simulation is finished: 68 sec passed. ... Checking if the simulation is finished: 301 sec passed. Thrown exception Acceptance_rate_global_shift was measured on only some of the MPI processes.

Some logs are omitted by '...'

In dcore.err, there are all logs:

Traceback (most recent call last): File "/home/k0395/k039504/.local/bin/dcore", line 33, in sys.exit(load_entry_point('dcore', 'console_scripts', 'dcore')()) File "/home/k0395/k039504/dcore.src/src/dcore/dcore.py", line 111, in run dcore(args.path_input_files, int(args.np)) File "/home/k0395/k039504/dcore.src/src/dcore/dcore.py", line 58, in dcore solver.do_steps(max_step=params["control"]["max_step"]) File "/home/k0395/k039504/dcore.src/src/dcore/dmft_core.py", line 842, in do_steps new_Sigma_iw, new_Gimp_iw = self.solve_impurity_models(Gloc_iw_sh, iteration_number) File "/home/k0395/k039504/dcore.src/src/dcore/dmft_core.py", line 666, in solve_impurity_models r = solve_impurity_model(solver_name, self._solver_params, self._mpirun_command, File "/home/k0395/k039504/dcore.src/src/dcore/dmft_core.py", line 215, in solve_impurity_model sol.solve(rot, mpirun_command, s_params) File "/home/k0395/k039504/dcore.src/src/dcore/impurity_solvers/alps_cthyb.py", line 126, in solve self._solve_impl(rot, mpirun_command, None, params_kw) File "/home/k0395/k039504/dcore.src/src/dcore/impurity_solvers/alps_cthyb.py", line 291, in _solve_impl sign = f['Sign'] File "/home/issp/materiapps/intel/triqs/triqs-3.0.0-0/lib/python3.8/site-packages/h5/archive.py", line 205, in getitem return self.getitem1(key,self._reconstruct_python_objects) File "/home/issp/materiapps/intel/triqs/triqs-3.0.0-0/lib/python3.8/site-packages/h5/archive.py", line 211, in getitem1 raise KeyError("Key %s does not exist."%key) KeyError: 'Key Sign does not exist.'

shinaoka commented 1 year ago

Thank you. I need to check the omitted message. Could you upload the full log message from the QMC solver as a zip file?

Hanxiang-XU commented 1 year ago

Thank you very much! I just notice that I can upload files in the comments. These are files that were used to report error to ISSP including input files and output log files. I hope their are useful.

error_DCore_hybmat.zip

shinaoka commented 1 year ago

The simulation time seems to be too short. Could you increase the total simulation time?

Perturbation orders just before and after measurement steps are 26.4284 and 45.2751.

These two numbers should match roughly.

Hanxiang-XU commented 1 year ago

Thank you very much! It worked for a low precision, cheap calculation. I am trying another a little heavy calculation.