Issues with the examples

mretegan commented 11 months ago

I run into several different issues when running the examples that come with the code. For example, in the case of the STO example, I get the following error:

################################################################################
Entering DFT stage
Fri Nov 24 15:47:24 2023
Need SCF run
k2_2_2q0.125000_0.250000_0.375000 is new
Will run k2_2_2q0.125000_0.250000_0.375000
k2_2_2q0.125000_0.250000_0.375000 is new
Will run k2_2_2q0.125000_0.250000_0.375000
Condensing two runs
Done parsing input for DFT stage
local
Testing parallel QE execution
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x  -inp scf.in > test.out 2>&1
Memory estimate 10.69
Min pool size 0.9
 N procs     Pool     Band        Cost
       1        1        1       10.3766
Testing parallel QE execution
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x  -inp scf.in > test.out 2>&1
Memory estimate 10.69
Min pool size 0.9
 N procs     Pool     Band        Cost
       1        1        1       10.3766
1  1
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x  -npool 1  -inp scf.in > scf.out 2>&1
QE62!!
0
SCF stage complete, total energy: -107.78127615502
Exporting density from SCF
########  QE convert
DENPOT VERSION 6.2
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pp.x  -npool 1  -inp pp.in > pp.out 2>&1
1.198685191E-04
 Plan using FFTW:          50          50          50      125000
Density export complete
Exporting potential from SCF
DENPOT VERSION 6.2
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pp.x  -npool 1  -inp pp2.in > pp2.out 2>&1
QE fix 1 system.pot
POT: 0
7.985091786E+00
Potential export complete
k2_2_2q0.125000_0.250000_0.375000:  8000000
Running NSCF run for: k2_2_2q0.125000_0.250000_0.375000
k2_2_2q0.125000_0.250000_0.375000
../Out/system.save   Out/system.save
/home/esrf/retegan/Code/ocean/tests/STO/DFT/k2_2_2q0.125000_0.250000_0.375000
local
Testing parallel QE execution
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x  -inp nscf.in > test.out 2>&1
Memory estimate 31.49
Min pool size 0.9
 N procs     Pool     Band        Cost
       1        1        1       8.05931
../Out/system.save   Out/system.save
/home/esrf/retegan/Code/ocean/tests/STO/DFT/k2_2_2q0.125000_0.250000_0.375000
Testing parallel QE execution
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x  -inp nscf.in > test.out 2>&1
Memory estimate 31.49
Min pool size 0.9
 N procs     Pool     Band        Cost
       1        1        1       8.05931
../Out/system.save   Out/system.save
/home/esrf/retegan/Code/ocean/tests/STO/DFT/k2_2_2q0.125000_0.250000_0.375000
1  1
srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x  -npool 1 -inp nscf.in > nscf.out 2>&1
QE62!!
Wrong number of spins and k-points!  Expected: 8 . Found: 10
DFT Stage Failed

Do you require a specific version of Quantum ESPRESSO (I am using version 6.2.0)?

mretegan commented 11 months ago

I forgot to mention that I am using version 3.1.0 of OCEAN.

jtv3 commented 11 months ago

For the DFT stage of an XAS calculation, OCEAN performs up to three QE runs. The first is the SCF run, which uses the input file DFT/scf.in and writes the output file DFT/scf.out; this seems to have run correctly. The second and third are NSCF runs on fixed k-point grids for the screening calculation and the BSE final states. In the case of the STO example these are merged because they use the same k-point grid ("Condensing two runs"). The NSCF runs are carried out in sub-directories of DFT named after their k-point grid and k-offset, in this case k2_2_2q0.125000_0.250000_0.375000. (The k-offset is used because the calculations converge faster if you move away from the Gamma point.)
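The naming and merging described above can be illustrated with a short Python sketch. It is only inferred from the directory name in the log: the function `nscf_dir_name` and the `requested_runs` dictionary are hypothetical and are not taken from the OCEAN source.

```python
def nscf_dir_name(kmesh, kshift):
    """Build a directory name from a k-point grid and a k-offset.

    Assumption: the pattern k<nx>_<ny>_<nz>q<qx>_<qy>_<qz> with six decimals
    is inferred from the log output, not from the OCEAN source code.
    """
    grid = "_".join(str(n) for n in kmesh)
    shift = "_".join(f"{q:.6f}" for q in kshift)
    return f"k{grid}q{shift}"


# Hypothetical stand-ins for the two NSCF runs of the STO example.
requested_runs = {
    "screening": ((2, 2, 2), (0.125, 0.25, 0.375)),
    "bse": ((2, 2, 2), (0.125, 0.25, 0.375)),
}

# Runs that map to the same directory name only need to be computed once,
# which corresponds to the "Condensing two runs" message in the log.
unique_dirs = sorted({nscf_dir_name(*run) for run in requested_runs.values()})
print(unique_dirs)
```

Because both runs share the same grid and offset, only one directory name remains, which matches the single NSCF directory seen in the output.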

From the output you provided, the last thing OCEAN tried to do was this NSCF run "srun /cvmfs/hpc.esrf.fr/software/packages/ubuntu20.04/x86_64/ocean/3.1.0/ocean/bin/pw.x -npool 1 -inp nscf.in > nscf.out 2>&1". If you go into the directory DFT/k2_2_2q0.125000_0.250000_0.375000 you should be able to look at the QE output file nscf.out and see what error was reported.
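If the output file is long, a small script can pull out the error block. The following is a minimal sketch, assuming QE reports errors as an "Error in routine NAME (N):" line followed by the message on the next non-empty line; the function name `find_qe_errors` and the `sample` text are illustrative.

```python
import re


def find_qe_errors(text):
    """Return (routine, message) pairs for each QE error block in `text`."""
    errors = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = re.match(r"\s*Error in routine\s+(\S+)\s*\((\d+)\):", line)
        if not match:
            continue
        message = ""
        # The message is taken from the next non-empty line, unless that
        # line is already the closing row of '%' characters.
        for following in lines[i + 1:]:
            stripped = following.strip()
            if stripped:
                if not stripped.startswith("%"):
                    message = stripped
                break
        errors.append((match.group(1), message))
    return errors


# Illustrative sample in the layout QE uses for error blocks.
sample = """
     Band Structure Calculation
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine c_bands (1):
     too many bands are not converged
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     stopping ...
"""
print(find_qe_errors(sample))
```

To apply it to a real run, read the file first, for example with `pathlib.Path("DFT/k2_2_2q0.125000_0.250000_0.375000/nscf.out").read_text()`, and pass the result to `find_qe_errors`.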

mretegan commented 11 months ago

Thank you for your help. Indeed, there is an issue with convergence. You can find an archive with all files here https://cloud.esrf.fr/s/Syffb6DRz5kRqYy. Do you see any obvious issues?

     PseudoPot. # 3 for O  read from file:
     /home/esrf/retegan/Code/ocean/tests/STO/Common/psp/08-o.lda.fhi.UPF
     MD5 check sum: c35f640a64c00feec7ce3b63cda5293b
     Pseudo is Norm-conserving, Zval =  6.0
     Generated by new atomic code, or converted to UPF format
     Using radial grid of  473 points,  3 beta functions with:
                l(1) =   0
                l(2) =   1
                l(3) =   3

     atomic species   valence    mass     pseudopotential
        Sr             2.00    87.62000     Sr( 1.00)
        Ti            12.00    47.86700     Ti( 1.00)
        O              6.00    15.99940     O ( 1.00)

     No symmetry found

   Cartesian axes

     site n.     atom                  positions (alat units)
         1           Sr  tau(   1) = (   0.0203729   0.0203729   0.0306604  )
         2           Ti  tau(   2) = (   0.5185862   0.5185862   0.5286659  )
         3           O   tau(   3) = (  -0.0113162   0.4907822   0.4907486  )
         4           O   tau(   4) = (   0.4907822  -0.0113162   0.4907486  )
         5           O   tau(   5) = (   0.4915749   0.4915749  -0.0168630  )

     number of k points=     8
                       cart. coord. in units 2pi/alat
        k(    1) = (   0.0625000   0.1250000   0.1857824), wk =   0.2500000
        k(    2) = (   0.0625000   0.1250000   0.6812020), wk =   0.2500000
        k(    3) = (   0.0625000   0.6250000   0.1857824), wk =   0.2500000
        k(    4) = (   0.0625000   0.6250000   0.6812020), wk =   0.2500000
        k(    5) = (   0.5625000   0.1250000   0.1857824), wk =   0.2500000
        k(    6) = (   0.5625000   0.1250000   0.6812020), wk =   0.2500000
        k(    7) = (   0.5625000   0.6250000   0.1857824), wk =   0.2500000
        k(    8) = (   0.5625000   0.6250000   0.6812020), wk =   0.2500000

     Dense  grid:    58917 G-vectors     FFT dimensions: (  50,  50,  50)

     Estimated max dynamical RAM per process >      31.49 MB

     Estimated total dynamical RAM >     314.93 MB

     The potential is recalculated from file :
     Out/system.save/charge-density.dat

     Starting wfc are   73 randomized atomic wfcs +   27 random wfc

     Band Structure Calculation
     Davidson diagonalization with overlap

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine c_bands (1):
     too many bands are not converged
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD

jtv3 commented 11 months ago

Quick fix

Probably the quickest fix is to change the diagonalization that QE is using. Since you are using QE < 6.5, that'll need to be 'cg'. You can add the following two lines to your input file and see if that fixes it:

dft.screen.diagonalization cg
dft.bse.diagonalization cg

Additional information

The STO example is just something quick and the pseudopotentials (especially the Ti) aren't great.
Newer versions of QE have additional parallelization options, and starting with QE 6.5 OCEAN will try to use the 'ppcg' option. With your version, OCEAN is using the Davidson method. The regular conjugate-gradient (cg) method is going to be slow, but it should be fine for this example.
You might also have success increasing the dimension of the Davidson workspace (pw.x flag diago_david_ndim). This can be done by setting dft.verbatim.qe.electrons { diago_david_ndim = 3 } in your OCEAN input file. Note that this might fail with a different error: the QE Davidson implementation doesn't properly discard trial vectors that are too similar to existing vectors, and it can crash.
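The version-dependent choice described above can be summarized in a few lines of Python. This is only a sketch of the behavior as described in this thread, not OCEAN's actual code: the function names are hypothetical, and 'david' is the pw.x keyword for the Davidson method.

```python
def parse_version(version):
    """Turn a version string such as '6.2.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def default_diagonalization(qe_version):
    """Diagonalization used by default, as described in this thread.

    Assumption: 'ppcg' from QE 6.5 onwards, Davidson ('david') before that.
    """
    return "ppcg" if parse_version(qe_version) >= (6, 5) else "david"


def cg_override_lines():
    """The two input-file lines suggested as the quick fix."""
    return [
        "dft.screen.diagonalization cg",
        "dft.bse.diagonalization cg",
    ]


print(default_diagonalization("6.2.0"))
print("\n".join(cg_override_lines()))
```

For QE 6.2.0 the sketch selects the Davidson method, which is the case where the thread recommends the 'cg' override.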

times-software / OCEAN

Issues with the examples #324
