Optimization got stuck in reading stdin

lixinyuu commented 1 year ago

Hi Tian,

I ran into a problem: the optimization gets stuck at "POSITIONS: reading from stdin".

The calculation does not quit; it just keeps waiting there, so I cannot identify which line is causing the problem. My guess is that this is the error referred to at https://gitlab.com/ase/ase/-/blob/master/ase/calculators/vasp/interactive.py#107

Do you have any suggestions on how to solve this?

Thanks

alchem0x2A commented 1 year ago

Our implementation of VaspInteractive is slightly different from the one in ase. We plan to open a PR against ase's main branch after fixing some existing issues, but basically the ase version is broken.

Could you please provide an example of how the optimization is run (and any error messages)? Please note that in interactive mode all of VASP's internal optimization flags (IBRION=X) are disabled, so an external optimizer such as BFGS is needed to drive the relaxation.
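To make that handshake concrete, here is a toy sketch of the stdin/stdout exchange visible in the vasp.out logs in this thread: VASP prints "POSITIONS: reading from stdin" and then waits for one line of fractional coordinates per ion, which the external driver has to supply. This is not VaspInteractive's actual code; FAKE_VASP is a hypothetical stand-in process (not real VASP), and drive, wait_for_prompt and send_positions are made-up names for illustration.

```python
import subprocess
import sys

PROMPT = "POSITIONS: reading from stdin"

# Hypothetical stand-in for vasp_std (NOT real VASP): for each ionic step it
# prints a FORCES block and the prompt, then reads one line of fractional
# coordinates per ion from stdin, mirroring the vasp.out logs in this thread.
FAKE_VASP = r'''
import sys
N_IONS = 2
for step in range(2):
    print("FORCES:", flush=True)
    print("POSITIONS: reading from stdin", flush=True)
    for _ in range(N_IONS):
        if not sys.stdin.readline():
            sys.exit(0)  # driver closed stdin: stop cleanly
    print("POSITIONS: read from stdin", flush=True)
'''


def wait_for_prompt(proc, log):
    """Read stdout line by line until the prompt; False if stdout hits EOF.

    This is a blocking read: if the prompt never reaches the pipe, it waits forever.
    """
    for line in proc.stdout:
        log.append(line.rstrip("\n"))
        if PROMPT in line:
            return True
    return False


def send_positions(proc, scaled_positions):
    """Write one 'x y z' line of fractional coordinates per ion."""
    for x, y, z in scaled_positions:
        proc.stdin.write(f"{x:.16f} {y:.16f} {z:.16f}\n")
    proc.stdin.flush()


def drive(steps):
    """Feed one set of positions per prompt; return everything the child printed."""
    log = []
    proc = subprocess.Popen(
        [sys.executable, "-c", FAKE_VASP],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    for positions in steps:
        if not wait_for_prompt(proc, log):
            break
        send_positions(proc, positions)
    proc.stdin.close()  # tells the child there are no more steps
    log.extend(line.rstrip("\n") for line in proc.stdout)
    proc.wait()
    return log


if __name__ == "__main__":
    for line in drive([[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]] * 2):
        print(line)
```

Note that wait_for_prompt uses a plain blocking read: if the prompt line never arrives on the pipe (for example, because the launch command sends stdout somewhere else), the driver would simply sit there, much like the hang described in this issue.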

lixinyuu commented 1 year ago

Thanks for explaining this.

The script I used is

import sys
sys.path.append('/hpcfs/users/useruser/github/vasp-interactive/')

import numpy as np
from ase.build import molecule
from ase.optimize import BFGS
from vasp_interactive import VaspInteractive

atoms = molecule("CH4")
atoms.set_cell([10., 10., 10.])
atoms.pbc = True
# Random displacements (0 to 0.2 Å per component) so there is something to relax
noise = np.random.uniform(0, 0.2, (len(atoms), 3))
atoms.positions += noise
atoms.center()

vasp_flags = {"ibrion": 2,
              "nsw": 300,
              "isif": 0,
              "isym": 0,
              "kpts": [1, 1, 1],
              "lreal": "Auto",
              "ediffg": -0.02,
              "symprec": 1e-10,
              "encut": 420.0,
              "pp": "PBE",
              "ivdw": 11,
              "ispin": 2,
              "lwave": False,
              "lcharg": False,
              "npar": 4}

# In interactive mode VASP's internal optimizer (IBRION) is disabled;
# the external ASE optimizer (BFGS below) drives the relaxation instead
vasp_flags["ibrion"] = -1
vasp_flags['ediffg'] = 0
if vasp_flags['ispin'] == 2:
    vasp_flags['lorbit'] = 10  # If spin calculation, lorbit>=10 is needed to get MAGMOM
vasp_flags['interactive'] = True
calc = VaspInteractive(directory="./", **vasp_flags)
with calc:
    atoms.calc = calc
    dyn = BFGS(atoms, logfile="relax.log", trajectory="all.traj")
    # Now ASE-BFGS controls the relaxation, not VASP
    dyn.run(fmax=0.05)

The script works well when I run it directly in the terminal (i.e., on the login node). relax.log shows the following optimization progress:

      Step     Time          Energy         fmax
BFGS:    0 10:34:12      -23.801554        3.9878
BFGS:    1 10:34:13      -23.984245        1.3026
BFGS:    2 10:34:13      -24.015267        0.4445
BFGS:    3 10:34:13      -24.023838        0.1844
BFGS:    4 10:34:14      -24.026236        0.1034
BFGS:    5 10:34:14      -24.028249        0.0621
BFGS:    6 10:34:15      -24.028294        0.0315

The vasp.out is

Writing VASP input files
Starting VASP for initial step...
 running on   40 total cores
 distrk:  each k-point on   40 cores,    1 groups
 distr:  one band on   10 cores,    4 groups
 using from now: INCAR     
 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Aug 17 2021 15:41:03) complex          

 POSCAR found type information on POSCAR  C  H 
 POSCAR found :  2 types and       5 ions

 ----------------------------------------------------------------------------- 
|                                                                             |
|  ADVICE TO THIS USER RUNNING 'VASP/VAMP'   (HEAR YOUR MASTER'S VOICE ...):  |
|                                                                             |
|      You have a (more or less) 'small supercell' and for smaller cells      |
|      it is recommended  to use the reciprocal-space projection scheme!      |
|      The real space optimization is not  efficient for small cells and it   |
|      is also less accurate ...                                              |
|      Therefore set LREAL=.FALSE. in the  INCAR file                         |
|                                                                             |
 ----------------------------------------------------------------------------- 

 LDA part: xc-table for Pade appr. of Perdew
 POSCAR found type information on POSCAR  C  H 
 POSCAR found :  2 types and       5 ions
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.612279168163E+02    0.61228E+02   -0.14029E+03    48   0.258E+02
DAV:   2    -0.127988118086E+01   -0.62508E+02   -0.54458E+02    64   0.683E+01
DAV:   3    -0.186089410889E+02   -0.17329E+02   -0.17302E+02    60   0.487E+01
DAV:   4    -0.198347924190E+02   -0.12259E+01   -0.12256E+01    76   0.165E+01
DAV:   5    -0.198404277707E+02   -0.56354E-02   -0.56352E-02    64   0.108E+00    0.156E+01
DAV:   6    -0.832777143685E+01    0.11513E+02   -0.39726E+01    60   0.290E+01    0.675E+00
DAV:   7    -0.193283682809E+02   -0.11001E+02   -0.10066E+01    56   0.126E+01    0.385E+00
DAV:   8    -0.214842359495E+02   -0.21559E+01   -0.51976E-01    56   0.335E+00    0.232E+00
DAV:   9    -0.236045376446E+02   -0.21203E+01   -0.11888E+00    68   0.369E+00    0.444E-01
DAV:  10    -0.237395343483E+02   -0.13500E+00   -0.28020E-02    56   0.768E-01    0.232E-01
DAV:  11    -0.237468794814E+02   -0.73451E-02   -0.86508E-03    56   0.377E-01    0.148E-01
DAV:  12    -0.237332727290E+02    0.13607E-01   -0.56204E-03    48   0.338E-01    0.119E-01
DAV:  13    -0.237408915194E+02   -0.76188E-02   -0.14831E-03    48   0.180E-01    0.748E-02
DAV:  14    -0.237464767721E+02   -0.55853E-02   -0.30257E-04    48   0.813E-02    0.422E-02
DAV:  15    -0.237510014335E+02   -0.45247E-02   -0.55698E-04    48   0.966E-02    0.239E-02
DAV:  16    -0.237544971072E+02   -0.34957E-02   -0.35191E-04    48   0.768E-02    0.151E-02
DAV:  17    -0.237567102552E+02   -0.22131E-02   -0.12335E-04    48   0.458E-02    0.929E-03
DAV:  18    -0.237575101902E+02   -0.79993E-03   -0.64943E-05    52   0.315E-02    0.609E-03
DAV:  19    -0.237579249305E+02   -0.41474E-03   -0.22683E-05    52   0.187E-02    0.330E-03
DAV:  20    -0.237580641698E+02   -0.13924E-03   -0.37141E-06    60   0.822E-03    0.195E-03
DAV:  21    -0.237581255467E+02   -0.61377E-04   -0.13619E-06    52   0.530E-03
FORCES:
     2.0990810     1.8030001     1.9531136
    -1.3673919    -0.2877337    -1.4627595
    -1.5113042    -1.2058586     1.1690134
     0.8679024    -1.0160242    -1.2242245
    -0.0882872     0.7066164    -0.4351430
   1 F= -.23760399E+02 E0= -.23764652E+02  d E =-.237604E+02  mag=     0.0686
POSITIONS: reading from stdin
Inputting positions...
 0.4992862049204945  0.5047861954673534  0.4971906966378600
 0.5649843876857851  0.5557527627442610  0.5691589962724890
 0.4309031894570722  0.4436027294144739  0.5558654259101101
 0.5627150882315304  0.4343700845737258  0.4439555680914908
 0.4386085704472856  0.5651879039977027  0.4281297151560823
     0.4992862     0.5047862     0.4971907
     0.5649844     0.5557528     0.5691590
     0.4309032     0.4436027     0.5558654
     0.5627151     0.4343701     0.4439556
     0.4386086     0.5651879     0.4281297
     0.4962875     0.5022105     0.4944005
     0.5669378     0.5561638     0.5712487
     0.4330622     0.4453254     0.5541954
     0.5614752     0.4358215     0.4457045
     0.4387347     0.5641785     0.4287513
POSITIONS: read from stdin
 bond charge predicted
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.239531425581E+02   -0.19508E+00   -0.80105E+00    56   0.179E+01    0.936E-01
DAV:   2    -0.239481017455E+02    0.50408E-02   -0.14518E-01    60   0.219E+00    0.613E-01
DAV:   3    -0.239335801914E+02    0.14522E-01   -0.33055E-02    48   0.939E-01    0.420E-01
DAV:   4    -0.239439601095E+02   -0.10380E-01   -0.23006E-02    52   0.495E-01    0.344E-01
DAV:   5    -0.239455541504E+02   -0.15940E-02   -0.29989E-03    48   0.316E-01    0.277E-01
DAV:   6    -0.239408143510E+02    0.47398E-02   -0.80361E-03    48   0.290E-01    0.194E-01
DAV:   7    -0.239452962386E+02   -0.44819E-02   -0.19641E-03    48   0.185E-01    0.123E-01
DAV:   8    -0.239470291552E+02   -0.17329E-02   -0.79464E-04    48   0.134E-01    0.487E-02
DAV:   9    -0.239476489441E+02   -0.61979E-03   -0.11983E-04    48   0.587E-02    0.184E-02
DAV:  10    -0.239476610802E+02   -0.12136E-04   -0.13353E-05    56   0.200E-02
FORCES:
     0.1989823    -0.7673251     0.0914969
    -0.3040043     0.4268450    -0.1235385
    -0.3893984     0.0533186     0.3870792
     0.1293791     0.1144904    -0.5145954
     0.3650414     0.1726711     0.1595578
   2 F= -.23950033E+02 E0= -.23958973E+02  d E =-.239500E+02  mag=     0.1243
POSITIONS: reading from stdin
Inputting positions...
 0.4998095591218258  0.5038201863518407  0.4975363686939739
 0.5643803677969259  0.5563661649346804  0.5688171111655411
 0.4301550350460747  0.4435549823616095  0.5565736938617227
 0.5630025486552660  0.4344348081959469  0.4430490446374254
 0.4391499301220753  0.5655235342471330  0.4283241852180902
     0.4998096     0.5038202     0.4975364
     0.5643804     0.5563662     0.5688171
     0.4301550     0.4435550     0.5565737
     0.5630025     0.4344348     0.4430490
     0.4391499     0.5655235     0.4283242
     0.4962875     0.5022105     0.4944005
     0.5669378     0.5561638     0.5712487
     0.4330622     0.4453254     0.5541954
     0.5614752     0.4358215     0.4457045
     0.4387347     0.5641785     0.4287513
POSITIONS: read from stdin
 bond charge predicted
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.239710218789E+02   -0.23373E-01   -0.33636E-01    64   0.405E+00    0.218E-01
DAV:   2    -0.239686444454E+02    0.23774E-02   -0.45930E-03    48   0.429E-01    0.170E-01
DAV:   3    -0.239699721015E+02   -0.13277E-02   -0.10522E-03    48   0.149E-01    0.899E-02
DAV:   4    -0.239708321390E+02   -0.86004E-03   -0.18333E-03    48   0.125E-01    0.681E-02
DAV:   5    -0.239712312764E+02   -0.39914E-03   -0.16840E-04    52   0.695E-02    0.274E-02
DAV:   6    -0.239713037931E+02   -0.72517E-04   -0.34462E-05    64   0.338E-02
FORCES:
    -0.0951312    -0.5039784    -0.0560575
    -0.2150400     0.4025382    -0.0576672
    -0.2477848     0.0628113     0.2539247
     0.1229326     0.0214038    -0.4357058
     0.4350234     0.0172252     0.2955058
   3 F= -.23973698E+02 E0= -.23982272E+02  d E =-.239737E+02  mag=     0.1067

However, once I submit it to SLURM, relax.log gets stuck at

      Step     Time          Energy         fmax
BFGS:    0 10:36:06      -23.486114        5.9644

The vasp.out from the SLURM job is

Writing VASP input files
Starting VASP for initial step...
 running on    2 total cores
 distrk:  each k-point on    2 cores,    1 groups
 distr:  one band on    1 cores,    2 groups
 using from now: INCAR     
 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Aug 17 2021 15:41:03) complex          

 POSCAR found type information on POSCAR  C  H 
 POSCAR found :  2 types and       5 ions

 ----------------------------------------------------------------------------- 
|                                                                             |
|  ADVICE TO THIS USER RUNNING 'VASP/VAMP'   (HEAR YOUR MASTER'S VOICE ...):  |
|                                                                             |
|      You have a (more or less) 'small supercell' and for smaller cells      |
|      it is recommended  to use the reciprocal-space projection scheme!      |
|      The real space optimization is not  efficient for small cells and it   |
|      is also less accurate ...                                              |
|      Therefore set LREAL=.FALSE. in the  INCAR file                         |
|                                                                             |
 ----------------------------------------------------------------------------- 

 LDA part: xc-table for Pade appr. of Perdew
 POSCAR found type information on POSCAR  C  H 
 POSCAR found :  2 types and       5 ions
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.605365183932E+02    0.60537E+02   -0.13984E+03    48   0.258E+02
DAV:   2    -0.132778358635E+01   -0.61864E+02   -0.54609E+02    68   0.685E+01
DAV:   3    -0.182702214045E+02   -0.16942E+02   -0.16903E+02    64   0.483E+01
DAV:   4    -0.194727136949E+02   -0.12025E+01   -0.12022E+01    76   0.163E+01
DAV:   5    -0.194790929693E+02   -0.63793E-02   -0.63792E-02    68   0.109E+00    0.155E+01
DAV:   6    -0.795603665234E+01    0.11523E+02   -0.40430E+01    60   0.290E+01    0.674E+00
DAV:   7    -0.188994374170E+02   -0.10943E+02   -0.95539E+00    56   0.125E+01    0.386E+00
DAV:   8    -0.211480160986E+02   -0.22486E+01   -0.60936E-01    56   0.352E+00    0.228E+00
DAV:   9    -0.233160096980E+02   -0.21680E+01   -0.12951E+00    72   0.383E+00    0.411E-01
DAV:  10    -0.234512863300E+02   -0.13528E+00   -0.27570E-02    56   0.767E-01    0.242E-01
DAV:  11    -0.234690298082E+02   -0.17743E-01   -0.12642E-02    48   0.447E-01    0.144E-01
DAV:  12    -0.234564710054E+02    0.12559E-01   -0.68264E-03    48   0.362E-01    0.148E-01
DAV:  13    -0.234645124697E+02   -0.80415E-02   -0.17669E-03    48   0.199E-01    0.982E-02
DAV:  14    -0.234674713394E+02   -0.29589E-02   -0.41399E-04    48   0.911E-02    0.441E-02
DAV:  15    -0.234728220099E+02   -0.53507E-02   -0.54924E-04    48   0.932E-02    0.434E-02
DAV:  16    -0.234750961901E+02   -0.22742E-02   -0.35513E-04    56   0.750E-02    0.218E-02
DAV:  17    -0.234776242850E+02   -0.25281E-02   -0.16952E-04    48   0.548E-02    0.110E-02
DAV:  18    -0.234787790902E+02   -0.11548E-02   -0.11496E-04    52   0.404E-02    0.113E-02
DAV:  19    -0.234792019865E+02   -0.42290E-03   -0.24144E-05    52   0.194E-02    0.551E-03
DAV:  20    -0.234794472315E+02   -0.24524E-03   -0.12105E-05    48   0.132E-02    0.197E-03
DAV:  21    -0.234795218658E+02   -0.74634E-04   -0.21851E-06    72   0.619E-03
FORCES:
     3.4783901     4.6858683     1.2318840
    -1.5500863    -1.8781945    -1.5542040
    -2.0861609    -1.8471173     1.8840486
     0.5996300    -0.8922004    -0.9071410
    -0.4417729    -0.0683562    -0.6545875
   1 F= -.23481889E+02 E0= -.23486114E+02  d E =-.234819E+02  mag=     0.0670
POSITIONS: reading from stdin

This means the code never reaches https://github.com/ulissigroup/vasp-interactive/blob/main/vasp_interactive/vasp_interactive.py#L353, which prints "Inputting positions...". The annoying part is that the process does not exit, so I cannot get an error message. Any suggestions would be appreciated. Thanks!
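Since a blocking read never returns when the prompt is missing, one way to get an error message instead of a silent hang is to read the pipe on a background thread and give up after a timeout. The sketch below assumes the same prompt text as in the logs above; start_reader and wait_for_prompt are hypothetical helpers, not part of VaspInteractive, and in a real run the timeout would have to be generous enough to cover a full SCF cycle.

```python
import queue
import subprocess
import sys
import threading

PROMPT = "POSITIONS: reading from stdin"


def start_reader(stream):
    """Copy lines from a pipe into a queue on a daemon thread; None marks EOF."""
    lines = queue.Queue()

    def pump():
        for line in stream:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    return lines


def wait_for_prompt(lines, timeout):
    """Return the output seen up to and including the prompt.

    Raises TimeoutError if no new line arrives within `timeout` seconds,
    and RuntimeError if the process closes stdout before the prompt.
    """
    seen = []
    while True:
        try:
            line = lines.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(
                f"no new output for {timeout}s while waiting for {PROMPT!r}; "
                f"last lines: {seen[-3:]}"
            ) from None
        if line is None:
            raise RuntimeError(f"stdout closed before {PROMPT!r}; last lines: {seen[-3:]}")
        seen.append(line.rstrip("\n"))
        if PROMPT in line:
            return seen


if __name__ == "__main__":
    # Toy child that prints forces and then hangs without ever asking for positions
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; print('FORCES:', flush=True); time.sleep(30)"],
        stdout=subprocess.PIPE, text=True,
    )
    try:
        wait_for_prompt(start_reader(child.stdout), timeout=1.0)
    except TimeoutError as err:
        print("diagnosed:", err)
    finally:
        child.kill()
        child.wait()
```

The timeout applies to the gap between lines rather than to the whole step, so a long but still-progressing SCF loop (which keeps printing DAV lines) does not trigger it.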

alchem0x2A commented 1 year ago

You're right: in the SLURM environment, the "POSITIONS: reading from stdin" prompt from the VASP process is indeed not detected, for some reason. I also noticed that in your SLURM job VASP only runs on 2 cores, which is a bit odd. Could you check the SLURM job script you used and whether it applies any stdout redirection? (See the note from the README below, though the cause could be more complex than that.)

Note:

For VaspInteractive to work properly, the VASP executable (i.e. environment variable $ASE_VASP_COMMAND or $VASP_COMMAND) must not filter or redirect the stdout from vasp_std. If you want to set the file name for capturing the stdout, add txt= to the initial parameters of VaspInteractive.
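A quick way to check a launch command against this rule is to tokenize it and look for shell redirection or pipe operators. This is a heuristic sketch using only the Python standard library; stdout_is_redirected and check_vasp_env are hypothetical helper names, and the check only inspects the two environment variables named in the note.

```python
import os
import shlex

# Shell operators that send vasp_std's stdout somewhere other than the pipe
# the calling Python process reads from. "2>&1" (stderr into stdout) is not
# in this set, but a stderr-only "2> err.log" is still flagged because the
# tokenizer does not keep the file-descriptor number attached to ">".
REDIRECT_OPS = {">", ">>", "|", "|&", "&>", "&>>"}


def stdout_is_redirected(command: str) -> bool:
    """Heuristic check: True if the command redirects or pipes stdout."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return any(token in REDIRECT_OPS for token in lexer)


def check_vasp_env() -> None:
    """Warn about the command variables named in the README note."""
    for name in ("ASE_VASP_COMMAND", "VASP_COMMAND"):
        command = os.environ.get(name)
        if command and stdout_is_redirected(command):
            print(f"warning: ${name} redirects stdout: {command!r}")


if __name__ == "__main__":
    check_vasp_env()
```

For example, "mpirun -np 40 vasp_std > vasp.out" is flagged while "mpirun -np 40 vasp_std" is not; in the first case the note's advice is to drop the redirection and pass txt= to VaspInteractive instead.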

It seems to me you should be able to solve the issue by using exactly the same settings you have on the login node; it is probably not a bug in VaspInteractive. I suggest trying something like an interactive SLURM session to figure it out.
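One way to compare the settings between the login node and a SLURM job is to dump the relevant environment in both places and diff the two snapshots. This is a hypothetical helper, and the variable list is an assumption: the two VASP command variables from the README note plus ASE's VASP_PP_PATH and a few standard SLURM/OpenMP variables. Extend it with whatever your site's modules set.

```python
import json
import os
import shutil

# Assumed list of variables worth comparing; extend as needed.
KEYS = [
    "ASE_VASP_COMMAND", "VASP_COMMAND", "VASP_PP_PATH",
    "SLURM_NTASKS", "SLURM_CPUS_PER_TASK", "OMP_NUM_THREADS",
]


def snapshot(env=None):
    """Collect the selected variables plus the resolved vasp_std path."""
    env = os.environ if env is None else env
    snap = {key: env.get(key) for key in KEYS}
    snap["which vasp_std"] = shutil.which("vasp_std")
    return snap


def diff(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Run once on the login node and once inside the job, saving each output as JSON
    print(json.dumps(snapshot(), indent=2))
```

Loading the two saved JSON files and calling diff on them lists only what changed, for instance a different vasp_std on the PATH or a smaller SLURM_NTASKS, which would be consistent with the 2-core run above.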

lixinyuu commented 1 year ago

Thanks, Tian. Yes, you are right; this is not a bug in VaspInteractive. I tested plain VASP in an interactive SLURM session and found that it does not run correctly (this confuses me as well: it works in the terminal but not in an interactive SLURM session). Anyway, I will try moving to VASP6, which is possibly the easiest way to solve this. I'll close this issue. Thanks.

lixinyuu commented 1 year ago

Hi Tian, a follow-up on this: I have moved to VASP6, and VaspInteractive works like a charm now. Thanks for the great work.

ulissigroup / vasp-interactive

Optimization got stuck in reading stdin #27