🚀 RELEASE: v21.05.1 - Githubissues

giovannipizzi commented 3 years ago

Thanks Chris! I'm going to report here things that I've tested, and I suggest that also other tests are reported here so we collect all this information. I might keep updating this comment.

Works fine:

Screen resizing
Run the common workflow for Silicon with abinit, both with fast and precise protocol
- IMPORTANT: one needs to run first: aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0
Run the common workflow for Silicon with cp2k, with fast protocol (interestingly, it works fine even if the binary is serial but it's run with mpirun -np 2?)
Run the common workflow for Silicon with bigdft, with fast protocol
- it is quite slow and the total volume is much larger, but I think this is expected as BigDFT runs in a supercell IIRC
Run the common workflow for Silicon with NWChem, with fast and moderate protocol
- Minimum is a bit off for "fast", but run is very fast; indeed position if almost perfectly fixed already by "moderate"
Run the common workflow (relax only) for Silicon with siesta, with fast protocol
- IMPORTANT: to be run first: verdi data psml uploadfamily /usr/local/share/siesta/psml-files-qm/nc-sr-04_pbe_standard/ nc-sr-04_pbe_standard_psml "pseudos from PseudoDojo"
- Command time aiida-common-workflows launch relax -S Si -p fast siesta works and takes 14.5s on my machine
- Command time aiida-common-workflows launch relax -S Si -p precise siesta works and takes 4m55s on my machine (runs in serial, without MPI in the submission script)
Run the common workflow (relax only) for Silicon with fleur, with fast protocol, forcing to run in serial
- Command time aiida-common-workflows launch eos -S Si -p fast fleur -n 1 1 works (runs in ~5m11s) even if the resulting points are all over the place, not on a parabola (but that's OK probably, see next point)
- Command time aiida-common-workflows launch eos -S Si -p moderate fleur -n 1 1 works (runs in ~45m10s) and the points are on a nice parabola.

Works so-so:

Command time aiida-common-workflows launch relax -S Si -p fast -n 2 -- siesta works but very slowly (3m36s, so in parallel it's 15x slower than in serial...). (Note that the code detects it's running in serial, it prints "Running on 2 nodes in parallel")

Does not work:

common workflows for Silicon with cp2k, with precise protocol. reason: job is launched with a 1h wall time, but it takes longer. @yakutovicha do you confirm that the CP2K workchain are not able to restart when interrupted? I get this error: error.txt. I think for this version of the QMobile, we don't need to fix this (just decide what to mention in the release notes and/or in the paper)
Command time aiida-common-workflows launch relax -S Si -p precise -n 2 -- siesta submits correctly with mpirun -np 2, but siesta seems to be extremely slow. It got stuck for minutes at the step "Setting up quadratic distribution...", now it seems (almost) stuck at the next line "stepf: Fermi-Dirac step function". I killed it after ~9 minutes.
Running aiida-common-workflows launch relax -S Si -p fast fleur fails, error: error-fleur.txt [seems to be related to parallelism and how to distribute data]. (Running in serial works, see above)

chrisjsewell commented 3 years ago

thanks @giovannipizzi. one minor think to note (that I realised when testing), is that the SSSP pseudopotentials are still installed in the "old" way, i.e. not with aiida-pseudos. It wouldn't be too difficult to add this in the next build, and potentially other resources required by common workflows (e.g. pseudos required by other plugins)

giovannipizzi commented 3 years ago

@chrisjsewell indeed, you are right - for Abinit, I had to install the pseudos manually (and the exception message wasn't fully obvious, as a specific version was required). I'll update my comment above with the command I had to run, great if this can be installed. I think similar commands would be useful also for QE and siesta (at least) - @sphuber and @bosonie can test and confirm the exact command(s) needed to be run in the QM

chrisjsewell commented 3 years ago

@sphuber and @bosonie can test and confirm the exact command(s) needed to be run in the QM

yes indeed, is there somewhere a "list" of resources that would ideally be pre-loaded?

giovannipizzi commented 3 years ago

For the common workflows with abinit, before installing the pseudos, I get this error:

ValueError: Error occurred validating port 'inputs': `AbinitCommonRelaxInputGenerator.get_builder()` fails for the provided `generator_inputs`: required pseudo family `PseudoDojo/1.0/PBE/SR/standard/jthxml` is not installed. Please use `aiida-pseudo install pseudo-dojo` to install it.

The message however does not give the full command to run. That command will install Success: installedPseudoDojo/0.4/PBE/SR/standard/psp8containing 72 pseudo potentials, but a different family is needed.

The actual command to run that makes the common workflow to run correctly is: aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0

@sponce24 can you confirm that this command (aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0) is the only one needed to be run, independent of which material and protocol is chosen? Or are there more pseudos to install?

sphuber commented 3 years ago

For aiida-quantumespresso one should run:

aiida-pseudo install sssp -p efficiency
aiida-pseudo install sssp -p precision

That's all

sphuber commented 3 years ago

I have tested the OVA and ran the Si relax with QE and that works. I cannot run anything else because it will freeze my entire machine up. This may be due to my machine as well, but I tried three times, and each time it completely freezes my machine after a few minutes and have to hard reset. Last time I was doing nothing else but running the VM. Not sure what is going on, but don't have any info as everything just freezes.

OS: Ubuntu 20.04 VirtualbBox: 6.1.18 System: 12 CPU 16 GB RAM

vboxsettings

giovannipizzi commented 3 years ago

@sphuber thanks for the report. That's actually annoying. Could you please report the OS version, VirtualBox version, and the hardware of your machine (in particular RAM and # CPUs). And also, in the Virtual Machine settings (in VirtualBox, under settings->System): Base memory, number of processors (and execution cap, and which flags are enabled below), and under the "Display" settings, the video memory, graphical controller and 3D acceleration, ...

(Also: does your machine freeze only when you run DFT calculations inside Quantum Mobile, or also if you just leave it up and running for ~5-10 minutes, without running any CPU-intensive executable inside?)

sponce24 commented 3 years ago

For the common workflows with abinit, before installing the pseudos, I get this error:
ValueError: Error occurred validating port 'inputs': `AbinitCommonRelaxInputGenerator.get_builder()` fails for the provided `generator_inputs`: required pseudo family `PseudoDojo/1.0/PBE/SR/standard/jthxml` is not installed. Please use `aiida-pseudo install pseudo-dojo` to install it.
The message however does not give the full command to run. That command will install Success: installedPseudoDojo/0.4/PBE/SR/standard/psp8containing 72 pseudo potentials, but a different family is needed.

The actual command to run that makes the common workflow to run correctly is: aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0

@sponce24 can you confirm that this command (aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0) is the only one needed to be run, independent of which material and protocol is chosen? Or are there more pseudos to install?

Hello Giovanni,

Yes this is correct. We changed from NC to PAW to be closer to QE which is using the SSSP which contains mostly PAW and USPP. I guess the error message could indeed be updated.

Best, Samuel

giovannipizzi commented 3 years ago

OK thanks! @sponce24 indeed it would be great to update the message in the code, but this is not urgent, I don't think we'll update this version in Quantum Mobile. Instead, @chrisjsewell if you can install those pseudos (now also reported in my first comment above), that would be optimal.

giovannipizzi commented 3 years ago

yes indeed, is there somewhere a "list" of resources that would ideally be pre-loaded?

@chrisjsewell in principle it's all written in the supplementary information of the paper we're writing. I suggest we actually check in this specific QM version, as we're doing above, so we are 100% sure of the exact command; @bosonie could you then please help double checking that the information we report here is consistent with what's written in the supplementary?

chrisjsewell commented 3 years ago

One ease-of-use thing for the common-workflow CLI, it would be nice to be able to pass it a file, e.g. something like

runs.toml

[qe-relax]
engine = "quantum_espresso"
workflow = "relax"
structure = "Si"
protocol = "fast"
code = "qe-pw-6.5@localhost"

[other]
...

$ aiida-common-workflows launch file runs.toml

chrisjsewell commented 3 years ago

I cannot run anything else because it will freeze my entire machine up.

@sphuber can you give me an example of an exact CLI input that freezes, so I can also try

sphuber commented 3 years ago

there is not a CLI command that freezes. I simply submit a workchain to the daemon and let it run. I just maybe execute squeue and verdi process list and at some point my entire machine just hangs. The CLI command was aiida-common-workflows launch relax -S Si -p precise -d quantum_espresso but that really shouldn't matter

chrisjsewell commented 3 years ago

So far, no issue of freezing for me:

giovannipizzi commented 3 years ago

It doesn't freeze for me either - but if it happens to Sebastiaan, it will most probably happen also to someone else :-( it would be good to find at least another person for whom it freezes, as the machine of Sebastiaan seems quite powerful so it shouldn't be an issue of Quantum Mobile). Maybe it's Ubuntu 20.04? I'm running on a Mac. @bosonie also mentioned (I think) that he had issues: Emanuele, could you report if this version freezes your machine as well, or if you are able to run fine?

A few more things for @sphuber to test, if he can/has time: when you start the VM, could you (in your host machine, and maybe also in the guest VM) run some command on the command line to monitor CPU and memory usage (even just top with proper sorting) and check if you see some process just getting the whole memory? (In my experience, these hangs are due to all RAM being taken by some process).

sphuber commented 3 years ago

It needs to be said, that I had my computer freeze some time ago that was unrelated to the QM. It may just be problems with my filesystem and the QM just exacerbates it. I wouldn't necessarily search too long for this on my account.

chrisjsewell commented 3 years ago

yeh indeed it would be good to have someone else also check using Linux. It was built on my Mac, but obviously the whole point of these is that they are cross-platform compatible, so if there is suddenly some issue with that then geez I'm out lol

bosonie commented 3 years ago

Hi there, I'm sorry to bring bad news. I'm also having problems to run with this new QM version. First off...I have Ubuntu 18.04. Second the virtual box is 6.1.16 r140961. What I'm experiencing is that I can not run in parallel with siesta anymore. The machine does not freeze but the calculation hangs for ages in the point where it is supposed to distribute the work to processes. To give you an idea, I tried the relaxation of silicon with precise protocol with 2 processor (usually takes 2 minutes) and now it is about 25 minutes I'm waiting to even start the scf steps. I'm sure I was running with 2 processors in the previous version of QM, at least in the 20.04 version.

Also, running in serial mode, allows the calculation to go on and finish, but I see unusual CPU consumption (constantly around 200% from the top command).

I remember that quantum espresso plugin runs by default with mpi. @sphuber, can you try to run in serial and see what happens?

In the meanwhile I'm trying to run more tests.

chrisjsewell commented 3 years ago

I'm sure I was running with 2 processors in the previous version of QM, at least in the 20.04 version.

To clarify, by previous version you mean v20.11.2a? (theres no 20.04 version)

Neither the siesta code nor aiida-siesta have changed at all since then, so it can't be anything directly to do with them. I literally have no idea of what could have changed to break anything since then, there really hasn't been any changes on the side of simulation code execution 🤷 (mainly just changes to the aiida/aiidalab installation)

bosonie commented 3 years ago

Yes, sorry the v20.11.2a. However I deleted it after exporting my calculations since it was occupying lots of space. As you say from the Siesta side nothing changed. And my laptop is the same. Any modifications to slurm or mpi?

I am now trying with other codes and seems fine though... NWChem and QE are ok. QE with the precise protocol took 20 minutes for Si relaxation. Is it expected @sphuber? 2 processors. And no crash (this is good news).

I also tried a very inexpensive calculation with siesta and verified that the calculations run, they are just incredibly slow. I guess what is left to do is to get the old quantum mobile and test again there.

In any case, the issue is a bit less worrying than expected now.

chrisjsewell commented 3 years ago

No there has been no direct modifications to mpi or slurm (unless something is indirectly affecting it).

I will cc @albgar here, since he did a lot of work on the siesta compilation role

chrisjsewell commented 3 years ago

@sphuber and/or @bosonie perhaps it would be interesting to try the same runs with the new Docker image, and see if it still runs slow there (i.e. is the issue specific to virtualbox)

bosonie commented 3 years ago

Also, could someone run aiida-common-workflows launch relax -S Si -p precise -n 2 -d -- siesta on Mac?

yakutovicha commented 3 years ago

Run the common workflow for Silicon with cp2k, with fast protocol (interestingly, it works fine even if the binary is serial but it's run with mpirun -np 2?)

@giovannipizzi, it still prints everything twice, just the parser somehow able to retrieve the information.

@chrisjsewell can you create a computer named "localhost-serial" and put cp2k code on top of it?

giovannipizzi commented 3 years ago

@yakutovicha it would be great if you could provide to Chris the yaml for verdi computer setup and verdi code setup, after you test in the VM prepared by Chris that indeed this fixes the problem, so we are 100% sure there is no misunderstanding (e.g. I guess you have to set the default number of CPUs to 1? And will aiida-common-workflows figure out automatically which of the two CP2K codes to run?)

yakutovicha commented 3 years ago

@yakutovicha it would be great if you could provide to Chris the yaml for verdi computer setup and verdi code setup, after you test in the VM prepared by Chris that indeed this fixes the problem,

sure, I will prepare a yaml file.

And will aiida-common-workflows figure out automatically which of the two CP2K codes to run?

Instead of having two codes, is it possible to have only one cp2k that is cp2k-7.1@localhost-serial? I would prefer to have only this.

chrisjsewell commented 3 years ago

Also, could someone run aiida-common-workflows launch relax -S Si -p precise -n 2 -d -- siesta on Mac?

@bosonie can the pseudo be loaded via aiida-pseudos?

ValueError: protocol `precise` requires `pseudo_family` with name nc-sr-04_pbe_standard_psml but no family with this name is loaded in the database

bosonie commented 3 years ago

No. For the moment is verdi data psml uploadfamily /usr/local/share/siesta/psml-files-qm/nc-s-04_pbe_standard/ nc-sr-04_pbe_standard "pseudos from PseudoDojo"

giovannipizzi commented 3 years ago

I'm reporting my findings in (my comment above](https://github.com/marvel-nccr/quantum-mobile/pull/185#issuecomment-829026550). In the meantime, I can tell that the when run in serial, timings of siesta seem OK, but in parallel siesta indeed seems to be hanging (or better not hanging, but anyway it's super slow, it's been running for minutes and it's still at the top of the file).

The good news is that in serial it works (and that's the default if you don't specify -n).

@bosonie it would be good to know/confirm in which QMobile it was working fine

chrisjsewell commented 3 years ago

Eurgh, it is taking a long time for me (if it is indeed meant to tak 2 minutes). Should it be running two processes here (see below) or is this not calling mpi correctly?

_aiida_submit:

#!/bin/bash
#SBATCH --no-requeue
#SBATCH --job-name="aiida-515"
#SBATCH --get-user-env
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00

ulimit -s unlimited

'mpirun' '-np' '2' '/usr/local/bin/siesta' < 'aiida.fdf' > 'aiida.out'

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ top

top - 14:53:43 up  4:06,  1 user,  load average: 4.11, 3.59, 1.97
Tasks: 217 total,   3 running, 165 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.7 us,  4.0 sy,  0.0 ni, 88.2 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1524696 total,   163100 free,  1156712 used,   204884 buff/cache
KiB Swap:  2052088 total,  1648888 free,   403200 used.   181440 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                         
 8773 max       20   0  909472 198704   7312 R  88.2 13.0   8:46.65 siesta                                          
 8774 max       20   0  909152 198680   7204 R  88.2 13.0   8:48.75 siesta                                          
 2249 max       20   0  354396   3928   2916 S   5.9  0.3   0:01.91 ibus-daemon                                     
 9486 max       20   0   41936   3748   3084 R   5.9  0.2   0:00.01 top                                             
    1 root      20   0  225692   4404   3060 S   0.0  0.3   0:01.68 systemd

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ scontrol show node
NodeName=qmobile Arch=x86_64 CoresPerSocket=2
   CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=3.48
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=qmobile NodeHostName=qmobile Version=17.11
   OS=Linux 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 
   RealMemory=1 AllocMem=0 FreeMem=67 Sockets=1 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=jobs 
   BootTime=2021-04-29T10:47:16 SlurmdStartTime=2021-04-29T10:47:34
   CfgTRES=cpu=2,mem=1M,billing=2
   AllocTRES=cpu=2
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ scontrol show partition
PartitionName=jobs
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=2
   Nodes=qmobile
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2 TotalNodes=1 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ siesta
Siesta Version  : MaX-1.2.0
Architecture    : Master-template
Compiler version: GNU Fortran (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Compiler flags  : mpif90 -O2
PP flags        : -DF2003  -DCDF  -DNCDF -DNCDF_4 -DSIESTA__FLOOK  -DMPI -DMPI_TIMING
Libraries       :  libncdf.a libfdict.a libfdict.a  -lnetcdff -lnetcdf  -L/usr/local/lib -lfloookall -ldl -lscalapack-openmpi  -llapack -lblas
Directory       : /home/max/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43
PARALLEL version
NetCDF support
NetCDF-4 support
Lua support

* Running in serial mode with MPI
>> Start of run:  29-APR-2021  14:58:01

                           ***********************       
                           *  WELCOME TO SIESTA  *       
                           ***********************

giovannipizzi commented 3 years ago

I notice similar things as @chrisjsewell. Note also my updated timings above. It's normal that there are 2 siesta in top if it is executed with mpirun -np 2, HOWEVER it's strange that you get "* Running in serial mode with MPI", I get instead a message saying it's properly running with 2 CPUs in parallel (but still very slow).

bosonie commented 3 years ago

I notice similar things as @chrisjsewell. Note also my updated timings above. It's normal that there are 2 siesta in top if it is executed with mpirun -np 2, HOWEVER it's strange that you get "* Running in serial mode with MPI", I get instead a message saying it's properly running with 2 CPUs in parallel (but still very slow).

Yes this is what I see as well. @chrisjsewell you should read * Running on 2 nodes in parallel

yakutovicha commented 3 years ago

@chrisjsewell here are the configuration files needed to setup cp2k@localhost-serial:

localhost-serial.yaml localhost-serial-config.yaml cp2k-7.1.yaml

Tested on the latest release - works fine.

chrisjsewell commented 3 years ago

@giovannipizzi so would you conclude there is something a bit "weird" going on with parallelisation in general? Have you/can you try a basic "non-aiida" run of parallelised quantum-espresso, as obviously this is pretty important as to whether this can be released for Nicola or not

giovannipizzi commented 3 years ago

Ok, I did my final update to the first comment above. I also added the tests for fleur, that also seems to be working only in serially.

After discussion with Chris, and in order not to delay the release more (both for the common-workflows, and as it's needed for a MaX meeting tomorrow):

we release the QMobile as is (will be done tonight or at the latest tomorrow morning, unless strong motivated complains arrive in the meantime)
For the common workflows, we add to the SI a note for all those codes that should be run serially (I think at least siesta, fleur, cp2k, and probably abinit?), when run in the Quantum Mobile, while of course they can run with any parallelisation on an external computer (@bosonie can you take care of this in the SI of the paper?). In particular, we don't install (in this release) a second localhost-serial computer (will happen in the next release)
We keep for now the instructions on which pseudos to install in the SI of the paper - I will open another issue and Chris will fix this for the next release, but it's OK for now probably

If someone does not agree with the plan, speak now :-)

bosonie commented 3 years ago

Agree on all the points and I will take care of the SI of the paper. I'm also now importing the old quantum mobile and I'll see if in that release siesta was working well with parallelization. Let you know!

yakutovicha commented 3 years ago

Agree on all the points and I will take care of the SI of the paper.

@bosonie are you going to take care of all codes that can't run parallel? If yes, then for CP2K please write something like: "Since only serial executable of CP2K is available on Quantum Mobile, please always lunch common workflows with "-n 1" option". Feel free to adapt.

yakutovicha commented 3 years ago

@giovannipizzi, @chrisjsewell, I tested the QE app on AiiDAlab - works fine.

bosonie commented 3 years ago

I confirm that with the old version of QM (v20.11.2a), the problem disappears. Using 2 processors, the Si relaxation with precise protocol takes 4 minutes (compared to 6.5 minutes when serial). With fast protocol it takes 2 seconds.

chrisjsewell commented 3 years ago

Ok thanks well I will try building the VM with only siesta installed, to see if any other installs are affecting things, and maybe also try with the previous base VM image, which I think changed since 20.11.2a: https://app.vagrantup.com/bento/boxes/ubuntu-18.04

chrisjsewell commented 3 years ago

VM with only siesta installed: https://drive.google.com/file/d/1KHqVnhY5Ms9JpVBzynB4IK98JvBrepV2/view?usp=sharing

I think this may have "fixed" it:

$ verdi data psml uploadfamily /usr/local/share/siesta/psml-files-qm/nc-sr-04_pbe_standard/ nc-sr-04_pbe_standard_psml "pseudos from PseudoDojo"
$ aiida-common-workflows launch relax -S Si -p precise -n 2 -d -- siesta
...
$ verdi process list -a
  PK  Created    Process label               Process State    Process status
----  ---------  --------------------------  ---------------  ----------------
 248  5m ago     SiestaCommonRelaxWorkChain  ⏹ Finished [0]
 249  5m ago     SiestaBaseWorkChain         ⏹ Finished [0]
 252  5m ago     SiestaCalculation           ⏹ Finished [0]
 258  45s ago    get_energy                  ⏹ Finished [0]
 260  44s ago    get_forces_and_stress       ⏹ Finished [0]

@bosonie et al, can you confirm?

additional debug info:

$ siesta
--------------------------------------------------------------------------
[[14888,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: qmobile

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Siesta Version  : MaX-1.2.0
Architecture    : Master-template
Compiler version: GNU Fortran (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Compiler flags  : mpif90 -O2
PP flags        : -DF2003  -DCDF  -DNCDF -DNCDF_4 -DSIESTA__FLOOK  -DMPI -DMPI_TIMING
Libraries       :  libncdf.a libfdict.a libfdict.a  -lnetcdff -lnetcdf  -L/usr/local/lib -lfloookall -ldl -lscalapack-openmpi  -llapack -lblas
Directory       : /home/max
PARALLEL version
NetCDF support
NetCDF-4 support
Lua support

* Running in serial mode with MPI
>> Start of run:  30-APR-2021  10:13:47

                           ***********************       
                           *  WELCOME TO SIESTA  *       
                           ***********************

(aiida) max@qmobile:~$ ompi_info
                 Package: Open MPI buildd@lcy01-amd64-009 Distribution
                Open MPI: 2.1.1
  Open MPI repo revision: v2.1.0-100-ga2fdb5b
   Open MPI release date: May 10, 2017
                Open RTE: 2.1.1
  Open RTE repo revision: v2.1.0-100-ga2fdb5b
   Open RTE release date: May 10, 2017
                    OPAL: 2.1.1
      OPAL repo revision: v2.1.0-100-ga2fdb5b
       OPAL release date: May 10, 2017
                 MPI API: 3.1.0
            Ident string: 2.1.1
                  Prefix: /usr
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: lcy01-amd64-009
           Configured by: buildd
           Configured on: Mon Feb  5 19:59:59 UTC 2018
          Configure host: lcy01-amd64-009
                Built by: buildd
                Built on: Mon Feb  5 20:05:56 UTC 2018
              Built host: lcy01-amd64-009
              C bindings: yes
            C++ bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler, does not
                          support the following: array subsections, direct
                          passthru (where possible) to underlying Open MPI's
                          C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: yes
  Wrapper compiler rpath: disabled
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
      C compiler version: 7.3.0
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /usr/bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: affinity, cuda
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v2.1.1)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v2.1.1)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA btl: self (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v2.1.1)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v2.1.1)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v2.1.1)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v2.1.1)
                MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v2.1.1)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v2.1.1)
                 MCA sec: basic (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v2.1.1)
                 MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v2.1.1)
                 MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v2.1.1)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v2.1.1)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v2.1.1)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v2.1.1)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v2.1.1)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v2.1.1)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v2.1.1)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v2.1.1)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA iof: mr_hnp (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA iof: mr_orted (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v2.1.1)
            MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v2.1.1)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA oob: usock (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
                 MCA ras: loadleveler (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA rmaps: staged (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA rml: oob (MCA v2.1.0, API v2.0.0, Component v2.1.1)
              MCA routed: radix (MCA v2.1.0, API v2.0.0, Component v2.1.1)
              MCA routed: debruijn (MCA v2.1.0, API v2.0.0, Component v2.1.1)
              MCA routed: direct (MCA v2.1.0, API v2.0.0, Component v2.1.1)
              MCA routed: binomial (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA rtc: freq (MCA v2.1.0, API v1.0.0, Component v2.1.1)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v2.1.1)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: staged_orted (MCA v2.1.0, API v1.0.0, Component
                          v2.1.1)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v2.1.1)
               MCA state: staged_hnp (MCA v2.1.0, API v1.0.0, Component
                          v2.1.1)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v2.1.1)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                  MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA mtl: psm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v2.1.1)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v2.1.1)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v2.1.1)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v2.1.1)

chrisjsewell commented 3 years ago

HOWEVER it's strange that you get "* Running in serial mode with MPI", I get instead a message saying it's properly running with 2 CPUs in parallel (but still very slow).

@giovannipizzi this is only where I directly called the siesta executable (to get the debug info), not mpirun siesta

chrisjsewell commented 3 years ago

I note in this "siesta only" build, we are getting this at the start of the stdout:

[[14705,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: qmobile

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.

what does this mean, should we be worried, and also should look if this was also the case in 20.11.2a?

bosonie commented 3 years ago

I've never seen this message before. Not in 20.11.2a nor in 21.05.1. My guess is that it does not come from siesta itself, but from OpenMPI. Still downloading the VM with siesta only...

bosonie commented 3 years ago

I'm running with the last VM provided by @chrisjsewell (only siesta included) and I get the same behavior reported by Chris. The same warning of OpenMPI and more or less the same computation time (6.5 minutes for Si relaxation with precise protocol and 2 processors). So performances are a bit worse than version 20.11.2a but surely we do not see the huge problem of v21.05.1. Anyway I think that once more we have confirmation that there is something a bit "weird" going on with parallelisation in general. How is it possible that now a warning appears out of nowhere without any apparent change in the code/OpenMPI installation?

chrisjsewell commented 3 years ago

ok I basically have to merge this now, because I'm getting emails from Nicola lol. But we can still continue the conversation and debugging here

chrisjsewell commented 3 years ago

So I have now created a VM with siesta+fleur: https://drive.google.com/file/d/1eZ_2vmu5nY6dGDJ9sdIu3C6SmQrZgzyF/view?usp=sharing

Now it appears to be back to the long run time for siesta (and still has the openmpi warning)

How is it possible that now a warning appears out of nowhere without any apparent change in the code/OpenMPI installation?

No idea 🤷 but indeed it would be great to get to the bottom of this

marvel-nccr / quantum-mobile

🚀 RELEASE: v21.05.1 #185