Closed chrisjsewell closed 3 years ago
thanks @giovannipizzi. One minor thing to note (that I realised when testing): the SSSP pseudopotentials are still installed in the "old" way, i.e. not with `aiida-pseudo`. It wouldn't be too difficult to add this in the next build, and potentially other resources required by common workflows (e.g. pseudos required by other plugins)
@chrisjsewell indeed, you are right - for Abinit, I had to install the pseudos manually (and the exception message wasn't fully obvious, as a specific version was required). I'll update my comment above with the command I had to run, great if this can be installed. I think similar commands would be useful also for QE and siesta (at least) - @sphuber and @bosonie can test and confirm the exact command(s) needed to be run in the QM
> @sphuber and @bosonie can test and confirm the exact command(s) needed to be run in the QM
yes indeed, is there somewhere a "list" of resources that would ideally be pre-loaded?
For the common workflows with abinit, before installing the pseudos, I get this error:

```
ValueError: Error occurred validating port 'inputs': `AbinitCommonRelaxInputGenerator.get_builder()` fails for the provided `generator_inputs`: required pseudo family `PseudoDojo/1.0/PBE/SR/standard/jthxml` is not installed. Please use `aiida-pseudo install pseudo-dojo` to install it.
```
The message however does not give the full command to run. That command will install `PseudoDojo/0.4/PBE/SR/standard/psp8` (the output is `Success: installed PseudoDojo/0.4/PBE/SR/standard/psp8 containing 72 pseudo potentials`), but a different family is needed.
The actual command that makes the common workflow run correctly is:

```shell
aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0
```
@sponce24 can you confirm that this command (`aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0`) is the only one needed to be run, independent of which material and protocol is chosen? Or are there more pseudos to install?
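For reference, the required family label in the error message simply encodes those command-line options. The helper below is a hypothetical illustration of the pattern, inferred only from the labels quoted in this thread (with `PBE` and `standard` appearing to be the defaults), not taken from the aiida-pseudo source:

```python
def pseudo_dojo_family_label(version="0.4", functional="PBE", relativistic="SR",
                             protocol="standard", pseudo_format="psp8"):
    """Build a PseudoDojo family label as registered by `aiida-pseudo install pseudo-dojo`.

    Hypothetical helper: the pattern is inferred from the labels quoted above.
    """
    return f"PseudoDojo/{version}/{functional}/{relativistic}/{protocol}/{pseudo_format}"

# Default options -> the family the bare install command creates:
print(pseudo_dojo_family_label())  # PseudoDojo/0.4/PBE/SR/standard/psp8
# Options from `-f jthxml -r SR -v 1.0` -> the family the Abinit workflow needs:
print(pseudo_dojo_family_label(version="1.0", pseudo_format="jthxml"))  # PseudoDojo/1.0/PBE/SR/standard/jthxml
```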
For aiida-quantumespresso one should run:

```shell
aiida-pseudo install sssp -p efficiency
aiida-pseudo install sssp -p precision
```

That's all.
I have tested the OVA and ran the Si relax with QE and that works. I cannot run anything else because it will freeze my entire machine up. This may be due to my machine as well, but I tried three times, and each time it completely froze my machine after a few minutes and I had to hard reset. Last time I was doing nothing else but running the VM. Not sure what is going on, but I don't have any info as everything just freezes.

OS: Ubuntu 20.04. VirtualBox: 6.1.18. System: 12 CPUs, 16 GB RAM.
@sphuber thanks for the report. That's actually annoying. Could you please report the OS version, VirtualBox version, and the hardware of your machine (in particular RAM and # CPUs). And also, in the Virtual Machine settings (in VirtualBox, under settings->System): Base memory, number of processors (and execution cap, and which flags are enabled below), and under the "Display" settings, the video memory, graphical controller and 3D acceleration, ...
(Also: does your machine freeze only when you run DFT calculations inside Quantum Mobile, or also if you just leave it up and running for ~5-10 minutes, without running any CPU-intensive executable inside?)
> @sponce24 can you confirm that this command (`aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0`) is the only one needed to be run, independent of which material and protocol is chosen? Or are there more pseudos to install?
Hello Giovanni,
Yes this is correct. We changed from NC to PAW to be closer to QE, which uses the SSSP library that contains mostly PAW and USPP. I guess the error message could indeed be updated.
Best, Samuel
OK thanks! @sponce24 indeed it would be great to update the message in the code, but this is not urgent, I don't think we'll update this version in Quantum Mobile. Instead, @chrisjsewell if you can install those pseudos (now also reported in my first comment above), that would be optimal.
> yes indeed, is there somewhere a "list" of resources that would ideally be pre-loaded?
@chrisjsewell in principle it's all written in the supplementary information of the paper we're writing. I suggest we actually check in this specific QM version, as we're doing above, so we are 100% sure of the exact command; @bosonie could you then please help double checking that the information we report here is consistent with what's written in the supplementary?
One ease-of-use thing for the common-workflow CLI: it would be nice to be able to pass it a file, e.g. something like

`runs.toml`:

```toml
[qe-relax]
engine = "quantum_espresso"
workflow = "relax"
structure = "Si"
protocol = "fast"
code = "qe-pw-6.5@localhost"

[other]
# ...
```

```shell
$ aiida-common-workflows launch file runs.toml
```
> I cannot run anything else because it will freeze my entire machine up.
@sphuber can you give me an example of an exact CLI input that freezes, so I can also try
There is not a CLI command that freezes. I simply submit a workchain to the daemon and let it run. I just maybe execute `squeue` and `verdi process list`, and at some point my entire machine just hangs. The CLI command was `aiida-common-workflows launch relax -S Si -p precise -d quantum_espresso`, but that really shouldn't matter.
So far, no issue of freezing for me.
It doesn't freeze for me either - but if it happens to Sebastiaan, it will most probably happen to someone else too :-( It would be good to find at least one other person for whom it freezes; as Sebastiaan's machine seems quite powerful, it shouldn't be an issue of Quantum Mobile. Maybe it's Ubuntu 20.04? I'm running on a Mac. @bosonie also mentioned (I think) that he had issues: Emanuele, could you report whether this version freezes your machine as well, or whether you are able to run fine?
A few more things for @sphuber to test, if he can/has time: when you start the VM, could you (in your host machine, and maybe also in the guest VM) run some command on the command line to monitor CPU and memory usage (even just `top` with proper sorting) and check if you see some process just taking the whole memory? (In my experience, these hangs are due to all RAM being taken by some process.)
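A non-interactive way to grab such a snapshot (a sketch; assumes a Linux guest/host with the usual procps tools installed):

```shell
# One-shot memory summary plus the five most memory-hungry processes.
free -m
ps aux --sort=-%mem | head -n 6
```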
It needs to be said that my computer has frozen before in ways unrelated to the QM. It may just be a problem with my filesystem that the QM exacerbates. I wouldn't necessarily search too long for this on my account.
yeh indeed it would be good to have someone else also check using Linux. It was built on my Mac, but obviously the whole point of these is that they are cross-platform compatible, so if there is suddenly some issue with that then geez I'm out lol
Hi there, I'm sorry to bring bad news. I'm also having problems running with this new QM version. First off, I have Ubuntu 18.04. Second, the VirtualBox version is 6.1.16 r140961. What I'm experiencing is that I cannot run in parallel with siesta anymore. The machine does not freeze, but the calculation hangs for ages at the point where it is supposed to distribute the work to processes. To give you an idea, I tried the relaxation of silicon with the precise protocol on 2 processors (it usually takes 2 minutes) and I have now been waiting about 25 minutes for it to even start the scf steps. I'm sure I was running with 2 processors in the previous version of QM, at least in the 20.04 version.
Also, running in serial mode allows the calculation to go on and finish, but I see unusual CPU consumption (constantly around 200% from the `top` command).
I remember that the quantum espresso plugin runs with mpi by default. @sphuber, can you try to run in serial and see what happens?
In the meanwhile I'm trying to run more tests.
> I'm sure I was running with 2 processors in the previous version of QM, at least in the 20.04 version.
To clarify, by previous version you mean v20.11.2a? (there's no 20.04 version)
Neither the siesta code nor aiida-siesta have changed at all since then, so it can't be anything directly to do with them. I literally have no idea what could have changed to break anything since then; there really haven't been any changes on the side of simulation code execution 🤷 (mainly just changes to the aiida/aiidalab installation)
Yes, sorry the v20.11.2a. However I deleted it after exporting my calculations since it was occupying lots of space.
As you say, from the Siesta side nothing changed. And my laptop is the same. Any modifications to `slurm` or `mpi`?
I am now trying with other codes and they seem fine though... NWChem and QE are OK. QE with the precise protocol took 20 minutes for the Si relaxation. Is that expected @sphuber? 2 processors. And no crash (this is good news).
I also tried a very inexpensive calculation with siesta and verified that the calculations run, they are just incredibly slow. I guess what is left to do is to get the old quantum mobile and test again there.
In any case, the issue is a bit less worrying than expected now.
No there has been no direct modifications to mpi or slurm (unless something is indirectly affecting it).
I will cc @albgar here, since he did a lot of work on the siesta compilation role
@sphuber and/or @bosonie perhaps it would be interesting to try the same runs with the new Docker image, and see if it still runs slow there (i.e. is the issue specific to virtualbox)
Also, could someone run `aiida-common-workflows launch relax -S Si -p precise -n 2 -d -- siesta` on Mac?
- Run the common workflow for Silicon with cp2k, with fast protocol (interestingly, it works fine even if the binary is serial but it's run with `mpirun -np 2`?)
@giovannipizzi, it still prints everything twice; the parser is just somehow able to retrieve the information.
@chrisjsewell can you create a computer named "localhost-serial" and put cp2k code on top of it?
@yakutovicha it would be great if you could provide to Chris the yaml for `verdi computer setup` and `verdi code setup`, after you test in the VM prepared by Chris that indeed this fixes the problem, so we are 100% sure there is no misunderstanding (e.g. I guess you have to set the default number of CPUs to 1? And will aiida-common-workflows figure out automatically which of the two CP2K codes to run?)
> @yakutovicha it would be great if you could provide to Chris the yaml for `verdi computer setup` and `verdi code setup`, after you test in the VM prepared by Chris that indeed this fixes the problem,
sure, I will prepare a yaml file.
> And will aiida-common-workflows figure out automatically which of the two CP2K codes to run?
Instead of having two codes, is it possible to have only one cp2k, i.e. `cp2k-7.1@localhost-serial`? I would prefer to have only this.
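For reference, pending the actual files, a minimal sketch of what the computer-setup YAML could contain (key names follow the `verdi computer setup --config` format; the values here are assumptions, the essential point being an empty `mpirun_command` so the serial cp2k binary is never wrapped in `mpirun`):

```yaml
# localhost-serial.yaml -- hypothetical sketch, not the attached file
label: localhost-serial
hostname: localhost
description: localhost configured to run codes without an mpirun prefix
transport: local
scheduler: slurm
work_dir: /home/{username}/.aiida_run/
mpirun_command: ''          # empty -> executables are launched directly, serially
mpiprocs_per_machine: 1     # default to a single process per machine
```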
> Also, could someone run `aiida-common-workflows launch relax -S Si -p precise -n 2 -d -- siesta` on Mac?
@bosonie can the pseudo be loaded via `aiida-pseudo`?

```
ValueError: protocol `precise` requires `pseudo_family` with name nc-sr-04_pbe_standard_psml but no family with this name is loaded in the database
```
No. For the moment it is `verdi data psml uploadfamily /usr/local/share/siesta/psml-files-qm/nc-sr-04_pbe_standard/ nc-sr-04_pbe_standard "pseudos from PseudoDojo"`
I'm reporting my findings in [my comment above](https://github.com/marvel-nccr/quantum-mobile/pull/185#issuecomment-829026550). In the meantime, I can tell that when run in serial, timings of siesta seem OK, but in parallel siesta indeed seems to be hanging (or rather, not hanging, but anyway it's super slow; it's been running for minutes and it's still at the top of the file).
The good news is that in serial it works (and that's the default if you don't specify `-n`).
@bosonie it would be good to know/confirm in which QMobile it was working fine
Eurgh, it is taking a long time for me (if it is indeed meant to take 2 minutes). Should it be running two processes here (see below), or is this not calling mpi correctly?
`_aiida_submit`:

```shell
#!/bin/bash
#SBATCH --no-requeue
#SBATCH --job-name="aiida-515"
#SBATCH --get-user-env
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00

ulimit -s unlimited

'mpirun' '-np' '2' '/usr/local/bin/siesta' < 'aiida.fdf' > 'aiida.out'
```
```
max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ top
top - 14:53:43 up 4:06, 1 user, load average: 4.11, 3.59, 1.97
Tasks: 217 total, 3 running, 165 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.7 us, 4.0 sy, 0.0 ni, 88.2 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1524696 total, 163100 free, 1156712 used, 204884 buff/cache
KiB Swap: 2052088 total, 1648888 free, 403200 used. 181440 avail Mem

  PID USER  PR  NI   VIRT    RES   SHR S  %CPU %MEM   TIME+  COMMAND
 8773 max   20   0 909472 198704  7312 R  88.2 13.0  8:46.65 siesta
 8774 max   20   0 909152 198680  7204 R  88.2 13.0  8:48.75 siesta
 2249 max   20   0 354396   3928  2916 S   5.9  0.3  0:01.91 ibus-daemon
 9486 max   20   0  41936   3748  3084 R   5.9  0.2  0:00.01 top
    1 root  20   0 225692   4404  3060 S   0.0  0.3  0:01.68 systemd

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ scontrol show node
NodeName=qmobile Arch=x86_64 CoresPerSocket=2
   CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=3.48
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=qmobile NodeHostName=qmobile Version=17.11
   OS=Linux 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020
   RealMemory=1 AllocMem=0 FreeMem=67 Sockets=1 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=jobs
   BootTime=2021-04-29T10:47:16 SlurmdStartTime=2021-04-29T10:47:34
   CfgTRES=cpu=2,mem=1M,billing=2
   AllocTRES=cpu=2
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ scontrol show partition
PartitionName=jobs
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=2
   Nodes=qmobile
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2 TotalNodes=1 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

max@qmobile:~/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43$ siesta
Siesta Version  : MaX-1.2.0
Architecture    : Master-template
Compiler version: GNU Fortran (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Compiler flags  : mpif90 -O2
PP flags        : -DF2003 -DCDF -DNCDF -DNCDF_4 -DSIESTA__FLOOK -DMPI -DMPI_TIMING
Libraries       : libncdf.a libfdict.a libfdict.a -lnetcdff -lnetcdf -L/usr/local/lib -lfloookall -ldl -lscalapack-openmpi -llapack -lblas
Directory       : /home/max/.aiida_run/22/d6/a71a-c53c-429b-b5e4-9c6b02255d43
PARALLEL version
NetCDF support
NetCDF-4 support
Lua support

* Running in serial mode with MPI
>> Start of run: 29-APR-2021 14:58:01

***********************
*  WELCOME TO SIESTA  *
***********************
```
I notice similar things as @chrisjsewell.
Note also my updated timings above. It's normal that there are 2 siesta in `top` if it is executed with `mpirun -np 2`; HOWEVER it's strange that you get "* Running in serial mode with MPI", I get instead a message saying it's properly running with 2 CPUs in parallel (but still very slow).
Yes this is what I see as well. @chrisjsewell you should read "* Running on 2 nodes in parallel".
@chrisjsewell here are the configuration files needed to set up `cp2k@localhost-serial`:

localhost-serial.yaml localhost-serial-config.yaml cp2k-7.1.yaml
Tested on the latest release - works fine.
@giovannipizzi so would you conclude there is something a bit "weird" going on with parallelisation in general? Have you/can you try a basic "non-aiida" run of parallelised quantum-espresso, as obviously this is pretty important as to whether this can be released for Nicola or not
Ok, I did my final update to the first comment above. I also added the tests for fleur, which also seems to work only in serial.
After discussion with Chris, and in order not to delay the release more (both for the common-workflows, and as it's needed for a MaX meeting tomorrow):
If someone does not agree with the plan, speak now :-)
Agree on all the points and I will take care of the SI of the paper. I'm also now importing the old quantum mobile and I'll see if in that release siesta was working well with parallelization. Let you know!
> Agree on all the points and I will take care of the SI of the paper.
@bosonie are you going to take care of all codes that can't run in parallel? If yes, then for CP2K please write something like: "Since only a serial executable of CP2K is available on Quantum Mobile, please always launch common workflows with the "-n 1" option". Feel free to adapt.
@giovannipizzi, @chrisjsewell, I tested the QE app on AiiDAlab - works fine.
I confirm that with the old version of QM (v20.11.2a), the problem disappears. Using 2 processors, the Si relaxation with precise protocol takes 4 minutes (compared to 6.5 minutes when serial). With fast protocol it takes 2 seconds.
Ok thanks well I will try building the VM with only siesta installed, to see if any other installs are affecting things, and maybe also try with the previous base VM image, which I think changed since 20.11.2a: https://app.vagrantup.com/bento/boxes/ubuntu-18.04
VM with only siesta installed: https://drive.google.com/file/d/1KHqVnhY5Ms9JpVBzynB4IK98JvBrepV2/view?usp=sharing
I think this may have "fixed" it:

```shell
$ verdi data psml uploadfamily /usr/local/share/siesta/psml-files-qm/nc-sr-04_pbe_standard/ nc-sr-04_pbe_standard_psml "pseudos from PseudoDojo"
$ aiida-common-workflows launch relax -S Si -p precise -n 2 -d -- siesta
...
$ verdi process list -a
  PK  Created    Process label               Process State    Process status
----  ---------  --------------------------  ---------------  ----------------
 248  5m ago     SiestaCommonRelaxWorkChain  ⏹ Finished [0]
 249  5m ago     SiestaBaseWorkChain         ⏹ Finished [0]
 252  5m ago     SiestaCalculation           ⏹ Finished [0]
 258  45s ago    get_energy                  ⏹ Finished [0]
 260  44s ago    get_forces_and_stress       ⏹ Finished [0]
```
@bosonie et al, can you confirm?
additional debug info:
```
$ siesta
--------------------------------------------------------------------------
[[14888,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: qmobile

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Siesta Version  : MaX-1.2.0
Architecture    : Master-template
Compiler version: GNU Fortran (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Compiler flags  : mpif90 -O2
PP flags        : -DF2003 -DCDF -DNCDF -DNCDF_4 -DSIESTA__FLOOK -DMPI -DMPI_TIMING
Libraries       : libncdf.a libfdict.a libfdict.a -lnetcdff -lnetcdf -L/usr/local/lib -lfloookall -ldl -lscalapack-openmpi -llapack -lblas
Directory       : /home/max
PARALLEL version
NetCDF support
NetCDF-4 support
Lua support

* Running in serial mode with MPI
>> Start of run: 30-APR-2021 10:13:47

***********************
*  WELCOME TO SIESTA  *
***********************
```
(aiida) max@qmobile:~$ ompi_info
Package: Open MPI buildd@lcy01-amd64-009 Distribution
Open MPI: 2.1.1
Open MPI repo revision: v2.1.0-100-ga2fdb5b
Open MPI release date: May 10, 2017
Open RTE: 2.1.1
Open RTE repo revision: v2.1.0-100-ga2fdb5b
Open RTE release date: May 10, 2017
OPAL: 2.1.1
OPAL repo revision: v2.1.0-100-ga2fdb5b
OPAL release date: May 10, 2017
MPI API: 3.1.0
Ident string: 2.1.1
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: lcy01-amd64-009
Configured by: buildd
Configured on: Mon Feb 5 19:59:59 UTC 2018
Configure host: lcy01-amd64-009
Built by: buildd
Built on: Mon Feb 5 20:05:56 UTC 2018
Built host: lcy01-amd64-009
C bindings: yes
C++ bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
limitations in the gfortran compiler, does not
support the following: array subsections, direct
passthru (where possible) to underlying Open MPI's
C functionality
Fort mpi_f08 subarrays: no
Java bindings: yes
Wrapper compiler rpath: disabled
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 7.3.0
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fort compiler: gfortran
Fort compiler abs: /usr/bin/gfortran
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort USE...ONLY: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
OMPI progress: no, ORTE progress: yes, Event lib:
yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity, cuda
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA btl: self (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
v2.1.1)
MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v2.1.1)
MCA sec: basic (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
v2.1.1)
MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
v2.1.1)
MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
v2.1.1)
MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
v2.1.1)
MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA ess: env (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
v2.1.1)
MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA iof: mr_hnp (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA iof: mr_orted (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA odls: default (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA oob: usock (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA ras: loadleveler (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA rmaps: staged (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rml: oob (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA routed: radix (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA routed: debruijn (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA routed: direct (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA routed: binomial (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rtc: freq (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: novm (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: app (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: tool (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: orted (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: staged_orted (MCA v2.1.0, API v1.0.0, Component
v2.1.1)
MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v2.1.1)
MCA state: staged_hnp (MCA v2.1.0, API v1.0.0, Component
v2.1.1)
MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: self (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA mtl: psm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v2.1.1)
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v2.1.1)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v2.1.1)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
v2.1.1)
> HOWEVER it's strange that you get "* Running in serial mode with MPI", I get instead a message saying it's properly running with 2 CPUs in parallel (but still very slow).
@giovannipizzi this is only where I directly called the `siesta` executable (to get the debug info), not `mpirun siesta`
I note in this "siesta only" build, we are getting this at the start of the stdout:

```
[[14705,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: qmobile

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
```

What does this mean, should we be worried, and should we also check whether this was also the case in 20.11.2a?
I've never seen this message before. Not in 20.11.2a nor in 21.05.1. My guess is that it does not come from siesta itself, but from OpenMPI. Still downloading the VM with siesta only...
I'm running with the last VM provided by @chrisjsewell (only siesta included) and I get the same behavior reported by Chris: the same OpenMPI warning and more or less the same computation time (6.5 minutes for the Si relaxation with precise protocol and 2 processors). So performance is a bit worse than in version 20.11.2a, but surely we do not see the huge problem of v21.05.1. Anyway I think that once more we have confirmation that there is something a bit "weird" going on with parallelisation in general. How is it possible that now a warning appears out of nowhere without any apparent change in the code/OpenMPI installation?
ok I basically have to merge this now, because I'm getting emails from Nicola lol. But we can still continue the conversation and debugging here
So I have now created a VM with siesta+fleur: https://drive.google.com/file/d/1eZ_2vmu5nY6dGDJ9sdIu3C6SmQrZgzyF/view?usp=sharing
Now it appears to be back to the long run time for siesta (and still has the openmpi warning)
> How is it possible that now a warning appears out of nowhere without any apparent change in the code/OpenMPI installation?
No idea 🤷 but indeed it would be great to get to the bottom of this
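On the warning itself: it should be benign here. It just says the InfiniBand (`openib`) transport found no interface to use, which is expected inside a VirtualBox VM, and Open MPI falls back to shared-memory/TCP transports. If we want to silence it, the standard Open MPI MCA mechanisms apply (sketch; the commented `mpirun` line is illustrative only):

```shell
# Set the MCA parameter via the environment (Open MPI reads OMPI_MCA_* variables)...
export OMPI_MCA_btl_base_warn_component_unused=0
echo "$OMPI_MCA_btl_base_warn_component_unused"
# ...or pass it per invocation, e.g.:
# mpirun --mca btl_base_warn_component_unused 0 -np 2 /usr/local/bin/siesta < aiida.fdf > aiida.out
```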
Thanks Chris! I'm going to report here things that I've tested, and I suggest that also other tests are reported here so we collect all this information. I might keep updating this comment.
Works fine:

- `aiida-pseudo install pseudo-dojo -f jthxml -r SR -v 1.0`
- Run the common workflow for Silicon with cp2k, with fast protocol (interestingly, it works fine even if the binary is serial but it's run with `mpirun -np 2`?)
- `verdi data psml uploadfamily /usr/local/share/siesta/psml-files-qm/nc-sr-04_pbe_standard/ nc-sr-04_pbe_standard_psml "pseudos from PseudoDojo"`
- `time aiida-common-workflows launch relax -S Si -p fast siesta` works and takes 14.5s on my machine
- `time aiida-common-workflows launch relax -S Si -p precise siesta` works and takes 4m55s on my machine (runs in serial, without MPI in the submission script)
- `time aiida-common-workflows launch eos -S Si -p fast fleur -n 1 1` works (runs in ~5m11s) even if the resulting points are all over the place, not on a parabola (but that's OK probably, see next point)
- `time aiida-common-workflows launch eos -S Si -p moderate fleur -n 1 1` works (runs in ~45m10s) and the points are on a nice parabola.

Works so-so:

- `time aiida-common-workflows launch relax -S Si -p fast -n 2 -- siesta` works but very slowly (3m36s, so in parallel it's 15x slower than in serial...). (Note that the code detects it's running in parallel, it prints "Running on 2 nodes in parallel")

Does not work:

- `time aiida-common-workflows launch relax -S Si -p precise -n 2 -- siesta` submits correctly with `mpirun -np 2`, but siesta seems to be extremely slow. It got stuck for minutes at the step "Setting up quadratic distribution...", now it seems (almost) stuck at the next line "stepf: Fermi-Dirac step function". I killed it after ~9 minutes.
- `aiida-common-workflows launch relax -S Si -p fast fleur` fails, error: error-fleur.txt [seems to be related to parallelism and how to distribute data]. (Running in serial works, see above)