chrisjsewell opened 3 years ago
Using the test Docker (on OSX) from #13, i.e. with libpnetcdf-dev installed:
abinit -b
root@instance:/# abinit -b
DATA TYPE INFORMATION:
REAL: Data type name: REAL(DP)
Kind value: 8
Precision: 15
Smallest nonnegligible quantity relative to 1: 0.22204460E-015
Smallest positive number: 0.22250739E-307
Largest representable number: 0.17976931E+309
INTEGER: Data type name: INTEGER(default)
Kind value: 4
Bit size: 32
Largest representable number: 2147483647
LOGICAL: Data type name: LOGICAL
Kind value: 4
CHARACTER: Data type name: CHARACTER Kind value: 1
==== Using MPI-2 specifications ====
MPI-IO support is ON
xmpi_tag_ub ................ 2147483647
xmpi_bsize_ch .............. 1
xmpi_bsize_int ............. 4
xmpi_bsize_sp .............. 4
xmpi_bsize_dp .............. 8
xmpi_bsize_spc ............. 8
xmpi_bsize_dpc ............. 16
xmpio_bsize_frm ............ 4
xmpi_address_kind .......... 8
xmpi_offset_kind ........... 8
MPI_WTICK .................. 1.0000000000000001E-009
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_GNU CXX_GNU FC_GNU
HAVE_FC_ALLOCATABLE_DT... HAVE_FC_ASYNC HAVE_FC_BACKTRACE
HAVE_FC_COMMAND_ARGUMENT HAVE_FC_COMMAND_LINE HAVE_FC_CONTIGUOUS
HAVE_FC_CPUTIME HAVE_FC_EXIT HAVE_FC_FLUSH
HAVE_FC_GAMMA HAVE_FC_GETENV HAVE_FC_IEEE_ARITHMETIC
HAVE_FC_IEEE_EXCEPTIONS HAVE_FC_INT_QUAD HAVE_FC_IOMSG
HAVE_FC_ISO_C_BINDING HAVE_FC_ISO_FORTRAN_2008 HAVE_FC_LONG_LINES
HAVE_FC_MOVE_ALLOC HAVE_FC_ON_THE_FLY_SHAPE HAVE_FC_PRIVATE
HAVE_FC_PROTECTED HAVE_FC_SHIFTLR HAVE_FC_STREAM_IO
HAVE_FC_SYSTEM HAVE_FORTRAN2003 HAVE_HDF5
HAVE_HDF5_MPI HAVE_LIBPAW_ABINIT HAVE_LIBTETRA_ABINIT
HAVE_LIBXC HAVE_MPI HAVE_MPI2
HAVE_MPI_IALLGATHER HAVE_MPI_IALLREDUCE HAVE_MPI_IALLTOALL
HAVE_MPI_IALLTOALLV HAVE_MPI_IBCAST HAVE_MPI_IGATHERV
HAVE_MPI_INTEGER16 HAVE_MPI_IO HAVE_MPI_TYPE_CREATE_S...
HAVE_NETCDF HAVE_NETCDF_FORTRAN HAVE_NETCDF_FORTRAN_MPI
HAVE_NETCDF_MPI HAVE_OS_LINUX HAVE_TIMER_ABINIT
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=== Build Information ===
Version : 9.2.1
Build target : x86_64_linux_gnu7.5
Build date : 20210413
=== Compiler Suite ===
C compiler : gnu7.5
C++ compiler : gnu7.5
Fortran compiler : gnu7.5
CFLAGS : -g -O2
CXXFLAGS : -g -O2
FCFLAGS : -g -ffree-line-length-none
FC_LDFLAGS :
=== Optimizations ===
Debug level : @abi_debug_flavor@
Optimization level : @abi_optim_flavor@
Architecture : unknown_unknown
=== Multicore ===
Parallel build : yes
Parallel I/O : yes
openMP support :
GPU support :
=== Connectors / Fallbacks ===
LINALG flavor : netlib
FFT flavor : goedecker
HDF5 : yes
NetCDF : yes
NetCDF Fortran : yes
LibXC : yes
Wannier90 : no
=== Experimental features ===
Exports :
GW double-precision :
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O2
Optimizations for 43_ptgroups:
-O0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
nc-config --prefix
/usr
nc-config --all
root@instance:/# nc-config --all
This netCDF 4.6.0 has been built with the following features:
--cc -> /usr/bin/cc
--cflags -> -I/usr/include -I/usr/include/hdf5/serial
--libs -> -L/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu/hdf5/serial -lnetcdf -lhdf5_hl -lhdf5 -lpthread -lsz -lz -ldl -lm -lcurl
--has-c++ -> no
--cxx ->
--has-c++4 -> no
--cxx4 ->
--has-fortran-> yes
--fc -> gfortran
--fflags -> -I/usr/include
--flibs -> -L/usr/lib -lnetcdff -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -lnetcdf -lnetcdf
--has-f90 -> no
--has-f03 -> yes
--has-dap -> yes
--has-dap2 -> yes
--has-dap4 -> yes
--has-nc2 -> yes
--has-nc4 -> yes
--has-hdf5 -> yes
--has-hdf4 -> no
--has-logging-> no
--has-pnetcdf-> no
--has-szlib -> no
--has-cdf5 -> no
--has-parallel-> no
--prefix -> /usr
--includedir-> /usr/include
--libdir -> /usr/lib/x86_64-linux-gnu
--version -> netCDF 4.6.0
nf-config --all
root@instance:/# nf-config --all
This netCDF-Fortran 4.4.4 has been built with the following features:
--cc -> gcc
--cflags -> -I/usr/include -Wdate-time -D_FORTIFY_SOURCE=2
--fc -> gfortran
--fflags -> -I/usr/include
--flibs -> -L/usr/lib -lnetcdff -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -lnetcdf -lnetcdf
--has-f90 -> no
--has-f03 -> yes
--has-nc2 -> yes
--has-nc4 -> yes
--prefix -> /usr
--includedir-> /usr/include
--version -> netCDF-Fortran 4.4.4
pnetcdf-config --all
root@instance:/# pnetcdf-config --all
This parallel-netcdf 1.9.0 has been built with the following features:
--cc -> /usr/bin/mpicc
--cflags -> -g -O2 -fdebug-prefix-map=/build/pnetcdf-iIyKNf/pnetcdf-1.9.0=. -fstack-protector-strong -Wformat -Werror=format-security
--cppflags -> -Wdate-time -D_FORTIFY_SOURCE=2
--ldflags -> -Wl,-Bsymbolic-functions -Wl,-z,relro
--libs ->
--has-c++ -> yes
--cxx -> /usr/bin/mpicxx
--cxxflags -> -g -O2 -fdebug-prefix-map=/build/pnetcdf-iIyKNf/pnetcdf-1.9.0=. -fstack-protector-strong -Wformat -Werror=format-security
--has-fortran -> yes
--f77 -> /usr/bin/mpif77
--fflags -> -g -O2 -fdebug-prefix-map=/build/pnetcdf-iIyKNf/pnetcdf-1.9.0=. -fstack-protector-strong
--fc -> /usr/bin/mpif90
--fcflags -> -g -O2 -fdebug-prefix-map=/build/pnetcdf-iIyKNf/pnetcdf-1.9.0=. -fstack-protector-strong
--relax-coord-bound -> disabled
--in-place-swap -> enabled
--erange-fill -> enabled
--subfiling -> disabled
--large-req -> disabled
--null-byte-header-padding -> disabled
--debug -> disabled
--prefix -> /usr
--includedir -> /usr/include
--libdir -> /usr/lib/x86_64-linux-gnu
--version -> parallel-netcdf 1.9.0
--release-date -> 19 Dec 2017
--config-date ->
Eurgh, I give up with this rubbish lol: https://docs.abinit.org/INSTALL_Ubuntu/ says you simply install libpnetcdf-dev; well, that certainly does not seem to be the case. Including it in the apt install still gives you --has-pnetcdf -> no and --has-parallel -> no for netcdf.
Note it also says it is linked to /usr/lib/x86_64-linux-gnu/hdf5/serial, even though we have installed libhdf5-openmpi-dev, and so /usr/lib/x86_64-linux-gnu/hdf5/openmpi is available.
Does this mean that we also have to build netcdf from source? If so, do we actually need pnetcdf, via --enable-pnetcdf (see https://parallel-netcdf.github.io/), or can we just link to the openmpi hdf5?
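A quick way to double-check the serial-HDF5 linkage reported above is to inspect the shared library directly; a minimal sketch, where the library path is an assumption based on the --libdir printed by nc-config:

```bash
# Inspect which HDF5 flavour the apt-provided netCDF C library links against
# (path assumed from the `nc-config --libdir` output above).
ldd /usr/lib/x86_64-linux-gnu/libnetcdf.so.* | grep -i hdf5
# For the stock Ubuntu package this is expected to resolve to
# .../hdf5/serial/libhdf5*, not the OpenMPI build under .../hdf5/openmpi.
```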
Let's try to simplify things a bit and mainly focus on the hard requirements, i.e. the libs that allow users to run standard GS/DFPT calculations in parallel with MPI and produce (small) netcdf files that can be used by python tools such as AbiPy for visualization purposes (e.g. band structure plots).
The first question is: what happens if you try run the input file that, in the previous build, was aborting with a stack smashing error when calling MPI_FILE_OPEN?
Do you still have the same error?
If this first test completes successfully, I would say that the fact that your netcdf library does not support parallel-IO (--has-parallel -> no) is not a big deal.
Basic MPI-IO capabilities provided by the MPI library are enough for standard calculations.
In other words, Abinit will be able to write/read Fortran binary files in parallel using MPI-IO and stream IO
(no netcdf/hdf5 stuff is required here).
If the error persists, we have a serious problem.
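One way to answer this independently of abinit is a tiny MPI-IO smoke test compiled with the same MPI wrappers; this is only a sketch (not taken from the thread), and the file names are arbitrary:

```bash
# Minimal MPI-IO smoke test: does MPI_File_open work with the system MPI at all?
cat > mpiio_test.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_File fh;
    int ierr;
    MPI_Init(&argc, &argv);
    /* Collectively create/open a scratch file, then close it again. */
    ierr = MPI_File_open(MPI_COMM_WORLD, "mpiio_test.dat",
                         MPI_MODE_CREATE | MPI_MODE_WRONLY,
                         MPI_INFO_NULL, &fh);
    if (ierr == MPI_SUCCESS) {
        MPI_File_close(&fh);
        printf("MPI_File_open OK\n");
    } else {
        printf("MPI_File_open failed with error %d\n", ierr);
    }
    MPI_Finalize();
    return 0;
}
EOF
mpicc mpiio_test.c -o mpiio_test
mpirun -np 2 ./mpiio_test
# A stack-smashing abort here would point at the MPI/libc stack rather than at abinit.
```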
As I explained in the previous post, some of these Ubuntu libraries are compiled with -fstack-protector and/or -D_FORTIFY_SOURCE=2.
For instance, I see:
root@instance:/# nf-config --all
This netCDF-Fortran 4.4.4 has been built with the following features:
--cc -> gcc
--cflags -> -I/usr/include -Wdate-time -D_FORTIFY_SOURCE=2
and from pnetcdf-config --all:
--cflags -> -g -O2 -fdebug-prefix-map=/build/pnetcdf-iIyKNf/pnetcdf-1.9.0=. -fstack-protector-strong -Wformat -Werror=format-security
--cppflags -> -Wdate-time -D_FORTIFY_SOURCE=2
so I assume that the MPI library was also compiled with similar options.
From this man page:
With _FORTIFY_SOURCE set to 2, some more checking is added, but some conforming programs might fail.
In this case, "the program" should be understood as the MPI/netcdf/hdf5 libraries provided by apt, so the stack smashing issue should be reported to the maintainers of those packages: Abinit is just a client of these libs and there's no way to disable these checks on our side.
As mentioned here
_FORTIFY_SOURCE level 2 is more secure, but is a slightly riskier compilation strategy; if you use it, make sure you have very strong regression tests for your compiled code to prove the compiler hasn't introduced any unexpected behaviour.
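For reference, the distro's default hardening flags can be inspected directly; a small sketch, assuming dpkg-dev and binutils are available in the container (the library path is again an assumption):

```bash
# Show the hardening flags Ubuntu applies to packages built with dpkg-buildflags.
dpkg-buildflags --get CFLAGS     # typically includes -fstack-protector-strong
dpkg-buildflags --get CPPFLAGS   # typically includes -D_FORTIFY_SOURCE=2
# Check whether an installed library references the stack-protector machinery.
nm -D /usr/lib/x86_64-linux-gnu/libnetcdf.so.* | grep __stack_chk_fail || true
```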
If the GS calculation seems to work in parallel, I would say we are on the right track and we only need to check whether other basic capabilities work as expected. At this point, you may want to use the runtests.py script to execute additional parts of the Test Suite, just to improve a bit the coverage:
cd ~abinit/tests
./runtests.py v1 v3 -j2 # run tests in the v1, v3 directories with 2 python threads (fast)
./runtests.py mpiio -n4 # run tests in the mpiio dir with 4 MPI procs (this will take more time)
If the tests are OK, I would say that the basic stuff works as expected. Running all the tests (~2000) will take much longer (~40 minutes with 6 cores) so you may want to skip this part.
PS:
Note that having an hdf5 library that supports MPI-IO (--has-parallel -> yes) is not required by Abinit. Besides, parallel-netcdf (--has-pnetcdf -> yes) refers to (yet another) implementation of parallel-IO netcdf that is still around for legacy reasons, so I don't think you need it to build Abinit.
We (optionally) require an hdf5 library compiled with MPI-IO support but in this case the compilation/linking process becomes more complicated because the full software stack (netcdf Fortran/C, hdf5-c) must be compiled with the same MPI library used to compile abinit. That's the reason why our build system provides a shell script to compile the different libs from source using mpif90 and mpicc if the HPC center does not provide pre-installed modules that work out of the box.
The reason is that MPI is not just an API but also an implementation-dependent ABI, so it's not possible to mix libs compiled with different compilers/MPI implementations. It's not abinit that is complicated to build (although we always welcome comments and suggestions to facilitate the build process); it's the MPI software stack that is tricky, and things become even more complicated when you have a library that depends on MPI. That's the reason why I'm suggesting to ignore the problem with hdf5+MPI-IO and just focus on having an MPI library that does not crash when one tries to create a file.
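For completeness, a rough sketch of what building the I/O stack against a single MPI looks like. This is illustrative only (hypothetical source-tree variables and install prefix; configure flags as documented upstream for HDF5/netCDF), not the abinit fallback script mentioned above:

```bash
# Illustrative sketch: build HDF5 + netCDF-C + netCDF-Fortran with the same MPI
# wrappers used for abinit. $HDF5_SRC, $NC_SRC, $NF_SRC are hypothetical paths
# to the unpacked source trees; everything is installed into $PREFIX.
export PREFIX="$HOME/abinit-deps" CC=mpicc FC=mpif90

# 1) HDF5 with MPI-IO support
(cd "$HDF5_SRC" && ./configure --prefix="$PREFIX" --enable-parallel --enable-fortran && make -j4 install)

# 2) netCDF-C linked against that HDF5
(cd "$NC_SRC" && CPPFLAGS="-I$PREFIX/include" LDFLAGS="-L$PREFIX/lib" ./configure --prefix="$PREFIX" && make -j4 install)

# 3) netCDF-Fortran linked against that netCDF-C
(cd "$NF_SRC" && CPPFLAGS="-I$PREFIX/include" LDFLAGS="-L$PREFIX/lib" ./configure --prefix="$PREFIX" && make -j4 install)
```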
Thanks for the reply @gmatteo
The first question is: what happens if you try run the input file that, in the previous build, was aborting with a stack smashing error when calling MPI_FILE_OPEN? Do you still have the same error?
I'm unclear why you think this will have changed relative to the previous build, given that the only difference is the install of libpnetcdf-dev, which (as noted above) does not appear to change anything.
At this point, you may want to use the runtests.py script to execute additional parts of the Test Suite, just to improve a bit the coverage:
This is already run in https://github.com/marvel-nccr/ansible-role-abinit/blob/master/tasks/tests.yml, and it does not surface the stack smashing error.
In this case, "the program" should be understood as the MPI/netcdf/hdf5 libraries provided by apt, so the stack smashing issue should be reported to the maintainers of those packages: Abinit is just a client of these libs and there's no way to disable these checks on our side.
If you think there is an issue with the apt libraries, fair enough; you are certainly more knowledgeable in this area than me. But then this should be made clear in https://docs.abinit.org/INSTALL_Ubuntu/, where it specifically recommends using these apt libraries.
It's not abinit that is complicated to build ... it's the MPI software stack ... That's the reason why I'm suggesting to ignore the problem with hdf5+MPI-IO and just focus on having an MPI library that does not crash when one tries to create a file.
Again I would note here that this is not an issue for any of the other simulation codes with exactly the same MPI libraries.
Anyhow, I don't see a way forward on this install route into Ubuntu, so I will pivot to looking at the Conda install route.
Just a quick comment/question to avoid possible misunderstandings. @chrisjsewell: after installing libpnetcdf-dev, did you also re-run the configure/make part of abinit from scratch, or did you just install the library with APT?
I think (@gmatteo correct me if I'm wrong) that installing that package makes it possible for the configure system to detect the library, and therefore compile abinit with the right support. However, just installing the library without recompiling abinit should not change the behaviour of the code (I think).
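If abinit is reconfigured and rebuilt after installing the extra packages, one quick check of what the build actually picked up is to grep the abinit -b output for the I/O-related flags (the flag names below are taken from the output at the top of this issue):

```bash
# Which I/O-related features does the compiled abinit report?
abinit -b | grep -E "HAVE_MPI_IO|HAVE_HDF5_MPI|HAVE_NETCDF_MPI|HAVE_NETCDF_FORTRAN_MPI"
```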
after installing libpnetcdf-dev, did you also re-run the configure/make part of abinit from scratch
I didn't just install libpnetcdf-dev: I added it to the ansible role (#13) and then ran tox converge to create the entire Docker container from scratch, including the apt install and the compilation.
I haven't read through the thread, just wanted to provide a link to the build.sh used to build the abinit conda package on conda-forge, in case it helps: https://github.com/conda-forge/abinit-feedstock/blob/master/recipe/build.sh
Thanks, although I'd say that's not actually the salient point of the recipe (the make command is basically the same here); rather, it's that the netcdf packages linked against are ones that have been compiled against the MPI library: https://github.com/conda-forge/abinit-feedstock/blob/master/recipe/meta.yaml#L57
Basically, I believe that to get parallel I/O here we also have to directly compile the netcdf libraries, rather than just installing them from apt.
Needless to say, this introduces yet more complexity and build time to Quantum Mobile (for which abinit is already one of the longest-running components), so if we are planning to move to Conda anyway, I would rather spend my time on that than on trying to add the netcdf compilation.
To also link to the conda effort: https://github.com/marvel-nccr/ansible-role-conda-codes/pull/1
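On the conda route, the MPI-enabled variants of the I/O stack are selected via conda-forge build-string pins; a hedged example of that pattern (the exact pins used by the abinit feedstock are in the meta.yaml linked above):

```bash
# Pull the OpenMPI variants of the I/O libraries from conda-forge
# (build-string selectors; mpich variants exist as well).
conda install -c conda-forge "hdf5=*=mpi_openmpi_*" "libnetcdf=*=mpi_openmpi_*" "netcdf-fortran=*=mpi_openmpi_*"
```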
Taken from email chain: @giovannipizzi, Samuel Ponce, @chrisjsewell, Matteo Giantomassi.
cc also @sphuber