libAtoms / QUIP

libAtoms/QUIP molecular dynamics framework: https://libatoms.github.io

QUIP/GAP MPI question #482

Closed: MES-physics closed this issue 1 year ago.

MES-physics commented 1 year ago

Hi, I've been reading the discussions from earlier this year about running gap_fit with MPI. Is it done simply by installing the new version that is now available and adding "mpi_blocksize=?" to the gap_fit command line in my Slurm submission (requesting my nodes as usual and loading the MPI module)? Also, what block sizes are recommended relative to the number of atoms in the training set? Is it still necessary to do a serial run for the sparse points first? Thanks!

Sideboard commented 1 year ago

Sorry, we need to get some instructions into the documentation.

You have to compile it with MPI (probably just by choosing the right QUIP_ARCH) and enable ScaLAPACK in make config. You still have to provide the sparse points in separate files. To create them, first do a quick serial run with sparsify_only_no_fit=T. Then use $QUIP_ROOT/bin/gap_prepare_sparsex_input.py to create the *.input files, and adjust the gap strings in the config to point to the created files. After that, it's just a matter of launching gap_fit with srun or mpirun -n.
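Below is a dry-run shell sketch of the steps above. It only prints the commands.

Several details are assumptions, not documented QUIP usage:
- The gap_fit arguments are passed through config_file=. Check gap_fit's help output for the real option.
- serial.config and mpi.config are hypothetical file names. serial.config is assumed to also set sparsify_only_no_fit=T. mpi.config is assumed to have its gap strings pointing at the generated *.input files.
- The rank count is a placeholder.
- The arguments of gap_prepare_sparsex_input.py are left out on purpose, since they aren't given here.

```shell
#!/bin/sh
# Dry-run sketch of the MPI gap_fit workflow: each step is printed, not run.
# ASSUMPTIONS (hypothetical): serial.config holds the normal gap_fit arguments
# plus sparsify_only_no_fit=T; mpi.config holds the same arguments with the
# gap strings pointing at the generated *.input files.

# Print a command prefixed with "+ " (replace the body with "$@" to execute).
show() { echo "+ $*"; }

mpi_gap_fit_steps() {
  nprocs="$1"   # number of MPI ranks (placeholder)

  # 1. Quick serial run: select sparse points only, no fit.
  show gap_fit config_file=serial.config
  # 2. Write the selected sparse points to *.input files
  #    (arguments omitted here; see the script itself).
  show "$QUIP_ROOT/bin/gap_prepare_sparsex_input.py"
  # 3. By hand: edit the gap strings in mpi.config to use those files.
  # 4. Distributed fit (under Slurm, srun works as well as mpirun).
  show mpirun -n "$nprocs" gap_fit config_file=mpi.config
}
```

To use it, source the script with QUIP_ROOT set and call, for example, mpi_gap_fit_steps 64. Once the printed sequence matches your setup, replace the body of show with "$@" to actually run the steps.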

You probably don't have to touch the mpi_blocksize* options anymore, unless you want to experiment. (If you do, note that the column block size has a huge impact on the size of ScaLAPACK's work array.)

MES-physics commented 1 year ago

Uh oh. Now I get errors from "make" after enabling ScaLAPACK in the config questions. What do you think?

Making GAP programs

rm -f /home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmpi+openmp/Makefile
cp /home/m/QUIPMPI/QUIP/src/GAP/Makefile /home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmpi+openmp/Makefile
make -C /home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmpi+openmp QUIP_ROOT=/home/m/QUIPMPI/QUIP VPATH=/home/m/QUIPMPI/QUIP/src/GAP -I/home/m/QUIPMPI/QUIP -I/home/m/QUIPMPI/QUIP/arch Programs
make[1]: Entering directory '/home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmpi+openmp'
mpif90 -x f95-cpp-input -ffree-line-length-none -ffree-form -fno-second-underscore -fPIC -fno-realloc-lhs -fopenmp -I/home/m/QUIPMPI/QUIP/src/libAtoms -I/home/m/QUIPMPI/QUIP/src/fox/objs.linux_x86_64_gfortran_openmpi+openmp/finclude -O3 -DGETARG_F2003 -DGETENV_F2003 -DGFORTRAN -DFORTRAN_UNDERSCORE -D_MPI -D'GIT_VERSION="https://github.com/libAtoms/QUIP.git,v0.9.10-1-g2643d91d7-dirty"' -D'GAP_VERSION=1663247896' -D'QUIP_ARCH="linux_x86_64_gfortran_openmpi+openmp"' -D'SIZEOF_FORTRAN_T=2' -DHAVE_GAP -DHAVE_TB -DHAVE_PRECON -DHAVE_QR -DSCALAPACK -DHAVE_CP2K -DDESCRIPTORS_NONCOMMERCIAL -c /home/m/QUIPMPI/QUIP/src/GAP/gap_fit.f95 -o gap_fit.o
mpif90 -o gap_fit gap_fit.o libgapfit.a -L. -lquiputils -lquip_core -lgap -latoms -fopenmp -O3 -L/home/m/QUIPMPI/QUIP/src/fox/objs.linux_x86_64_gfortran_openmpi+openmp/lib -lFoX_sax -lFoX_wxml -lFoX_utils -lFoX_common -lFoX_fsys -llapack -lblas
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_get_lwork_pdormqr_i32o64':
ScaLAPACK.f95:(.text+0x522): undefined reference to `indxg2p_'
ScaLAPACK.f95:(.text+0x549): undefined reference to `indxg2p_'
ScaLAPACK.f95:(.text+0x57a): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x5b1): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x64e): undefined reference to `indxg2p_'
ScaLAPACK.f95:(.text+0x678): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x68b): undefined reference to `ilcm_'
ScaLAPACK.f95:(.text+0x6bc): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x6df): undefined reference to `numroc_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_get_lwork_pdgeqrf_i32o64':
ScaLAPACK.f95:(.text+0x80d): undefined reference to `indxg2p_'
ScaLAPACK.f95:(.text+0x82f): undefined reference to `indxg2p_'
ScaLAPACK.f95:(.text+0x866): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x88e): undefined reference to `numroc_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_to_array2d':
ScaLAPACK.f95:(.text+0x9ff): undefined reference to `descinit_'
ScaLAPACK.f95:(.text+0xb7e): undefined reference to `pdgeadd_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_to_array1d':
ScaLAPACK.f95:(.text+0xd3d): undefined reference to `descinit_'
ScaLAPACK.f95:(.text+0xe99): undefined reference to `pdgeadd_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_pdtrtrs_wrapper':
ScaLAPACK.f95:(.text+0x1182): undefined reference to `pdtrtrs_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_pdormqr_wrapper':
ScaLAPACK.f95:(.text+0x1512): undefined reference to `pdormqr_'
ScaLAPACK.f95:(.text+0x1754): undefined reference to `pdormqr_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_pdgeqrf_wrapper':
ScaLAPACK.f95:(.text+0x1964): undefined reference to `pdgeqrf_'
ScaLAPACK.f95:(.text+0x1a7d): undefined reference to `pdgeqrf_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_matrix_product_sub_zzz':
ScaLAPACK.f95:(.text+0x2397): undefined reference to `pzgemm_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_matrix_product_sub_ddd':
ScaLAPACK.f95:(.text+0x29c4): undefined reference to `pdgemm_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_diagonalise_gen_c':
ScaLAPACK.f95:(.text+0x3379): undefined reference to `pzhegvx_'
ScaLAPACK.f95:(.text+0x3735): undefined reference to `pzhegvx_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_diagonalise_gen_r':
ScaLAPACK.f95:(.text+0x4ad6): undefined reference to `pdsygvx_'
ScaLAPACK.f95:(.text+0x4e08): undefined reference to `pdsygvx_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_diagonalise_c':
ScaLAPACK.f95:(.text+0x5ecc): undefined reference to `pzheevx_'
ScaLAPACK.f95:(.text+0x6271): undefined reference to `pzheevx_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_diagonalise_r':
ScaLAPACK.f95:(.text+0x724f): undefined reference to `pdsyevx_'
ScaLAPACK.f95:(.text+0x7576): undefined reference to `pdsyevx_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_inverse_c':
ScaLAPACK.f95:(.text+0x81a1): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x81c4): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x820f): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x8259): undefined reference to `pzgetrf_'
ScaLAPACK.f95:(.text+0x8343): undefined reference to `pzgetri_'
ScaLAPACK.f95:(.text+0x845c): undefined reference to `pzgetri_'
ScaLAPACK.f95:(.text+0x85b4): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x860f): undefined reference to `numroc_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_inverse_r':
ScaLAPACK.f95:(.text+0x8c2f): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x8c52): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x8c9d): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x8ce7): undefined reference to `pdgetrf_'
ScaLAPACK.f95:(.text+0x8dcf): undefined reference to `pdgetri_'
ScaLAPACK.f95:(.text+0x8ee8): undefined reference to `pdgetri_'
ScaLAPACK.f95:(.text+0x9044): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x909f): undefined reference to `numroc_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_init_matrix_desc':
ScaLAPACK.f95:(.text+0x9b53): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x9b76): undefined reference to `numroc_'
ScaLAPACK.f95:(.text+0x9bc1): undefined reference to `descinit_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_matrix_scalapack_info_coords_local_to_global':
ScaLAPACK.f95:(.text+0x9c2c): undefined reference to `indxl2g_'
ScaLAPACK.f95:(.text+0x9c4b): undefined reference to `indxl2g_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_matrix_scalapack_info_coords_global_to_local':
ScaLAPACK.f95:(.text+0xb97a): undefined reference to `infog2l_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_finalise':
ScaLAPACK.f95:(.text+0xc9cd): undefined reference to `blacs_gridexit_'
./libatoms.a(ScaLAPACK.o): In function `__scalapack_module_MOD_scalapack_initialise':
ScaLAPACK.f95:(.text+0xcbf5): undefined reference to `blacs_gridinit_'
ScaLAPACK.f95:(.text+0xcc19): undefined reference to `blacs_gridinfo_'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:96: gap_fit] Error 1
make[1]: Leaving directory '/home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmpi+openmp'
make: *** [Makefile:197: gap_programs] Error 2

bernstei commented 1 year ago

You need to actually link to the ScaLAPACK libraries. You can add the relevant flags to MATH_LINKOPTS or EXTRA_LINKOPTS in build/${QUIP_ARCH}/Makefile.inc.
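One way to do that is sketched below in shell. It rewrites the MATH_LINKOPTS line so that ScaLAPACK comes before LAPACK/BLAS. MATH_LINKOPTS is chosen because that is where -llapack -lblas appear in the failing link line above. When linking static libraries, a library generally has to come before the libraries it depends on.

The function name set_math_linkopts is hypothetical. -lscalapack is a placeholder: the real library names depend on where your ScaLAPACK comes from (MKL, for instance, provides its own).

```shell
#!/bin/sh
# Sketch: replace the MATH_LINKOPTS line of a QUIP Makefile.inc.
# ASSUMPTION: the flag names passed in are placeholders; use the library
# names your ScaLAPACK installation (or MKL) actually provides.
set_math_linkopts() {
  inc="$1"    # e.g. build/$QUIP_ARCH/Makefile.inc
  shift
  flags="$*"  # new value: ScaLAPACK first, then LAPACK/BLAS
  # Rewrite the whole line in place, keeping the old file as $inc.bak.
  sed -i.bak "s|^MATH_LINKOPTS[[:space:]]*=.*|MATH_LINKOPTS=$flags|" "$inc"
}
```

Usage: set_math_linkopts "build/$QUIP_ARCH/Makefile.inc" -lscalapack -llapack -lblas, then run make again. Because the sed expression uses | as its delimiter, the flags must not contain | or &.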

MES-physics commented 1 year ago

Can you please tell me what the relevant flags are? I see -DSCALAPACK above; are there others?
Thanks, I'm just beginning.

bernstei commented 1 year ago

ScaLAPACK is its own library. You need a copy that is compatible with your MPI version. You'll typically end up with something like -lscalapack -lblacs, but the precise library names will depend on where you get ScaLAPACK from. I mostly use MKL, which includes ScaLAPACK in addition to LAPACK and BLAS. Your link line looks like it uses a Linux-provided LAPACK and BLAS (perhaps OpenBLAS, which lots of distributions use), so you need some version of ScaLAPACK compiled and installed.

Sideboard commented 1 year ago

Is this on your own computer or on a cluster (which one)? What operating system do you use?

gabor1 commented 1 year ago

Yes, the single atom energies look fine.

MES-physics commented 1 year ago

This is on a cluster at my school. ScaLAPACK is a module I can load, with a specific MPI version to go with it. So now I'm asking the admins how to link it to my QUIP installation. I will try soon.

MES-physics commented 1 year ago

OK, now the admin told me to do this: "If you use the Intel OneAPI module, it will add all these to your path. After logging in, run:
module load OneAPI
source $SETVARS
This will change your compiler to intel and add tools such as mkl and impi to your environment." Then I put the flags -lscalapack -lblacs into Makefile.inc and tried to run "make config", and it says:

Makefile:34: *** "You need to define the architecture using the QUIP_ARCH variable. Check out the arch/ subdirectory.". Stop.

I assume this is the architecture I want: Makefile.linux_x86_64_gfortran_openmpi+openmp. So how do I define it, and where do I put it?
Thanks for any clues.

bernstei commented 1 year ago

It looks like that module will change gfortran to Intel Fortran and (probably) OpenMPI to Intel MPI. You'll need to check whether there's a QUIP_ARCH with a corresponding QUIP/arch/Makefile.${QUIP_ARCH} for that combination. If there isn't, you can make your own new Makefile.${QUIP_ARCH}. Presumably you'll want something like QUIP_ARCH=linux_x86_64_ifort_icc_intelmpi+openmp. There seems to be one with "avon" in the name that's at least similar to that.
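This sketch answers the "where do I put it" question. QUIP_ARCH is an environment variable, so you export it in the shell (or in the job script) before running make config and make.

The helper below, check_quip_arch, is hypothetical. Run it from the top of the QUIP checkout. It checks that arch/Makefile.$QUIP_ARCH exists and, if it doesn't, lists the arch names that do.

```shell
#!/bin/sh
# Sketch: confirm that arch/Makefile.<arch> exists before "make config".
# Run from the top of the QUIP source tree. check_quip_arch is hypothetical.
check_quip_arch() {
  arch="$1"
  if [ -f "arch/Makefile.$arch" ]; then
    echo "ok: arch/Makefile.$arch"
  else
    # Unknown name: report it and list the arch names that do exist.
    echo "no arch/Makefile.$arch; available QUIP_ARCH values:" >&2
    ls arch | sed -n 's/^Makefile\.//p' >&2
    return 1
  fi
}
```

Usage, taking as an example the arch name that shows up in this thread's later build directory:

export QUIP_ARCH=linux_x86_64_ifort_icc_avon_intelmpi
check_quip_arch "$QUIP_ARCH" && make config && make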

MES-physics commented 1 year ago

Yes! When I used the one with "avon" and set up the environment as described above, make config and the build worked. Thanks very much. My mistake was trying to load a ScaLAPACK module separately, which wasn't compatible with the OneAPI Intel compiler. ScaLAPACK seems to be included with OneAPI.

But right after all of that seemed to work, I tried "make install quippy" and, after it ran for a while, got the error below. How do I get the correct file? Was something missing? It looks like quippy is not compatible with something. Thanks for any advice.

mv _quippy.cpython-39-x86_64-linux-gnu.so quippy/
mv: cannot stat '_quippy.cpython-39-x86_64-linux-gnu.so': No such file or directory
make[1]: *** [Makefile:119: quippy/_quippy.cpython-39-x86_64-linux-gnu.so] Error 1
make[1]: Leaving directory '/home/m/QUIPMPI/QUIP/build/linux_x86_64_ifort_icc_avon_intelmpi'
make: *** [Makefile:230: quippy] Error 2

Before that, a lot of these showed up:

Generating possibly empty wrappers"
Maybe empty "_quippy-f2pywrappers.f"
Constructing wrapper function "f90wrap_dictionary_add_array_i_a"...
f90wrap_dictionary_add_array_i_a(this,key,value,len_bn,[overwrite])

And these types of warnings:

WARNING:f90wrap.transform:removing optional argument mpi_obj due to unsupported derived type type(mpi_context)
WARNING:f90wrap.transform:removing optional argument mpi_obj due to unsupported derived type type(mpi_context)
WARNING:f90wrap.transform:removing callback routine potential_simple_set_callback
WARNING:f90wrap.transform:removing tb_type.tbsys as type type(tbsystem) unsupported
WARNING:f90wrap.transform:removing tb_type.evals as type type(tbvector) unsupported
WARNING:f90wrap.transform:removing tb_type.e_fillings as type type(tbvector) unsupported
WARNING:f90wrap.transform:removing tb_type.f_fillings as type type(tbvector) unsupported
WARNING:f90wrap.transform:removing tb_type.eval_f_fillings as type type(tbvector) unsupported
WARNING:f90wrap.transform:removing tb_type.evecs as type type(tbmatrix) unsupported
WARNING:f90wrap.transform:removing tb_type.dm as type type(tbmatrix) unsupported
WARNING:f90wrap.transform:removing tb_type.hdm as type type(tbmatrix) unsupported
WARNING:f90wrap.transform:removing tb_type.scaled_evecs as type type(tbmatrix) unsupported
WARNING:f90wrap.transform:removing tb_type.mpi as type type(mpi_context) unsupported
WARNING:f90wrap.transform:removing tb_type.gf as type type(greensfunctions) unsupported
WARNING:f90wrap.transform:removing optional argument kpoints_obj due to unsupported derived type type(kpoints)
WARNING:f90wrap.transform:removing optional argument mpi_obj due to unsupported derived type type(mpi_context)

MES-physics commented 1 year ago

More on the quippy build error: here are the lines leading up to the final error. Thanks for any advice.

running build
running config_cc
INFO: unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
INFO: unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
INFO: build_src
INFO: building extension "_quippy" sources
INFO: f2py options: []
INFO: adding './src.linux-x86_64-3.8/./src.linux-x86_64-3.8/fortranobject.c' to sources.
INFO: adding './src.linux-x86_64-3.8/./src.linux-x86_64-3.8' to include_dirs.
INFO: adding './src.linux-x86_64-3.8/_quippy-f2pywrappers.f' to sources.
INFO: build_src: building npy-pkg config files
running build_ext
INFO: customize UnixCCompiler
INFO: customize UnixCCompiler using build_ext
INFO: customize IntelEM64TFCompiler
INFO: Found executable /opt/intel/oneapi/mpi/2021.5.0/bin/mpiifort
INFO: Found executable /opt/intel/oneapi/compiler/2022.0.1/linux/bin/intel64/ifort
INFO: customize IntelEM64TFCompiler using build_ext
mv _quippy.cpython-39-x86_64-linux-gnu.so quippy/
mv: cannot stat '_quippy.cpython-39-x86_64-linux-gnu.so': No such file or directory
make[1]: *** [Makefile:119: quippy/_quippy.cpython-39-x86_64-linux-gnu.so] Error 1
make[1]: Leaving directory '/home/m/QUIPMPI/QUIP/build/linux_x86_64_ifort_icc_avon_intelmpi'
make: *** [Makefile:230: quippy] Error 2

jameskermode commented 1 year ago

You should continue to build quippy without MPI using your previous QUIP_ARCH setting with OpenMP parallelisation only. MPI quippy builds are neither needed to run gap_fit with MPI nor supported (although it should be possible if there’s a good reason).
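Below is a dry-run sketch of keeping both builds in one checkout. It uses the two arch names that appear in this thread's logs; adjust them to yours. The logs show each QUIP_ARCH getting its own build/<arch> directory, which is what lets the two builds coexist.

The function name plan_builds is hypothetical, and the commands are only printed. It is also an assumption that you load the matching compiler, MPI and BLAS modules in each shell before running the real commands.

```shell
#!/bin/sh
# Dry-run sketch: one checkout, two QUIP_ARCH builds. Each QUIP_ARCH builds
# into its own build/$QUIP_ARCH directory, so they do not overwrite each other.
# ASSUMPTION: load the matching compiler/MPI/BLAS modules in each shell first.

# Print a command prefixed with "+ " (replace the body with "$@" to execute).
show() { echo "+ $*"; }

plan_builds() {
  omp_arch="$1"   # OpenMP-only arch: build quippy against this one
  mpi_arch="$2"   # MPI arch: Fortran programs such as gap_fit only
  show env QUIP_ARCH="$omp_arch" make config
  show env QUIP_ARCH="$omp_arch" make quippy
  show env QUIP_ARCH="$mpi_arch" make config
  show env QUIP_ARCH="$mpi_arch" make
}
```

For example, plan_builds linux_x86_64_gfortran_openmp linux_x86_64_ifort_icc_avon_intelmpi prints the four steps. The quippy target name comes from the "Makefile:230: quippy" line in the log above.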

MES-physics commented 1 year ago

Am I supposed to run "make config" again? If I do, the build can't find -llapack and -lblas. By the way, I created a separate Python environment for the MPI version and tried to put all of this in a new directory, so how should these be set up? Thanks for helping!

Making Programs


rm -f /home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmp/Makefile
cp /home/m/QUIPMPI/QUIP/src/Programs/Makefile /home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmp/Makefile
make -C /home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmp QUIP_ROOT=/home/m/QUIPMPI/QUIP VPATH=/home/m/QUIPMPI/QUIP/src/Programs -I/home/m/QUIPMPI/QUIP -I/home/m/QUIPMPI/QUIP/arch
make[1]: Entering directory '/home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmp'
gfortran -o quip quip.o vacancy_map_mod.o -L. -lquiputils -lquip_core -lgap -latoms -fopenmp -O3 -L/home/m/QUIPMPI/QUIP/src/fox/objs.linux_x86_64_gfortran_openmp/lib -lFoX_sax -lFoX_wxml -lFoX_utils -lFoX_common -lFoX_fsys -llapack -lblas
/usr/bin/ld: cannot find -llapack
/usr/bin/ld: cannot find -lblas
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:79: quip] Error 1
make[1]: Leaving directory '/home/m/QUIPMPI/QUIP/build/linux_x86_64_gfortran_openmp'
make: *** [Makefile:155: Programs] Error 2

MES-physics commented 1 year ago

OK, I tried that. Am I supposed to run "make config" again? If I try "make install quippy" without it, it tells me to. I have the MPI version in a new Python environment, in a new directory, so how does quippy fit in? When I ran "make install quippy" after make config, the linker could not find -llapack and -lblas. That happened even though I loaded the gnu10 and openblas modules on the cluster, as I did before for the non-MPI version. How should quippy be made available for the MPI version of gap_fit?
I don't know if I'm describing all this correctly. Thanks for helping!

jameskermode commented 1 year ago

You don’t need to build or install quippy at all for the MPI version of gap_fit. It’s Fortran only.

MES-physics commented 1 year ago

Oh, thanks, I'll try to move on now!