Closed omarkahmed closed 10 months ago
@omarkahmed Could you please git rebase your fork before submitting the pull request so that there are no merge commits such as https://github.com/nwchemgit/nwchem/pull/912/commits/9150517f4a90e9695555a7b1d97250b21fdc523e?
@edoapra , thanks, just fixed.
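For readers following along, the rebase requested above can be sketched as follows. This is a minimal, self-contained illustration rather than the exact commands used in this pull request: the `upstream` and `fork` repositories, the `main` and `feature` branch names, and the file names are all hypothetical stand-ins created in a temporary directory.

```shell
#!/bin/sh
# Sketch: replace a merge commit on a fork branch with a linear, rebased history.
# All repository, branch, and file names below are hypothetical.
set -eu

# Fixed identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "upstream: base"
cd ..

# Stand-in for the contributor's fork, with a feature branch.
git clone -q upstream fork
cd fork
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "fork: feature work"

# Upstream moves on while the feature branch is in progress.
cd ../upstream
echo more > more.txt
git add more.txt
git commit -q -m "upstream: more work"

# Merging upstream into the feature branch creates a merge commit.
cd ../fork
git fetch -q origin
git merge -q --no-edit origin/main
merges_before=$(git rev-list --merges --count origin/main..feature)

# Rebasing replays the feature commits on top of upstream;
# by default the merge commit is dropped.
git rebase -q origin/main
merges_after=$(git rev-list --merges --count origin/main..feature)

echo "merge commits before rebase: $merges_before"
echo "merge commits after rebase:  $merges_after"
```

With a real fork, the same idea applies with the actual upstream remote in place of the local stand-in, followed by a forced push of the rebased branch, since rebasing rewrites the branch history.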
@omarkahmed You're welcome. Have you tested the fork used in this pull request with GitHub Actions?
@edoapra , I have not. Just started a run: https://github.com/omarkahmed/nwchem/actions/runs/6934884294
I apologize for asking a dumb question, but I have forgotten a lot of this stuff. Can I test this on my Intel Tiger Lake iGPU (with FP64 emulation) or do I need Gen12 / discrete GPU for this? Thanks
@edoapra , I will add some documentation (and see if there is any simplification). @jeffhammond , that's a good question. This is tested on the Intel server GPUs: https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html . I haven't been testing this code on client/integrated GPUs. I expect there to be some issues, but will update shortly.
The outcome of the GitHub Actions job using the Intel compiler with OpenMP seems to indicate that your makefile changes have broken the main SCF functionality.
https://github.com/omarkahmed/nwchem/actions/runs/6934884294/job/18863916565
Could you remove all your config/makefile.h changes and try the tests again?
Please try to keep changes to a minimum and only when needed
@edoapra The OpenACC file just moved (shown below).
ccsd_trpdrv_offload.F was for Knights Corner and doesn't need to be compiled anymore. ccsd_trpdrv_bgp2.F has not been used in a decade or more.
ifdef USE_OPENACC_TRPDRV
  OBJ_OPTIMIZE += ccsd_trpdrv_openacc.o
  USES_BLAS += ccsd_trpdrv_openacc.F
  FOPTIONS += -DUSE_OPENACC_TRPDRV
  ifeq ($(_FC),pgf90)
    FOPTIONS += -Mextend -acc -cuda -cudalib=cublas
  endif
  ifeq ($(_FC),gfortran)
    FOPTIONS += -ffree-form -fopenacc -lcublas
  endif
endif
@edoapra , currently debugging the failure in the GA unit test. Hope to update with a solution soon. @jeffhammond , in addition the other two sources are preserved under relevant ifdefs:
ifeq ($(TARGET),BGP)
  OBJ_OPTIMIZE += ccsd_trpdrv_bgp2.o ccsd_tengy_bgp2.o ccsd_tengy_bgp.o
  USES_BLAS += ccsd_trpdrv_bgp2.F
  LIB_DEFINES += -DBGP
endif
ifdef USE_MIC_TRPDRV
  OBJ_OPTIMIZE += ccsd_trpdrv_offload.o
  USES_BLAS += ccsd_trpdrv_offload.F
  LIB_DEFINES += -DUSE_MIC_TRPDRV
endif
@edoapra , I'm trying to reproduce the failure locally. The environment should be fairly similar, with ifx from oneapi 2023.2.1 + gcc 9.4.0, but the failure doesn't seem to occur for me. One delta is that I'm using Ubuntu 22.04 instead of 20.04. As a consequence, the OS distribution version of gcc is newer (11.4.0), and I have to use a gnu 9.4.0 environment module on top (so maybe there are 11.4.0-related artifacts). For reference, here is my run log.
BLAS_SIZE is 8
BLASOPT is
BUILD_OPENBLAS is
DISTR is SION_ID=22.04
NWCHEM_TOP is /nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem
ifx version 2023.2.0
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/hpc_software/compilers/gnu/9.4.0/libexec/gcc/x86_64-pc-linux-gnu/9.4.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-9.4.0/configure --prefix=/opt/hpc_software/compilers/gnu/9.4.0 --enable-languages=c,c++,fortran,go --disable-multilib
Thread model: posix
gcc version 9.4.0 (GCC)
from nwchem.bashrc
BLAS_SIZE = 8
SCALAPACK_SIZE = 8
NWCHEM_EXECUTABLE is /nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx
VT_MPI=impi4
I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.10.0
COMPILER_PATH=/opt/hpc_software/compilers/gnu/9.4.0
USE_OPENMP=2
SETVARS_COMPLETED=1
CONDA_PROMPT_MODIFIER=(intelpython-python3.9)
CMPLR_ROOT=/opt/intel/oneapi/compiler/2023.2.1
ARMCI_NETWORK=MPI-PR
I_MPI_F90=ifx
USE_MPI=y
OMP_NUM_THREADS=2
OMP_STACKSIZE=32M
MPI_IMPL=intel
=== ls binaries cache ===
total 58776
-rwxr-xr-x 1 omarahme intelall 59104808 Nov 21 14:59 nwchem_x86_64_tinyqmpw-python_intel_ifx
-rw-r--r-- 1 omarahme intelall 2923 Nov 21 15:44 h2o_opt_dat.cfock
-rw-r--r-- 1 omarahme intelall 3667 Nov 21 15:44 h2o_opt_dat.movecs
-rw-r--r-- 1 omarahme intelall 96 Nov 21 15:44 h2o_opt_dat.c
-rw-r--r-- 1 omarahme intelall 48 Nov 21 15:44 h2o_opt_dat.zmat
-rw-r--r-- 1 omarahme intelall 240 Nov 21 15:44 h2o_opt_dat.b
-rw-r--r-- 1 omarahme intelall 240 Nov 21 15:44 'h2o_opt_dat.b^-1'
-rw-r--r-- 1 omarahme intelall 96 Nov 21 15:44 h2o_opt_dat.p
-rw-r--r-- 1 omarahme intelall 96 Nov 21 15:44 h2o_opt_dat.drv.hess
-rw-r--r-- 1 omarahme intelall 1041002 Nov 21 15:44 h2o_opt_dat.db
=========================
no using sleep loop
Running tests/dft_he2+/dft_he2+
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Running tests/bas_details/bas_details
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Running tests/adft_he2+/adft_he2+
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Running tests/prop_mep_gcube/prop_mep_gcube
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Running tests/cosmo_h2o_dft/cosmo_h2o_dft
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Running tests/pyqa3/pyqa3
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Running tests/dft_siosi3/dft_siosi3
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
Running tests/h2o_opt/h2o_opt
cleaning scratch
copying input and verified output files
running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx) with 2 processors
verifying output ... OK
OK
Will give Ubuntu 20.04 a shot as well to see if I can reproduce. However, I'm wondering if this unit test makes sense with this older version of gcc + ifx (instead of icx + ifx from the same oneapi release)?
@omarkahmed I have created a new branch under my NWChem fork that keeps the changes to src/config/makefile.h to a minimum but that should, in principle, keep all the new functionality you have introduced for offloading on Xe Max.
This branch passes all the GitHub Actions tests.
https://github.com/edoapra/nwchem/tree/openmp-intel-gpu_cleanmakefile
Could you please try to clone it and test it to see if I have preserved all your functionality?
The root cause of all these compilation issues was that you applied your changes to the makefile.h portion identified with _FC=ifxold. This part is by now obsolete. All the ifx parts are now identified by USE_IFX instead, to keep the changes to a minimum. If your tests are successful, I will remove the ifxold makefile.h part to avoid future issues similar to the present one.
@edoapra , thanks! I added a patch to your branch at https://github.com/omarkahmed/nwchem/tree/openmp-intel-gpu_cleanmakefile+fix to ensure that my test case builds and runs, and am running the github action here: https://github.com/omarkahmed/nwchem/actions/runs/7005906029/job/19056591830 . Will update the PR with this branch if all cases pass.
Sounds good. If things work, is it OK for you if I git push force the content of my new branch (including your latest changes) into omarkahmed:omarkahmed/openmp-intel-gpu so that we continue this same pull request?
@edoapra , absolutely.
Hi @edoapra , looks like the GA unit tests are looking good, and I confirm that my internal tests are also passing.
Done
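The forced push agreed on above, where one branch's content replaces the branch backing an open pull request, can be sketched as follows. This is only an illustration under stated assumptions: `pr_fork.git`, `pr-branch`, `clean-branch`, and the file contents are hypothetical stand-ins built in a temporary directory, not the repositories or branches from this thread.

```shell
#!/bin/sh
# Sketch: overwrite the branch behind a pull request with a different local branch.
# All repository, branch, and file names below are hypothetical.
set -eu

# Fixed identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the contributor's fork that backs the open pull request.
git init -q --bare pr_fork.git

# The contributor's original branch, pushed to the fork.
git init -q contributor
cd contributor
git symbolic-ref HEAD refs/heads/pr-branch
echo old > makefile.h
git add makefile.h
git commit -q -m "original PR changes"
git push -q ../pr_fork.git pr-branch
cd ..

# The maintainer's cleaned-up branch, with a different history.
git init -q maintainer
cd maintainer
git symbolic-ref HEAD refs/heads/clean-branch
echo clean > makefile.h
git add makefile.h
git commit -q -m "cleaned-up changes"

# The histories have diverged, so a plain push would be rejected.
# A forced push with a local:remote refspec overwrites the remote branch.
git push -q --force ../pr_fork.git clean-branch:pr-branch
local_head=$(git rev-parse clean-branch)
cd ..

remote_head=$(git --git-dir=pr_fork.git rev-parse pr-branch)
echo "local clean-branch: $local_head"
echo "remote pr-branch:   $remote_head"
```

Because the pull request tracks the branch name rather than a specific commit, it stays open and simply shows the new history. `git push --force-with-lease` is a more conservative variant that refuses to overwrite the remote branch if it has moved since the last fetch.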
These commits add Intel Xe Max Support for the CCSD module. Acknowledgements include:
Nawal Copty, Nitin Gawande, Rakesh Krishnaiyer, Abhinav Gaba, Ravi Narayanaswamy, Geoff Lowney, Jeff Hammond