Add Intel Xe Max Support for the CCSD module

omarkahmed commented 10 months ago

These commits add Intel Xe Max Support for the CCSD module. Acknowledgements include:

Nawal Copty, Nitin Gawande, Rakesh Krishnaiyer, Abhinav Gaba, Ravi Narayanaswamy, Geoff Lowney, Jeff Hammond

edoapra commented 10 months ago

@omarkahmed Could you please git rebase your fork before submitting the pull request so that there are no merge commits such as https://github.com/nwchemgit/nwchem/pull/912/commits/9150517f4a90e9695555a7b1d97250b21fdc523e?
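For readers following along, here is a minimal sketch of what the rebase request amounts to. It runs entirely in a throwaway directory; the repository and branch names (`upstream`, `fork`, `main`, `feature`) are hypothetical stand-ins, not the actual NWChem remotes or branches.

```shell
#!/bin/sh
# Sketch: replace a merge commit with a linear history via `git rebase`.
# All repository and branch names below are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Identity used only inside this sandbox.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. Stand-in for the upstream repository, with one commit on "main".
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "base"
cd ..

# 2. Stand-in for the contributor's fork, with one feature commit.
git clone -q upstream fork
cd fork
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

# 3. Upstream moves on; the fork merges it, which creates a merge commit.
cd ../upstream
echo more > more.txt
git add more.txt
git commit -q -m "upstream work"
cd ../fork
git fetch -q origin
git merge -q --no-edit origin/main
merges_before=$(git rev-list --merges --count origin/main..feature)

# 4. Rebasing replays only the feature commit on top of upstream,
#    dropping the merge commit.
git rebase -q origin/main
merges_after=$(git rev-list --merges --count origin/main..feature)

echo "merge commits before rebase: $merges_before"
echo "merge commits after rebase:  $merges_after"
```

If the branch has already been pushed, the rebased history then has to be force-pushed, since it no longer descends from the old remote tip.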

omarkahmed commented 10 months ago

@edoapra , thanks, just fixed.

edoapra commented 10 months ago

@omarkahmed You're welcome. Have you tested the fork used in this pull request with GitHub Actions?

omarkahmed commented 10 months ago

@edoapra , I have not. Just started a run: https://github.com/omarkahmed/nwchem/actions/runs/6934884294

jeffhammond commented 10 months ago

I apologize for asking a dumb question, but I have forgotten a lot of this stuff. Can I test this on my Intel Tiger Lake iGPU (with FP64 emulation) or do I need Gen12 / discrete GPU for this? Thanks

omarkahmed commented 10 months ago

@edoapra , I will add some documentation (and see if there is any simplification). @jeffhammond , that's a good question. This is tested on the Intel server GPUs: https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html . I haven't been testing this code on client/integrated GPUs. I expect there to be some issues, but will update shortly.

edoapra commented 10 months ago

The outcome of the GitHub Actions job using the Intel compiler with OpenMP seems to indicate that your makefile changes have broken the main SCF functionality.
https://github.com/omarkahmed/nwchem/actions/runs/6934884294/job/18863916565

Could you remove all your config/makefile.h changes and try the tests again?

Please try to keep changes to a minimum, and make them only when needed.

jeffhammond commented 10 months ago

@edoapra The OpenACC file just moved (shown below).

ccsd_trpdrv_offload.F was for Knights Corner and doesn't need to be compiled anymore. ccsd_trpdrv_bgp2.F has not been used in a decade or more.

ifdef USE_OPENACC_TRPDRV
  OBJ_OPTIMIZE += ccsd_trpdrv_openacc.o
  USES_BLAS    += ccsd_trpdrv_openacc.F
  FOPTIONS += -DUSE_OPENACC_TRPDRV
  ifeq ($(_FC),pgf90)
      FOPTIONS += -Mextend -acc -cuda -cudalib=cublas
  endif
  ifeq ($(_FC),gfortran)
      FOPTIONS += -ffree-form -fopenacc -lcublas
  endif
endif

omarkahmed commented 10 months ago

@edoapra , currently debugging the failure in the GA unit test. Hope to update with a solution soon. @jeffhammond , in addition, the other two sources are preserved under the relevant conditionals:

ifeq ($(TARGET),BGP)
  OBJ_OPTIMIZE += ccsd_trpdrv_bgp2.o ccsd_tengy_bgp2.o ccsd_tengy_bgp.o
  USES_BLAS += ccsd_trpdrv_bgp2.F
  LIB_DEFINES += -DBGP
endif
ifdef USE_MIC_TRPDRV
  OBJ_OPTIMIZE += ccsd_trpdrv_offload.o
  USES_BLAS    += ccsd_trpdrv_offload.F
  LIB_DEFINES += -DUSE_MIC_TRPDRV
endif
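
As a rough illustration of how the `ifdef` and `ifeq` guards in the snippets above decide which objects are built, the sketch below writes a toy makefile to a temporary directory and queries it three ways. It assumes GNU make; the makefile borrows variable names from this thread, but `always_built.o` and the `show` target are hypothetical, and this is not the actual NWChem makefile.

```shell
#!/bin/sh
# Sketch: GNU make `ifdef` / `ifeq` guards selecting extra objects.
# The makefile below is a toy; always_built.o and `show` are hypothetical.
set -e
work=$(mktemp -d)

cat > "$work/Makefile" <<'EOF'
OBJ_OPTIMIZE := always_built.o

# Added only when the variable is defined with a non-empty value.
ifdef USE_OPENACC_TRPDRV
  OBJ_OPTIMIZE += ccsd_trpdrv_openacc.o
endif

# Added only when TARGET is exactly BGP.
ifeq ($(TARGET),BGP)
  OBJ_OPTIMIZE += ccsd_trpdrv_bgp2.o
endif

.PHONY: show
show: ; @echo $(OBJ_OPTIMIZE)
EOF

# Query the object list with no switches, then with each guard enabled.
default_objs=$(cd "$work" && make -s show)
openacc_objs=$(cd "$work" && make -s show USE_OPENACC_TRPDRV=1)
bgp_objs=$(cd "$work" && make -s show TARGET=BGP)

echo "default: $default_objs"
echo "openacc: $openacc_objs"
echo "bgp:     $bgp_objs"
```

Because each source sits behind its own guard, a default build never tries to compile it, which is the sense in which the older files are preserved without being built.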

omarkahmed commented 10 months ago

@edoapra , I'm trying to reproduce the failure locally. Environment should be fairly similar, with ifx from oneapi 2023.2.1 + gcc 9.4.0, but it doesn't seem to occur for me. One delta is that I'm using Ubuntu 22.04 instead of 20.04. As a consequence the OS distribution version of gcc is newer (11.4.0), and I have to use a gnu 9.4.0 environment module on top (so maybe there are 11.4.0-related artifacts). For reference, here is my run log.

BLAS_SIZE is  8
BLASOPT is
BUILD_OPENBLAS is
DISTR is SION_ID=22.04
NWCHEM_TOP is /nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem
ifx version 2023.2.0
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/hpc_software/compilers/gnu/9.4.0/libexec/gcc/x86_64-pc-linux-gnu/9.4.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-9.4.0/configure --prefix=/opt/hpc_software/compilers/gnu/9.4.0 --enable-languages=c,c++,fortran,go --disable-multilib
Thread model: posix
gcc version 9.4.0 (GCC)
from nwchem.bashrc
BLAS_SIZE =  8
SCALAPACK_SIZE =  8
NWCHEM_EXECUTABLE is /nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx
VT_MPI=impi4
I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.10.0
COMPILER_PATH=/opt/hpc_software/compilers/gnu/9.4.0
USE_OPENMP=2
SETVARS_COMPLETED=1
CONDA_PROMPT_MODIFIER=(intelpython-python3.9)
CMPLR_ROOT=/opt/intel/oneapi/compiler/2023.2.1
ARMCI_NETWORK=MPI-PR
I_MPI_F90=ifx
USE_MPI=y
OMP_NUM_THREADS=2
OMP_STACKSIZE=32M
MPI_IMPL=intel
=== ls binaries cache ===
total 58776
-rwxr-xr-x 1 omarahme intelall 59104808 Nov 21 14:59  nwchem_x86_64_tinyqmpw-python_intel_ifx
-rw-r--r-- 1 omarahme intelall     2923 Nov 21 15:44  h2o_opt_dat.cfock
-rw-r--r-- 1 omarahme intelall     3667 Nov 21 15:44  h2o_opt_dat.movecs
-rw-r--r-- 1 omarahme intelall       96 Nov 21 15:44  h2o_opt_dat.c
-rw-r--r-- 1 omarahme intelall       48 Nov 21 15:44  h2o_opt_dat.zmat
-rw-r--r-- 1 omarahme intelall      240 Nov 21 15:44  h2o_opt_dat.b
-rw-r--r-- 1 omarahme intelall      240 Nov 21 15:44 'h2o_opt_dat.b^-1'
-rw-r--r-- 1 omarahme intelall       96 Nov 21 15:44  h2o_opt_dat.p
-rw-r--r-- 1 omarahme intelall       96 Nov 21 15:44  h2o_opt_dat.drv.hess
-rw-r--r-- 1 omarahme intelall  1041002 Nov 21 15:44  h2o_opt_dat.db
=========================
no using sleep loop

 Running tests/dft_he2+/dft_he2+

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

 Running tests/bas_details/bas_details

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

 Running tests/adft_he2+/adft_he2+

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

 Running tests/prop_mep_gcube/prop_mep_gcube

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

 Running tests/cosmo_h2o_dft/cosmo_h2o_dft

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

 Running tests/pyqa3/pyqa3

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

 Running tests/dft_siosi3/dft_siosi3

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

 Running tests/h2o_opt/h2o_opt

     cleaning scratch
     copying input and verified output files
     running nwchem (/nfs/site/home/omarahme/git-repos/nwchem.rebase/nwchem/.cachedir/binaries/LINUX64/nwchem_x86_64_tinyqmpw-python_intel_ifx)  with 2 processors

     verifying output ... OK

OK

Will give Ubuntu 20.04 a shot as well to see if I can reproduce. However, I'm wondering if this unit test makes sense with this older version of gcc + ifx (instead of icx + ifx from the same oneapi release)?
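
When comparing a local environment against the CI one, it can help to record which compilers are actually picked up from PATH. The following is a small sketch for that; the tool list is an assumption based on the compilers named in this thread, and `report_tool` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the first version line of each compiler found on PATH.
# report_tool is a hypothetical helper; the tool list is an assumption.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        # Keep only the first line of the version banner.
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$1"
    fi
}

for tool in gcc gfortran icx ifx; do
    report_tool "$tool"
done
```

Running this in both environments makes it easier to spot a case where an environment module and the distribution compiler are mixed.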

edoapra commented 10 months ago

@omarkahmed I have created a new branch under my NWChem fork that keeps the changes to src/config/makefile.h to a minimum but that should, in principle, keep all the new functionality you have introduced for offloading on Xe Max. This branch does pass all the GitHub Actions tests.
https://github.com/edoapra/nwchem/tree/openmp-intel-gpu_cleanmakefile Could you please try to clone it and test it to see if I have preserved all your functionality?

The root cause of all these compilation issues was that you applied your changes to the makefile.h portion identified by _FC=ifxold. This part is by now obsolete. All the ifx parts are now identified by USE_IFX instead, which also keeps the changes to a minimum. If your tests are successful, I will remove the ifxold makefile.h part to avoid future issues similar to the present one.

omarkahmed commented 10 months ago

@edoapra , thanks! I added a patch to your branch at https://github.com/omarkahmed/nwchem/tree/openmp-intel-gpu_cleanmakefile+fix to ensure that my test case builds and runs, and am running the github action here: https://github.com/omarkahmed/nwchem/actions/runs/7005906029/job/19056591830 . Will update the PR with this branch if all cases pass.

edoapra commented 10 months ago

Sounds good. If things work, is it OK for you if I force-push (git push --force) the content of my new branch (including your latest changes) into omarkahmed:omarkahmed/openmp-intel-gpu so that we continue this same pull request?
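
A minimal sketch of the kind of forced push described here, run against a throwaway bare repository. The remote name `fork` and the branch names `pr-branch` and `clean-branch` are hypothetical stand-ins for the real fork and branches.

```shell
#!/bin/sh
# Sketch: overwrite a remote PR branch with a differently named local branch.
# The remote "fork" and branches "pr-branch"/"clean-branch" are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository stands in for the contributor's fork on the server.
git init -q --bare fork.git

# Local repository with a shared base commit.
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "base"
git remote add fork "$work/fork.git"

# The original PR branch, pushed to the fork.
git checkout -q -b pr-branch
echo old > change.txt
git add change.txt
git commit -q -m "original PR commit"
git push -q fork pr-branch

# A cleaned-up branch that diverges from the PR branch.
git checkout -q main
git checkout -q -b clean-branch
echo clean > change.txt
git add change.txt
git commit -q -m "cleaned-up commit"

# A plain push would be rejected as non-fast-forward; --force replaces
# the remote ref. The refspec maps the local branch onto the PR branch.
git push -q --force fork clean-branch:pr-branch

remote_tip=$(git ls-remote fork refs/heads/pr-branch | cut -f1)
local_tip=$(git rev-parse clean-branch)
echo "remote pr-branch tip: $remote_tip"
echo "local clean-branch:   $local_tip"
```

Since the pull request tracks the remote branch by name, replacing its tip this way keeps the same pull request open with the new history.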

omarkahmed commented 10 months ago

@edoapra , absolutely.

omarkahmed commented 10 months ago

Hi @edoapra , looks like the GA unit tests are looking good, and I confirm that my internal tests are also passing.

edoapra commented 10 months ago

Done

nwchemgit / nwchem

Add Intel Xe Max Support for the CCSD module #912