stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
104 stars 27 forks source link

Add NEMOv5 on GPU to integration tests #2629

Closed arporter closed 3 weeks ago

arporter commented 3 months ago

Currently, we only do passthrough for NEMOv5 in the integration tests. We need to add a GPU build and start monitoring performance.

Adi (Bristol) reports that he had to exclude the following files:

# compilation errors for psyclone
[[ "${FILENAME}" == 'sbcblk.f90'            ]] && ACTION='EXCLUDE' 
[[ "${FILENAME}" == 'stpctl.f90'            ]] && ACTION='EXCLUDE' 
# build errors for generated psyclone
[[ "${FILENAME}" == 'lib_mpp.f90'           ]] && ACTION='EXCLUDE'
[[ "${FILENAME}" == 'prtctl.f90'            ]] && ACTION='EXCLUDE'
[[ "${FILENAME}" == 'diadct.f90'            ]] && ACTION='EXCLUDE'
# runtime error (missing kmp_csupport)
[[ "${FILENAME}" == 'mppini.f90'            ]] && ACTION='EXCLUDE'
# loop mangling
[[ "${FILENAME}" == 'lbcnfd.f90'            ]] && ACTION='EXCLUDE'
[[ "${FILENAME}" == 'fldread.f90'            ]] && ACTION='EXCLUDE'

Possibly, stpctl.f90 is #2601 but I will check.

arporter commented 3 months ago

Adi has confirmed that it's a PSyclone failure, not a compile-time failure so this is not #2601.

arporter commented 3 months ago
Generation Error: generator: specified PSyclone transformation module 'psct-omp_gpu_trans'
raised the following exception during execution...
{
      File "/home/dn-sad1/rds/hpc-work/ngarch/nemo/cfgs/BENCH_ST/BLD_SCT_PSYCLONE/psct-omp_gpu_trans.py", line 123, in trans
    omp_loop_trans.apply(loop, options={"force": True})
      File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/omp_loop_trans.py", line 265, in apply
    super().apply(node, options)
      File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/parallel_loop_trans.py", line 195, in apply
    self.validate(node, options=options)
      File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/parallel_loop_trans.py", line 108, in validate
    super().validate(node, options=options)
      File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/loop_trans.py", line 113, in validate
    raise TransformationError(
    psyclone.psyir.transformations.transformation_error.TransformationError: Transformation Error: Nodes of type 'CodeBlock' cannot be enclosed

I see that we're "using the force" in the apply call so perhaps the option isn't being carried through?

sergisiso commented 3 months ago

Can we see the script, the error looks fine, it just says it can not parallelise something with a CodeBlock. The script should try/except it and ignore it.

arporter commented 3 months ago

I think (from yesterday) that Adi is just using the standard OMP offload script from the examples directory.

sergisiso commented 3 months ago

But that one has a try/except for Transformation errors, it should be fine with this error on this apply. Also it only uses the "force" option for very specific subroutine names, but in the backtrace it's hardcoded "omp_loop_trans.apply(loop, options={"force": True})" which is not what we do

addy419 commented 3 months ago

Hi @sergisiso @arporter here's the pscy-omp_gpu_trans.py

#!/usr/bin/env python
# -----------------------------------------------------------------------------
# BSD 3-Clause License
#
# Copyright (c) 2021-2024, Science and Technology Facilities Council.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
#   list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
#   contributors may be used to endorse or promote products derived from
#   this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
# -----------------------------------------------------------------------------
# Authors: S. Siso, STFC Daresbury Lab

''' PSyclone transformation script showing the introduction of OpenMP for GPU
directives into Nemo code. '''

from psyclone.psyGen import TransInfo
from psyclone.psyir.nodes import (
    Loop, Directive, Assignment, OMPAtomicDirective)
from psyclone.psyir.transformations import OMPTargetTrans
from psyclone.transformations import OMPDeclareTargetTrans, TransformationError

from utils import insert_explicit_loop_parallelism, normalise_loops, \
    enhance_tree_information, add_profiling

PROFILING_ENABLED = False

def trans(psy):
    ''' Add OpenMP Target and Loop directives to all loops, including the
    implicit ones, to parallelise the code and execute it in an acceleration
    device.

    :param psy: the PSy object which this script will transform.
    :type psy: :py:class:`psyclone.psyGen.PSy`
    :returns: the transformed PSy object.
    :rtype: :py:class:`psyclone.psyGen.PSy`

    '''
    omp_target_trans = OMPTargetTrans()
    omp_loop_trans = TransInfo().get_trans_name('OMPLoopTrans')
    omp_loop_trans.omp_directive = "loop"

    print(f"Invokes found in {psy.name}:")
    for invoke in psy.invokes.invoke_list:
        print(invoke.name)

        if PROFILING_ENABLED:
            add_profiling(invoke.schedule.children)

        # TODO #2317: Has structure accesses that can not be offloaded and has
        # a problematic range to loop expansion of (1:1)
        if psy.name.startswith("psy_obs_"):
            print("Skipping", invoke.name)
            continue

        # This are functions with scalar bodies, we don't want to parallelise
        # them, but we could:
        # - Inine them
        # - Annotate them with 'omp declare target' and allow to call from gpus
        if invoke.name in ("q_sat", "sbc_dcy", "gamma_moist", "cd_neutral_10m",
                           "psi_h", "psi_m"):

            print("Skipping", invoke.name)
            continue

        enhance_tree_information(invoke.schedule)

        normalise_loops(
                invoke.schedule,
                hoist_local_arrays=True,
                convert_array_notation=True,
                loopify_array_intrinsics=True,
                convert_range_loops=True,
                hoist_expressions=True
        )

        # For performance in lib_fortran, mark serial routines as GPU-enabled
        if psy.name == "psy_lib_fortran_psy":
            if not invoke.schedule.walk(Loop):
                try:
                    # We need the 'force' option.
                    # SIGN_ARRAY_1D has a CodeBlock because of a WHERE without
                    # array notation. (TODO #717)
                    OMPDeclareTargetTrans().apply(invoke.schedule,
                                                  options={"force": True})
                except TransformationError as err:
                    print(err)

        # For now this is a special case for stpctl.f90 because it forces
        # loops to parallelise without many safety checks
        # TODO #2446: This needs to be generalised and probably be done
        # from inside the loop transformation when the race condition data
        # dependency is found.
        if psy.name == "psy_stpctl_psy":
            for loop in invoke.schedule.walk(Loop):
                # Skip if an outer loop is already parallelised
                if loop.ancestor(Directive):
                    continue
                omp_loop_trans.apply(loop, options={"force": True})
                omp_target_trans.apply(loop.parent.parent)
                assigns = loop.walk(Assignment)
                if len(assigns) == 1 and assigns[0].lhs.symbol.name == "zmax":
                    stmt = assigns[0]
                    if OMPAtomicDirective.is_valid_atomic_statement(stmt):
                        parent = stmt.parent
                        atomic = OMPAtomicDirective()
                        atomic.children[0].addchild(stmt.detach())
                        parent.addchild(atomic)
            continue

        insert_explicit_loop_parallelism(
                invoke.schedule,
                region_directive_trans=omp_target_trans,
                loop_directive_trans=omp_loop_trans,
                # Collapse is necessary to give GPUs enough parallel items
                collapse=True
        )

    return psy
sergisiso commented 3 months ago

Aha, my bad, I was looking inside "insert_explicit_loop_parallelism", but it fails for the "psy.name == "psy_stpctl_psy" special case. You could add:

try:
    omp_loop_trans.apply(loop, options={"force": True})
    omp_target_trans.apply(loop.parent.parent
except TransformationError as err:
    print(err)

around the line that fails.

Or you can delete this if block altogether.

sergisiso commented 3 months ago

~The one in 'sbcblk.f90',~ I found a different issue in 'lbclnk.f90', is an internal issue that we will need to fix, maybe its related to https://github.com/stfc/PSyclone/issues/2027

Transforming subroutine: lbc_lnk_pt2pt_sp
  File "/home/sergi/workspace/psyclone/nemo/nemo-upstream/tests/BENCH_nvhpc_omp/BLD_SCT_PSYCLONE/psct-omp_gpu_trans.py", line 92, in trans
    normalise_loops(
  File "/home/sergi/workspace/psyclone/PSyclone/examples/nemo/scripts/utils.py", line 211, in normalise_loops
    explicit_loops.apply(assignment)
  File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/transformations/arrayassignment2loops_trans.py", line 107, in apply
    self.validate(node, options)
  File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/transformations/arrayassignment2loops_trans.py", line 348, in validate
    if not isinstance(reference.datatype, (ScalarType, ArrayType)):
                      ^^^^^^^^^^^^^^^^^^
  File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/nodes/array_reference.py", line 132, in datatype
    shape = self._get_effective_shape()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/nodes/array_mixin.py", line 630, in _get_effective_shape
    shape.append(idx_expr._extent(idx))
                 ^^^^^^^^^^^^^^^^
AttributeError: 'Reference' object has no attribute '_extent'
arporter commented 1 month ago

Yes, it's #2027. I have a fix I'm working on in that Issue.

arporter commented 1 month ago

Also, NG-Arch is working with ORCA2-ICE (less PISCES) and that is showing more issues that we see with BENCH.