Closed arporter closed 3 weeks ago
Adi has confirmed that it's a PSyclone failure, not a compile-time failure so this is not #2601.
Generation Error: generator: specified PSyclone transformation module 'psct-omp_gpu_trans'
raised the following exception during execution...
{
File "/home/dn-sad1/rds/hpc-work/ngarch/nemo/cfgs/BENCH_ST/BLD_SCT_PSYCLONE/psct-omp_gpu_trans.py", line 123, in trans
omp_loop_trans.apply(loop, options={"force": True})
File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/omp_loop_trans.py", line 265, in apply
super().apply(node, options)
File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/parallel_loop_trans.py", line 195, in apply
self.validate(node, options=options)
File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/parallel_loop_trans.py", line 108, in validate
super().validate(node, options=options)
File "/home/dn-sad1/.local/lib/python3.11/site-packages/psyclone/psyir/transformations/loop_trans.py", line 113, in validate
raise TransformationError(
psyclone.psyir.transformations.transformation_error.TransformationError: Transformation Error: Nodes of type 'CodeBlock' cannot be enclosed
I see that we're "using the force" in the apply call so perhaps the option isn't being carried through?
Can we see the script, the error looks fine, it just says it can not parallelise something with a CodeBlock. The script should try/except it and ignore it.
I think (from yesterday) that Adi is just using the standard OMP offload script from the examples directory.
But that one has a try/except for Transformation errors, it should be fine with this error on this apply. Also it only uses the "force" option for very specific subroutine names, but in the backtrace it's hardcoded "omp_loop_trans.apply(loop, options={"force": True})" which is not what we do
Hi @sergisiso @arporter here's the pscy-omp_gpu_trans.py
#!/usr/bin/env python
# -----------------------------------------------------------------------------
# BSD 3-Clause License
#
# Copyright (c) 2021-2024, Science and Technology Facilities Council.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
# -----------------------------------------------------------------------------
# Authors: S. Siso, STFC Daresbury Lab
''' PSyclone transformation script showing the introduction of OpenMP for GPU
directives into Nemo code. '''
from psyclone.psyGen import TransInfo
from psyclone.psyir.nodes import (
Loop, Directive, Assignment, OMPAtomicDirective)
from psyclone.psyir.transformations import OMPTargetTrans
from psyclone.transformations import OMPDeclareTargetTrans, TransformationError
from utils import insert_explicit_loop_parallelism, normalise_loops, \
enhance_tree_information, add_profiling
PROFILING_ENABLED = False
def trans(psy):
''' Add OpenMP Target and Loop directives to all loops, including the
implicit ones, to parallelise the code and execute it in an acceleration
device.
:param psy: the PSy object which this script will transform.
:type psy: :py:class:`psyclone.psyGen.PSy`
:returns: the transformed PSy object.
:rtype: :py:class:`psyclone.psyGen.PSy`
'''
omp_target_trans = OMPTargetTrans()
omp_loop_trans = TransInfo().get_trans_name('OMPLoopTrans')
omp_loop_trans.omp_directive = "loop"
print(f"Invokes found in {psy.name}:")
for invoke in psy.invokes.invoke_list:
print(invoke.name)
if PROFILING_ENABLED:
add_profiling(invoke.schedule.children)
# TODO #2317: Has structure accesses that can not be offloaded and has
# a problematic range to loop expansion of (1:1)
if psy.name.startswith("psy_obs_"):
print("Skipping", invoke.name)
continue
# This are functions with scalar bodies, we don't want to parallelise
# them, but we could:
# - Inine them
# - Annotate them with 'omp declare target' and allow to call from gpus
if invoke.name in ("q_sat", "sbc_dcy", "gamma_moist", "cd_neutral_10m",
"psi_h", "psi_m"):
print("Skipping", invoke.name)
continue
enhance_tree_information(invoke.schedule)
normalise_loops(
invoke.schedule,
hoist_local_arrays=True,
convert_array_notation=True,
loopify_array_intrinsics=True,
convert_range_loops=True,
hoist_expressions=True
)
# For performance in lib_fortran, mark serial routines as GPU-enabled
if psy.name == "psy_lib_fortran_psy":
if not invoke.schedule.walk(Loop):
try:
# We need the 'force' option.
# SIGN_ARRAY_1D has a CodeBlock because of a WHERE without
# array notation. (TODO #717)
OMPDeclareTargetTrans().apply(invoke.schedule,
options={"force": True})
except TransformationError as err:
print(err)
# For now this is a special case for stpctl.f90 because it forces
# loops to parallelise without many safety checks
# TODO #2446: This needs to be generalised and probably be done
# from inside the loop transformation when the race condition data
# dependency is found.
if psy.name == "psy_stpctl_psy":
for loop in invoke.schedule.walk(Loop):
# Skip if an outer loop is already parallelised
if loop.ancestor(Directive):
continue
omp_loop_trans.apply(loop, options={"force": True})
omp_target_trans.apply(loop.parent.parent)
assigns = loop.walk(Assignment)
if len(assigns) == 1 and assigns[0].lhs.symbol.name == "zmax":
stmt = assigns[0]
if OMPAtomicDirective.is_valid_atomic_statement(stmt):
parent = stmt.parent
atomic = OMPAtomicDirective()
atomic.children[0].addchild(stmt.detach())
parent.addchild(atomic)
continue
insert_explicit_loop_parallelism(
invoke.schedule,
region_directive_trans=omp_target_trans,
loop_directive_trans=omp_loop_trans,
# Collapse is necessary to give GPUs enough parallel items
collapse=True
)
return psy
Aha, my bad, I was looking inside "insert_explicit_loop_parallelism", but it fails for the "psy.name == "psy_stpctl_psy" special case. You could add:
try:
omp_loop_trans.apply(loop, options={"force": True})
omp_target_trans.apply(loop.parent.parent
except TransformationError as err:
print(err)
around the line that fails.
Or you can delete this if block altogether.
~The one in 'sbcblk.f90',~ I found a different issue in 'lbclnk.f90', is an internal issue that we will need to fix, maybe its related to https://github.com/stfc/PSyclone/issues/2027
Transforming subroutine: lbc_lnk_pt2pt_sp
File "/home/sergi/workspace/psyclone/nemo/nemo-upstream/tests/BENCH_nvhpc_omp/BLD_SCT_PSYCLONE/psct-omp_gpu_trans.py", line 92, in trans
normalise_loops(
File "/home/sergi/workspace/psyclone/PSyclone/examples/nemo/scripts/utils.py", line 211, in normalise_loops
explicit_loops.apply(assignment)
File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/transformations/arrayassignment2loops_trans.py", line 107, in apply
self.validate(node, options)
File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/transformations/arrayassignment2loops_trans.py", line 348, in validate
if not isinstance(reference.datatype, (ScalarType, ArrayType)):
^^^^^^^^^^^^^^^^^^
File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/nodes/array_reference.py", line 132, in datatype
shape = self._get_effective_shape()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sergi/workspace/psyclone/PSyclone/src/psyclone/psyir/nodes/array_mixin.py", line 630, in _get_effective_shape
shape.append(idx_expr._extent(idx))
^^^^^^^^^^^^^^^^
AttributeError: 'Reference' object has no attribute '_extent'
Yes, it's #2027. I have a fix I'm working on in that Issue.
Also, NG-Arch is working with ORCA2-ICE (less PISCES) and that is showing more issues that we see with BENCH.
Currently, we only do passthrough for NEMOv5 in the integration tests. We need to add a GPU build and start monitoring performance.
Adi (Bristol) reports that he had to exclude the following files:
Possibly, stpctl.f90 is #2601 but I will check.