Open LonelyCat124 opened 5 months ago
Makes sense. Note that the hoisting happens in the script's call to normalise_loops, so you can turn it off:
normalise_loops(
    invoke.schedule,
    hoist_local_arrays=True,
    convert_array_notation=True,
    loopify_array_intrinsics=True,
    convert_range_loops=True,
    hoist_expressions=False,
)
or make it subroutine-specific with hoist_expressions=False if invoke.name == "mysubroutine" else True
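A minimal sketch of the per-subroutine variant, assuming the script builds the keyword arguments before calling normalise_loops. The helper `normalise_options` and the set `NO_HOIST_SUBROUTINES` are hypothetical names for illustration; only the option names come from the call above.

```python
# Hypothetical helper: builds the keyword arguments for normalise_loops.
# Only the option names are taken from the call shown above.
NO_HOIST_SUBROUTINES = {"mysubroutine"}  # assumed set of routines to protect


def normalise_options(invoke_name):
    """Return the normalise_loops options for a given subroutine name."""
    return {
        "hoist_local_arrays": True,
        "convert_array_notation": True,
        "loopify_array_intrinsics": True,
        "convert_range_loops": True,
        # Disable expression hoisting only for the listed subroutines.
        "hoist_expressions": invoke_name not in NO_HOIST_SUBROUTINES,
    }


# In the script this would be used as (not executed here):
#     normalise_loops(invoke.schedule, **normalise_options(invoke.name))
```

Keeping the names in a set means more subroutines can be excluded later without touching the call itself.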
The current obstacle to making the hoisting aware of collapse is that the collapse logic still lives outside src (in examples/nemo/utils.py). This was meant to be temporary; at some point I wanted to bring it inside the source, at least as a transformation option for the OpenMP transformations, but maybe even somewhere more general (collapse is basically an extension of proving iteration independence to more than one contiguous loop). Then we can have an option in the hoisting transformation to enable or disable hoisting these statements.
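To make the interaction concrete, here is a toy model of a collapse-aware hoisting option. It is not the PSyclone API: the classes `Stmt` and `Loop`, the functions `collapse_depth` and `hoist_invariants`, and the `keep_collapse` flag are all hypothetical, and the `independent` field stands in for a real dependence analysis.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Stmt:
    """A simple statement; `uses` holds the loop variables it depends on."""
    text: str
    uses: frozenset = frozenset()


@dataclass
class Loop:
    """A toy loop node: a loop variable and an ordered list of children."""
    var: str
    body: List[Union["Loop", Stmt]] = field(default_factory=list)
    independent: bool = True  # assumed result of a dependence analysis


def collapse_depth(loop):
    """Count the perfectly nested, independent loops starting at `loop`."""
    depth = 0
    node = loop
    while isinstance(node, Loop) and node.independent:
        depth += 1
        # The nest stays perfect only if the body is exactly one inner loop.
        if len(node.body) == 1 and isinstance(node.body[0], Loop):
            node = node.body[0]
        else:
            break
    return depth


def hoist_invariants(outer, keep_collapse=False):
    """Move statements that do not use the inner loop variable out of the
    inner loop. With keep_collapse=True, leave the nest untouched if the
    move would put a statement between two collapsible loops."""
    if len(outer.body) != 1 or not isinstance(outer.body[0], Loop):
        return outer
    inner = outer.body[0]
    movable = [s for s in inner.body
               if isinstance(s, Stmt) and inner.var not in s.uses]
    if not movable or (keep_collapse and collapse_depth(outer) >= 2):
        return outer
    rest = [n for n in inner.body if n not in movable]
    new_inner = Loop(inner.var, rest, inner.independent)
    return Loop(outer.var, movable + [new_inner], outer.independent)


nest = Loop("k_inner", [Loop("j", [
    Stmt("tmp = f(k_inner)", frozenset({"k_inner"})),
    Stmt("a(j, k_inner) = tmp * b(j)", frozenset({"j", "k_inner"})),
])])

print(collapse_depth(nest))                                        # 2
print(collapse_depth(hoist_invariants(nest)))                      # 1
print(collapse_depth(hoist_invariants(nest, keep_collapse=True)))  # 2
```

In this model the default hoist reduces the collapsible depth from 2 to 1, while the flag keeps the nest perfect at the cost of recomputing the invariant statement in every inner iteration.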
Oh, and by the way, the OpenMP standard does not specify that the collapsed loops have to be perfectly nested. That was usually the case in practice, but recent versions of some compilers support simple statements between the loops, like the one you showed. I am unsure whether we should allow this, though, because some compilers will still fail.
re: collapse + compilers, maybe we should test with:
- Recent-ish gcc (9 or 10?)
- Recent-ish Intel (2021? 2022?)
- Recent-ish Nvidia
- Recent-ish Cray (if we have access somewhere?)
- MO current preferred compiler(s)
If those all accept the new collapse then I think we're probably ok to move towards the 5.1 standard with how we handle collapse.
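One way to run that check is a small script that tries to compile a nest with a statement between the collapsed loops. This is only a sketch: the compiler commands and OpenMP flags in `COMPILERS` are assumptions to adjust for the local system (Cray and the MO compiler would need to be added), and the Fortran test case is invented for illustration rather than taken from socrates.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Invented Fortran test case: a simple statement sits between the two
# loops named in collapse(2).
TEST_SOURCE = """\
program collapse_test
  implicit none
  integer :: i, j
  real :: tmp, a(8, 8)
  !$omp parallel do collapse(2) private(tmp)
  do i = 1, 8
    tmp = real(i)
    do j = 1, 8
      a(j, i) = tmp + real(j)
    end do
  end do
  !$omp end parallel do
  print *, sum(a)
end program collapse_test
"""

# Assumed compiler commands and OpenMP flags; adjust for the local system.
COMPILERS = {
    "gfortran": ["-fopenmp"],
    "ifx": ["-qopenmp"],
    "nvfortran": ["-mp"],
}


def accepts_collapse(compiler, flags):
    """Return True or False depending on whether the compiler builds the
    test case, or None if the compiler is not installed."""
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmpdir:
        src = Path(tmpdir) / "collapse_test.f90"
        obj = Path(tmpdir) / "collapse_test.o"
        src.write_text(TEST_SOURCE)
        result = subprocess.run(
            [compiler, *flags, "-c", str(src), "-o", str(obj)],
            capture_output=True, text=True,
        )
        return result.returncode == 0


def check_all():
    """Map each configured compiler to True, False or None (not found)."""
    return {name: accepts_collapse(name, flags)
            for name, flags in COMPILERS.items()}
```

Calling `check_all()` only shows whether each compiler accepts the construct; it does not verify that the generated code gives correct results.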
I'll try to check that file in detail to see if there are any other things that should be hoisted, but if not I'll try disabling hoisting for that file.
I'm looking at a section of the newer socrates code, where there is a loop:
I wanted to (hopefully) collapse the k_inner and j loops, so I refactored the code as follows (and moved the iex_major writes to a separate loop):
This should result in the same behaviour and not contain any race conditions etc.
However, the default script (that should have good general behaviour) prevents this, as it hoists this statement (since it is loop-independent of the outer loop) up to the outer loop, preventing collapse:
I think for GPUs we should be more careful with this hoisting behaviour, and use the collapse logic as well, e.g.
@sergisiso @arporter what do you think? Would this be a reasonable general rule to use instead of the current one?