The small comment 'you can try to use apeg_dl_timer' is nearly impossible to follow :( Here's what I had to do to get it to work:
DL_TIMER_ROOT=/home/joerg/work/apeg-dl_timer/ make
joerg@Joerg-Surface:~/work/psyclone/tutorial/practicals/nemo/2_nemo_profiling$ git diff Makefile
diff --git a/tutorial/practicals/nemo/2_nemo_profiling/Makefile b/tutorial/practicals/nemo/2_nemo_profiling/Makefile
index 8cac6833d..1b3ed587f 100644
--- a/tutorial/practicals/nemo/2_nemo_profiling/Makefile
+++ b/tutorial/practicals/nemo/2_nemo_profiling/Makefile
@@ -45,6 +45,11 @@ PROFILE_DIR ?= ../../../../lib/profiling/simple_timing
PROFILE_LINK = -lsimple_timing
PROFILE_LIB = ${PROFILE_DIR}/libsimple_timing.a
+PROFILE_DIR ?= ./../../../../lib/profiling/dl_timer/
+PROFILE_LINK = -ldl_timer_psy
+PROFILE_LIB = ${PROFILE_DIR}/libdl_timer_psy.a
+DL_TIMER_LIB = ~/work/apeg-dl_timer/libdl_timer_omp.a
+
 NAME = ./tra_adv.exe
@@ -84,8 +89,8 @@ transform:
        -o output_3.f90 -l output tra_adv_mod.F90
compile: transform $(KERNELS) output.o solutions/runner.o
-L$(PROFILE_DIR) $(PROFILE_LINK) $(DL_TIMER_LIB)
solutions/%.o: solutions/%.f90
Compile with additional F90FLAGS (which were added to the link line in the previous change):
make F90FLAGS=-fopenmp run -j 2
OpenMP is required at link time if apeg_dl_timer was compiled with OpenMP, even if no OpenMP is applied by a transformation.
My suggested solution: we add a second Makefile that has all the changes applied. The simple_timing and dl_timer cases are quite different, since simple_timing does not need to link in an existing library :(
Can't you change PROFILE_LINK?
If I build dl_timer in MPI mode and then set:
PROFILE_DIR ?= ../../../../lib/profiling/dl_timer
PROFILE_LIB = ${PROFILE_DIR}/libdl_timer_psy.a
PROFILE_LINK = ${PROFILE_LIB} ~/Projects/dl_timer/libdl_timer_mpi.a
and export F90=mpif90
then it builds for me. We need to add comments and/or a new variable that holds the location of the profiling library as well as the wrapper library.
PROFILE_LINK is for the PSyData wrapper. Well, I guess you could also add the options for the actual timer lib there.
Error in tutorial 3: (note use of child in loop body)
3.1: OMPParallelLoopTrans (probably ParallelLoopTrans) validate() just says there's a CodeBlock. It would be better if it printed the contents of the CodeBlock.
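Something like the following would do it (a sketch only - the trans(psyir) entry point and the use of get_ast_nodes to reach the fparser2 nodes are my assumptions about how the tutorial script is structured):

# Hedged sketch: print the contents of any CodeBlock that stops
# OMPParallelLoopTrans from being applied, rather than only reporting
# that one exists. Assumes the newer-style script entry point trans(psyir).
from psyclone.psyir.nodes import CodeBlock, Loop
from psyclone.psyir.transformations import TransformationError
from psyclone.transformations import OMPParallelLoopTrans

def trans(psyir):
    omp_trans = OMPParallelLoopTrans()
    for loop in psyir.walk(Loop):
        try:
            omp_trans.apply(loop)
        except TransformationError as err:
            print(f"Could not parallelise loop: {err}")
            # get_ast_nodes holds the underlying fparser2 nodes, whose
            # str() is (roughly) the original source of the block.
            for block in loop.walk(CodeBlock):
                for stmt in block.get_ast_nodes:
                    print(f"    CodeBlock contains: {stmt}")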
3_nemo_omp/omp_trans.py already has a walk in it, but the tutorial instructions say to add one. I guess originally we just had a loop over children? And the text above this is wrong too (last para):
Besides shutdown, the runner MUST also call the init function for dl_timer, i.e.:
program runner
  use tra_adv_mod, only: tra_adv
  use profile_psy_data_mod, only: profile_PSyDataInit, profile_PSyDataShutdown
  call profile_PSyDataInit()
  call tra_adv()
  call profile_PSyDataShutdown()
end program runner
https://bitbucket.org/apeg/dl_timer/issues/14/check-that-init-has-been-called
BTW, now that the PSyIR starts at the root and it is easier to handle multiple routines, we could split tra_adv into multiple routines. This would make the default "--profile routines" more interesting, and we could then revert to a single file where we insert the PSyDataShutdown call programmatically by searching only the main function. We could also show InlineTrans, ACC/OMP routine/declare, HoistLocalArraysTrans, ...
Oh yes, nice. I'll let you write that bit ;-) I think that would have to be an 'advanced' course...
Difficulties validating the OpenACC Kernels version of the tra_adv miniapp on NVIDIA GPU - does this work on Glados?
Maybe we could add the running and results comparison part of the tutorial to the integration tests.
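Something along these lines, perhaps (entirely hypothetical - the executable names and tolerance are made up, and I'm assuming the miniapp's results can be captured as text):

import subprocess

def run(exe):
    # Run one build of the miniapp and capture whatever it prints.
    return subprocess.run([exe], check=True, capture_output=True,
                          text=True).stdout

def assert_outputs_match(ref_text, new_text, rtol=1.0e-10):
    # Compare numeric tokens line by line within a relative tolerance.
    for num, (lref, lnew) in enumerate(
            zip(ref_text.splitlines(), new_text.splitlines()), start=1):
        for tref, tnew in zip(lref.split(), lnew.split()):
            try:
                a, b = float(tref), float(tnew)
            except ValueError:
                continue  # skip non-numeric tokens
            if abs(a - b) > rtol * max(abs(a), abs(b), 1.0):
                raise AssertionError(f"line {num}: {a} differs from {b}")

# Hypothetical executable names for the two builds under test:
assert_outputs_match(run("./tra_adv_cpu.exe"), run("./tra_adv_gpu.exe"))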
The divergent results for example 4 seem to occur here; this is where the first difference starts:
!$acc kernels
do jk = 1, jpk - 1, 1
  zdt = 1
  do jj = 2, jpj - 1, 1
    do ji = 2, jpi - 1, 1
      z0u = SIGN(0.5d0, pun(ji,jj,jk))
      zalpha = 0.5d0 - z0u
      zu = z0u - 0.5d0 * pun(ji,jj,jk) * zdt
      zzwx = mydomain(ji + 1,jj,jk) + zind(ji,jj,jk) * (zu * zslpx(ji + 1,jj,jk))
      zzwy = mydomain(ji,jj,jk) + zind(ji,jj,jk) * (zu * zslpx(ji,jj,jk))
      zwx(ji,jj,jk) = pun(ji,jj,jk) * (zalpha * zzwx + (1. - zalpha) * zzwy)
      z0v = SIGN(0.5d0, pvn(ji,jj,jk))
      zalpha = 0.5d0 - z0v
      zv = z0v - 0.5d0 * pvn(ji,jj,jk) * zdt
      zzwx = mydomain(ji,jj + 1,jk) + zind(ji,jj,jk) * (zv * zslpy(ji,jj + 1,jk))
      zzwy = mydomain(ji,jj,jk) + zind(ji,jj,jk) * (zv * zslpy(ji,jj,jk))
      zwy(ji,jj,jk) = pvn(ji,jj,jk) * (zalpha * zzwx + (1.d0 - zalpha) * zzwy)
    enddo
  enddo
enddo
!$acc end kernels
I've not tested it here, but if it diverges then it probably is here for them too. It's not clear to me why - there's no ambiguity of operations, no race, and no division or other complex instructions (e.g. sqrt) that should cause it to diverge, I think.
There is an assignment to a scalar (zdt) which rings a bell for me.
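If it helps to narrow it down, a throwaway PSyIR check could flag such scalar writes inside the loop nest (a sketch - the helper and its use are my own, though the node/symbol classes are standard PSyIR):

from psyclone.psyir.nodes import Assignment, Reference
from psyclone.psyir.symbols import ScalarType

def report_scalar_writes(loop):
    # Flag assignments whose LHS is a plain scalar reference, e.g. the
    # write to zdt above; array writes have ArrayType and are skipped.
    for assign in loop.walk(Assignment):
        if isinstance(assign.lhs, Reference):
            sym = assign.lhs.symbol
            if isinstance(sym.datatype, ScalarType):
                print(f"Scalar '{sym.name}' is written inside the loop nest")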
Expose ability to run backend checks to user/script?
I think in theory it is accessible - validate_global_constraints is there (and is public) but you'd have to track down the node in some way (and we no longer return the result of transformations).
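E.g. (a rough sketch - the eager walk over every node is my own idea, not an existing helper):

from psyclone.errors import GenerationError
from psyclone.psyir.nodes import Node

def check_backend_constraints(psyir):
    # validate_global_constraints() is public on every PSyIR node and is
    # what the backend runs; here we call it eagerly and report failures
    # instead of waiting for code generation to raise.
    for node in psyir.walk(Node):
        try:
            node.validate_global_constraints()
        except GenerationError as err:
            print(f"{type(node).__name__}: {err}")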
4.4 neglects the need to create a mapping when creating a new script:
There's another error in the same section: we should do a psyir.walk(Loop) rather than look at the children of the subroutine.
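I.e. something like (a sketch; assumes the newer trans(psyir) script style):

from psyclone.psyir.nodes import Loop

def trans(psyir):
    # walk() traverses the whole tree depth-first, so loops nested
    # inside IfBlocks etc. are found too; iterating over the
    # subroutine's immediate children misses them.
    for loop in psyir.walk(Loop):
        print(f"Found loop over '{loop.variable.name}'")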
Difficulties validating the OpenACC Kernels version of the tra_adv miniapp on NVIDIA GPU - does this work on Glados?
For the record, we had to apply the -Mnofma flag to both the CPU and GPU builds with the NVIDIA compiler. (From memory, I think it's the CPU that's the problem.) Anyway, I've updated the README.md of part 4 of the tutorial to mention this.
The one remaining piece is now to see whether we can run part 4 of the tutorial on GPU as part of the integration tests.
A ticket to capture issues found when running the PSyclone training.