stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
102 stars 27 forks source link

GPU Hackathon #244

Closed arporter closed 5 years ago

arporter commented 5 years ago

This is a catch-all issue to give me somewhere to capture work done ready for the MO hackathon. This work may or may not then be broken out into smaller PRs.

arporter commented 5 years ago

Now that we 'handle' DO WHILE, the next problem is implicit loops that contain function calls, e.g.:

 z3d(1,:,:) =  ptr_sjk( pvtr(:,:,:), btmsk(:,:,jn)*btm30(:,:) ) 

Here, ptr_sjk is an array-valued function (that performs zonal averaging). It takes in a 3D field and a 2D mask and returns a 2D result that is stored in z3d(1,:,:).

arporter commented 5 years ago

Is the key to spotting this that the assignment is to a single entity (ptr_sjk) or that there are array-syntax accesses within the "index" expressions of ptr_sjk?

arporter commented 5 years ago

I've gone for the latter option. Next problem is code that uses array syntax in different indices of arrays. Since this indicates we can't safely assume what the array bounds are I've disallowed it for the moment. With more work we could (and should) work out what the array bounds should be.

arporter commented 5 years ago

Next problem is a utf8-decoding problem with some source files, e.g.:

$ psyclone -api "nemo" dynadv_ubs.f90 
Error, unexpected exception, please report to the authors:
Description ...
'utf-8' codec can't decode byte 0xca in position 13362: invalid continuation byte
...
  File "/home/kbc59144/Projects/GungHo/PSyclone/external/fparser/src/fparser/common/sourceinfo.py", line 311, in get_source_info
   return get_source_info_str(file_object.read())
  File "/home/kbc59144/MyInstalls/python_envs/py3/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
arporter commented 5 years ago

This utf8 problem is because that file has a comment that reads:

! Vertical volume fluxesÊ

i.e. a mistake in the source! Actually, I think this is permitted in the standard because it is in a comment (and obviously it gets through most compilers).

arporter commented 5 years ago

Next step is to get the PSyclone'd code to compile. Currently it won't because of duplicated loop variable declarations.

arporter commented 5 years ago

Removed the internal error we raised when a subroutine has no specification part and also ensured that NemoImplicitLoop.match() rejects implicit loops for arrays of rank 1. icbdia.F90 is now parsed and re-constructed without raising any exceptions. However, it contains an array of rank 3 that we assume is dimensioned (jpi,jpj,jpk) but is in fact (jpi,jpj,nclasses). I think I therefore need to back off from my 'eager' conversion of implicit loops and have it as an option. This will enable us to get something working and then we can tackle enabling it as required. On the bright side, PSyclone does now run over the whole of the source code for the AMM12 configuration without any 'unexpected' errors.

arporter commented 5 years ago

I've cut-n-pasted the whole of the code that did the 'eager' conversion of implicit loops into a new NemoExplicitLoopTrans transformation. I'll need to modify the existing tests so that they use this transformation rather than expecting it to be done by default.

arporter commented 5 years ago

In order to workaround the problems with INCLUDE files, I've hacked fparser to turn them into a comment and then I've modified the PSy output of PSyclone to turn any such comments back into normal INCLUDE statements.

arporter commented 5 years ago

I can now add the kernels directive to a source file. However, to test this I need to be able to build NEMO with the PGI compiler. I've built netcdf on glados with PGI but am getting errors when trying to compile HDF5.

arporter commented 5 years ago

Turns out that I don't need HDF5 and can successfully build and run without it. Maybe it's only needed by XIOS? My first attempt at just whacking in the 'kernels' directive (to traldf_iso) failed because that routine calls a number of others (e.g. halo exchange, IO) and those have not been marked-up with an ACC ROUTINE directive. I think the simplest solution is to be a bit more refined and mark-up each loop separately.

arporter commented 5 years ago

By using recursion I can maximise the number of nodes that are included in each KERNELS region. However, despite the following PSyIRe looking absolutely fine:

    If[iom_use("udiff_heattr") .OR. iom_use("vdiff_heattr")]
        If[cdtype == 'TRA' .AND. jn == jp_tem]
            Directive[ACC Kernels]
                Loop[type='None',field_space='None',it_space='None']
                Loop[type='levels',field_space='None',it_space='None']
                    Loop[type='lat',field_space='None',it_space='None']
                        Loop[type='lon',field_space='None',it_space='None']
                            KernCall[Explicit]
                Loop[type='None',field_space='None',it_space='None']
            NemoCodeBlock[<class 'fparser.two.Fortran2003.Call_Stmt'>]
            Directive[ACC Kernels]
                Loop[type='None',field_space='None',it_space='None']
                Loop[type='levels',field_space='None',it_space='None']
                    Loop[type='lat',field_space='None',it_space='None']
                        Loop[type='lon',field_space='None',it_space='None']
                            KernCall[Explicit]
                Loop[type='None',field_space='None',it_space='None']

the generated code contains:

      IF (cdtype == 'TRA' .AND. jn == jp_tem) THEN
      !$ACC KERNELS
      !$ACC KERNELS
      z2d(:, :) = 0._wp
      DO jk = 1, jpkm1
        ...
      z2d(:, :) = - rau0_rcp * z2d(:, :)
      !$ACC END KERNELS
      !$ACC END KERNELS
      ! note sign is reversed to give down-gradient diffusive transports (#1043)
      CALL lbc_lnk(z2d, 'U', - 1.)
      CALL iom_put("udiff_heattr", z2d)
      ! heat transport in i-direction
      !
      z2d(:, :) = 0._wp
      ...

I think this must be related to the problem that @rupertford found. I shall investigate...

rupertford commented 5 years ago

Yes, it looks to be the same bug. If the content is the same z2d(:, :) = 0._wp then it always finds the first one.

arporter commented 5 years ago

Although I now exclude subroutine calls from KERNEL regions, they are still contained within DATA regions. This is a problem if such calls access fields that are on the device, e.g. halo-swaps.

arporter commented 5 years ago

Currently the data region is not generating correct code for the deep-copying of derived types, e.g.:

!$ACC DATA COPYIN(dta_read,ptr,floor) COPYOUT(dta)
!
  IF (map % ll_unstruc) THEN
  !$ACC KERNELS DEFAULT(PRESENT)
  ! unstructured open boundary data file
    DO ib = 1, ipi
    DO ik = 1, ipk
      dta(ib, 1, ik) = dta_read(map % ptr(ib), 1, ik)
    END DO
  END DO
  !$ACC END KERNELS

Here, we should have COPYIN(dta_read, map, map%ptr). (Note that floor is not being recognised as an intrinsic function but that is easily fixed.)

arporter commented 5 years ago

glob_sum is a function defined in lib_fortran.f90 that does what it says (using MPI). However, we currently identify it as an array rather than a function.

arporter commented 5 years ago

Ultimately, we will need to use enter data directives in order to keep the data on the device between calls to any given subroutine. This then means we'll need to mark-up any e.g. halo-swap calls with update host and update device directives.

arporter commented 5 years ago

@rupertford this is the Issue where I'm capturing the problems thrown up by compiling the generated code.

arporter commented 5 years ago

Closing this Issue as the problems identified here have been fixed and are on master.