stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
104 stars 28 forks source link

PSyclone reordering symbol table during output resulting in compiler error #2202

Open LonelyCat124 opened 1 year ago

LonelyCat124 commented 1 year ago

When accessing (at least) one file in Socrates PSyclone changes the order of variable declaration, which is resulting in compiler error due to undefined variable.

The file in question is gas_list_pcf.f90, at some point there is a variable defined as: INTEGER, PRIVATE :: i

later in the file during the various reshape calls, this i variable is used.

IN the file output by PSyclone, the outputs are reordered such that some accesses to i are used before its definition:

&public :: threshold_wavelength = RESHAPE([REAL(KIND = RealK) :: 246.0E-09_RealK, 242.0E-09_RealK, 175.0E-09_RealK, &
&134.0E-09_RealK, 129.0E-09_RealK, (0.0_RealK, i = 1, npd_products - 5), 227.5E-09_RealK, 167.1E-09_RealK, 128.6E-09_RealK, &
&108.2E-09_RealK, 89.922E-09_RealK, 65.026E-09_RealK, 63.693E-09_RealK, 54.655E-09_RealK, (0.0_RealK, i = 1, npd_products - 8), &
..... rest of this definition
... string array definition
integer, private :: i

This happens in some other files too, e.g. refract_re_ccf.f90

arporter commented 1 year ago

Strictly I think this is still valid Fortran (which compiler is this?) but we do already do the 'right thing' for parameter declarations. Hopefully that approach can be extended to the other declarations.

LonelyCat124 commented 1 year ago

I'm using mpifort, wrapping gcc-12.2 on glados.

rupertford commented 1 year ago

@sergisiso's suggestion to replace decl x = expr with decl x\nx = expr in PSyIR would also fix this particular case. Determining a valid order will be complicated by the fact that we will have UnknownFortranType's in some cases (including this one) and we will not know their dependencies.

LonelyCat124 commented 1 year ago

Ok, I think I tracked this down a bit further and this specific case is due to using a non-parameter decalration in the declaration of a parameter.

A testcase that outputs code that won't compile on godbolt:

def test_fail_case(fortran_reader, fortran_writer):
    code = '''
    module mymod
        integer, parameter :: x = 1
        integer, parameter :: j = 9
        integer, parameter :: k = 12
        integer :: i
        real, parameter :: threshold_Wavelength(j, k) &
        = reshape( [REAL :: &
        1.3, 1.5, 1.6, 1.7, &
        (0.0, i=1, k-4), &
        1.2, 1.4, 1.2, &
        (0.0, i=1, k-3), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k), &
        (0.0, i=1, k)], shape=[j, k] )
    end module
'''
    psyir = fortran_reader.psyir_from_source(code)
    pass_through = fortran_writer(psyir)
    print(pass_through)
    assert False

The output code here is (with me adding line truncation):

module mymod
  implicit none
  integer, parameter, public :: x = 1
  integer, parameter, public :: j = 9
  integer, parameter, public :: k = 12
  real, dimension(j,k), parameter, public :: threshold_wavelength = RESHAPE([REAL :: 1.3, 1.5, 1.6, 1.7, &
   (0.0, i = 1, k - 4), 1.2, 1.4, 1.2, (0.0, i = 1, k - 3), (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k),&
   (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k)], shape = [j, k])
  integer, public :: i
  public

  contains

end module mymod

gfortran trunk on godbolt sees an error here:

    7 |    (0.0, i = 1, k - 4), 1.2, 1.4, 1.2, (0.0, i = 1, k - 3), (0.0, i = 1, k), (0.0, i = 1, k), (0.0, i = 1, k),&
      |          1
Error: Symbol 'i' at (1) has no IMPLICIT type
Compiler returned: 1

If you move the declaration of i to before threshold_wavelength as the original input does then the output will compile.

@sergisiso I think PSyclone attempting to output all parameters first is the cause of the error for this issue.

sergisiso commented 1 year ago

We assumed that any parameter (compile-time) value could only depend on other parameter values (compile-time) .

But here it uses a runtime (i) variable (albeit with a runtime given value) to set-up a compile-time value(threshold_wavelength) ? Or is the 'i' inside the reshape something specific to shape and not related to the outside 'i'? I am not familiar with RESHAPE

sergisiso commented 1 year ago

I guess its related because you mention:

If you move the declaration of i to before threshold_wavelength as the original input does then the output will compile.

arporter commented 1 year ago

Victoria (MO) has hit a related problem with our re-ordering of declarations. NEMOVAR has code that defines a type and declares an abstract interface that uses that type. Unfortunately, when we re-generate the declarations, we put the type declaration after the interface declaration and the compiler is unhappy. (Note that the type declaration is of UnknownFortranType because it contains PROCEDURE elements.) I think the only solution to this is to examine the parse trees of the two declarations and check for dependencies?

arporter commented 1 year ago

The code in question looks like:

TYPE ftam_typ

  TYPE(ftam_ctl_typ) :: &
     & fctl

  ! reference trajectory

  TYPE(ftrj_typ) :: &
     & ftrj

  ! before state

  TYPE(foce_typ) :: &
     & foce_b

  ! increment

  TYPE(finc_typ) :: &
     & finc_tl, &
     & finc_ad

  ! testing

  INTEGER :: &
     & itst_adj, & !:  test adjoint modules (1), stp_adj (2) or both (3)
     & itst_tan    !:  test tangent modules (1), stp_tan (2) or both (3)

  ! Time stepping

  INTEGER :: &
     & istart, & !: Fisrt and
     & iend      !: last time steps

  PROCEDURE(rst_interface)  , POINTER :: initialiseTL
  PROCEDURE(rst_interface)  , POINTER :: initialiseAD
  PROCEDURE(model_interface), POINTER :: stepTL
  PROCEDURE(model_interface), POINTER :: stepAD
  PROCEDURE(fin_interface)  , POINTER :: finaliseTL
  PROCEDURE(fin_interface)  , POINTER :: finaliseAD

CONTAINS
   PROCEDURE :: setup    => tam_setup
   PROCEDURE :: del      => tam_del
   PROCEDURE :: adj_test => tam_adj_test

END TYPE ftam_typ
ABSTRACT INTERFACE

   SUBROUTINE rst_interface( &
     &  self, &
     &  piom_ctl, &
     &  pvar_ctl, &
     &  pvars, &
     &  pinc, &
     &  poce )

     IMPORT :: fiom_ctl_typ, fvar_ctl_typ, finc_typ, foce_typ, ftam_typ, fvariables_typ

     CLASS(ftam_typ), INTENT(INOUT) :: &
        & self
     ...
 END INTERFACE
LonelyCat124 commented 1 year ago

We assumed that any parameter (compile-time) value could only depend on other parameter values (compile-time) .

But here it uses a runtime (i) variable (albeit with a runtime given value) to set-up a compile-time value(threshold_wavelength) ? Or is the 'i' inside the reshape something specific to shape and not related to the outside 'i'? I am not familiar with RESHAPE

I think what happens is reshape turns the 1D "list of numbers" (I'm unclear if its an array or what it is) in the arguments into the shape'd array defined as the shape argument. This can be used to define an array inline, similar to you could in C with {} syntax.

The use of i then seems to be something to do with how you're allowed to define this list of numbers and I don't know enough about Fortran to understand this in detail.

One possible solution could be to output in a different order:

  1. All variables without values set.
  2. Parameters
  3. Other variables as currently defined.

This would avoid the problem, but I don't know if it would make any others? It wouldn't help Andy/Victoria's issue though.

Mini code:

program x
use, intrinsic :: iso_fortran_env, only : real64
integer, parameter :: npd_products = 3
integer, parameter :: npd_gases = 3
integer :: i

real(real64), dimension(npd_products,npd_gases), parameter &
 :: threshold_wavelength = RESHAPE([REAL(real64) :: 1.5, (0.0, i=1, npd_products-1), &
 1.5, 1.4, 1.3, (0.0, i=1, npd_products)], &
shape = [npd_products, npd_gases])

print *, (threshold_wavelength(1,2))
end program

Creates an array [[1.5, 1.5, 0.0], [0.0, 1.4, 0.0], [0.0, 1.3, 0.0]]

sergisiso commented 1 year ago

I'm unclear if its an array or what it is

I was also confused because arrays are [] or (/ /) but not only parenthesis.

Apparently it is an implied do: https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap08/io.html

which acts like a range with the second and third element are the bounds.

And the middle element is even called "data-i-do-object" in the Fortran standard, this is why I wasn't sure if it is related to the variable "i" (but it does, you can change the name but must be declared before)

sergisiso commented 1 year ago

But what it is really doesn't matter. It should end up in the initial_value as a CodeBlock if not supported.

Indeed we could do your proposed vars without initial_value first or full dependency ordering, both would fix it. But we need #1419 first otherwise it ends as part of a UnknownFortranType.

Once that is implemented we can reason about initial_value of the symbol regardless of it being an UnknownType, CodeBlock or have all info.

LonelyCat124 commented 1 year ago

Will #1419 fix this case for a full dependency ordering? We will know that the LHS is a Real kind, but since the initial value is still a CodeBlock we won't know what inside it for working out dependencies?

And we could always end up with a (probably) unsolvable case without understanding how to parse this new object, e.g. two arrays defined in this manner where one is used to initialise some values of the other, i.e. inside the second array (array1(i), i=9, 13). Adding an implied do into PSyclone to handle this case might be useful if this is something we expect to see in code for LFRIC? Maybe worth asking MO.

I think handling reshape in general might be worthwhile as I think its the only way to initialise a multidimensional array inline - at least I can't work out how to do it with either [] or (/ syntax.

Edit: To add reshape we probably need:

  1. Add reshape into intrinsic_handler in fparser2.py
  2. Add reshape into IntrinsicCall node, with 2 required and 2 optional arguments. Types of the required arguments i'm not sure, does PSyclone have an equivalent to FParser's Array_Constructor node? I think it doesn't and this was part of the issue from 1419, we'd need some sort of inline array declaration for these arguments.
sergisiso commented 1 year ago

We do because CodeBlock has a "get_symbol_names" methods that return all internal symbols, thanks to all being captured as Fortran2003.Name (unless there is also fparser issues). For the dependency analysis to pick this up we may need to specialise the reference_access method for CodeBlocks but this is an implementation detail. I think we can do dependency analysis with CodeBlocks (we just need to assume always worst case "READWRITE" because we don't understand the semantics of the code in it)

e.g. two arrays defined in this manner where one is used to initialise some values of the other

Ok maybe not two codeblocks with dependencies between them :)

LonelyCat124 commented 1 year ago

Ah ok, that makes sense. I think the extreme case we can just hope to never come across, it doesn't seem like a very sane use-case.

sergisiso commented 1 year ago

A couple more points related to this:

rupertford commented 1 year ago

I think we will still need to re-order in the backend in general. This is because 1) Fortran does not require something to be declared before it is used in another declaration, but some compilers might not support it, 2) someone creating or modifying PSyIR might not add symbols to the symbol table in a 'declare first' order and 3) backends to other languages might have a 'declare first' constraint.

sergisiso commented 1 year ago

1419 is now fixed. It does not solve ordering issues but it means that all initialization expressions are now accessible in symbol.initial_value (even for unknown types) and therefore we have everything to tackle this issue.