sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Defect: coarray of derived type: crash during allocation of an allocatable member #742

Open hassaniriad opened 2 years ago

hassaniriad commented 2 years ago



Dear OpenCoarrays developers, consider the following minimal example, consisting of a module (mymod) and a main program in separate files:

cat mymod.f90

module mymod
   implicit none

   type :: my_t
      integer, allocatable :: i(:), j(:)
   end type my_t

contains

   subroutine set ( n, var )
      integer   , intent(in    ) :: n   
      type(my_t), intent(   out) :: var

      integer            :: err
      character(len=100) :: msg

      err = 0 ; msg = ''

      allocate(var%i(n), stat = err, errmsg = msg)
      if (err /= 0) error stop "in set: allocation failure (for %i): "//trim(msg)

      allocate(var%j(n), stat = err, errmsg = msg)
      if (err /= 0) error stop "in set: allocation failure (for %j): "//trim(msg)

      ! ...
      ! ...
      ! ...
   end subroutine set

end module mymod

cat main.f90

program foo
   use mymod
   implicit none

   type(my_t) :: myvar[*]       

   if (this_image() == 1) then
      ! allocate and set the members of myvar[1]:
      call set ( var = myvar, n = 5 )
   end if

   sync all
   ! ...
   ! ...
   ! ...  
   if (this_image() == 1) print*,'terminated'
end program foo

What happened (include command output, screenshots, logs, etc.)

When compiling with caf:

caf mymod.f90 main.f90 -Wall -fcheck=all -fbacktrace

and running with cafrun:

cafrun -n 4 ./a.out

an error occurs:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x108a5120e
#1  0x108a5041d
#2  0x7fff6493fb5c

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 58989 RUNNING AT
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Terminated: 15 (signal 15)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Error: Command:
   `/usr/local/bin/mpiexec -n 4 ./a.out`
failed to run.

Please also note that:

1) if I replace intent(out) with intent(inout), an allocation error is reported:

ERROR STOP in set: allocation failure (for %j): Attempt to allocate an allocated object
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Error: Command:
   `/usr/local/bin/mpiexec -n 4 ./a.out`
failed to run.

2) the issue goes away when I use a single source file:

cat mymod.f90 main.f90 > foo.f90
caf foo.f90 -Wall -fcheck=all -fbacktrace
cafrun -n 4 ./a.out
terminated
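One way to narrow down observation 1) might be to print the allocation status of the components before and after the call to set. The following is only a diagnostic sketch, not a fix: the file name main_diag.f90 and the program name foo_diag are hypothetical, and it assumes mymod.f90 is kept as a separate file exactly as in the report.

```fortran
! main_diag.f90 (hypothetical file name): diagnostic variant of main.f90.
! Assumes mymod.f90 from the report is compiled as a separate file.
program foo_diag
   use mymod
   implicit none

   type(my_t) :: myvar[*]

   if (this_image() == 1) then
      ! Allocatable components start out unallocated per the Fortran
      ! standard, so both lines are expected to print F.
      print *, 'before set: allocated(myvar%i) = ', allocated(myvar%i)
      print *, 'before set: allocated(myvar%j) = ', allocated(myvar%j)

      call set ( var = myvar, n = 5 )

      print *, 'after set:  allocated(myvar%i) = ', allocated(myvar%i)
      print *, 'after set:  allocated(myvar%j) = ', allocated(myvar%j)
   end if

   sync all
end program foo_diag
```

It can be built and run with the same commands as above, substituting main_diag.f90 for main.f90. If either component is reported as allocated before the call, that would be consistent with the "Attempt to allocate an allocated object" message seen with intent(inout), although it would not by itself identify the cause.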

rouson commented 2 years ago

@hassaniriad thanks for submitting this. I discovered what is likely the same issue the day before you submitted this.

@vehre could this be related to your recently merged PR?

vehre commented 2 years ago

Hi @rouson, my latest merge was about static arrays; here I see only allocatable ones. I am more inclined to look for the cause in the module handling, but that is just a guess without having taken a proper look at it. If you want me to analyse and work on this, just say so.