sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Defect: Assignment of large (> 2 GB) coarrays fails when linked against Intel MPI #622

Open modrzejewski opened 5 years ago

modrzejewski commented 5 years ago

Defect/Bug Report

Observed Behavior

The program stops with the following error message:

ERROR STOP MPI-error: Invalid count

Expected Behavior

           2   2.0000000000000000     
           8   8.0000000000000000     
           4   4.0000000000000000     
           5   5.0000000000000000     
           7   7.0000000000000000     
           1   55.000000000000000     
           6   6.0000000000000000     
           9   9.0000000000000000     
           3   3.0000000000000000     
          10   10.000000000000000

Steps to Reproduce

Minimal example

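program bigarrays
      implicit none
      ! Each image holds its own n-by-n array of doubles
      double precision, dimension(:, :), allocatable :: x[:]
      integer, parameter :: n = 20000
      integer :: k
      allocate(x(n, n)[*])
      ! Fill the local array with this image's index
      x = dble(this_image())
      sync all
      ! Image 1 adds up the arrays of all other images;
      ! each x(:, :)[k] reference fetches a whole remote array
      if (this_image() == 1) then
         do k = 2, num_images()
            x(:, :) = x(:, :) + x(:, :)[k]
         end do
      end if
      sync all
      ! Image 1 should print the sum 1 + 2 + ... + num_images(); the others print their own index
      print *, this_image(), x(1, 1)
end program bigarrays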

Compilation

module load plgrid/tools/gcc/6.4.0
module load plgrid/libs/opencoarrays/2.3.1
gfortran -fcoarray=lib bigarrays.f90 -lcaf_mpi

Run the program with 10 images (SLURM script)

#!/bin/bash -l
#SBATCH --job-name="test"
## Number of nodes
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --mem 100000
#SBATCH --time=0:05:00 
#SBATCH -A rpa2018
#SBATCH -p plgrid-testing
#SBATCH --output="bigarrays_gfortran.log"
#SBATCH --error="bigarrays_gfortran.log"

module load plgrid/tools/python
module load plgrid/tools/gcc/6.4.0
module load plgrid/libs/opencoarrays/2.3.1

cafrun -np 10 ./a.out

modrzejewski commented 5 years ago

The same issue is present when OpenCoarrays is linked against OpenMPI 2.1.1.
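
Since the error appears with both MPI implementations, one possible explanation (not confirmed by anyone in this thread) is the size of the `count` argument: the classic MPI C bindings take it as an `int`, which is 32 bits on common platforms, so a single call cannot describe more than 2147483647 items. The sketch below only does the size arithmetic for the reproducer. Whether OpenCoarrays actually passes a byte count rather than an element count for this transfer is an assumption and has not been checked against its source.

```shell
#!/bin/sh
# Size arithmetic for the reproducer: x(n, n) of double precision, n = 20000.
n=20000
bytes_per_element=8     # assumption: double precision occupies 8 bytes (gfortran default)
int_max=2147483647      # largest value of a 32-bit signed int

elements=$((n * n))
bytes=$((elements * bytes_per_element))

echo "elements per image: $elements"
echo "bytes per image:    $bytes"

# Compare both quantities against the 32-bit limit.
if [ "$elements" -le "$int_max" ]; then elements_fit=yes; else elements_fit=no; fi
if [ "$bytes" -le "$int_max" ]; then bytes_fit=yes; else bytes_fit=no; fi

echo "element count fits in a 32-bit int: $elements_fit"
echo "byte count fits in a 32-bit int:    $bytes_fit"
```

The element count (400000000) fits, but the byte count (3200000000) does not. If the byte-count assumption holds, the failure should disappear for n <= 16383 and reappear at n = 16384, which would be a cheap way to test it with the reproducer above.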

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

zbeekman commented 5 years ago

Sigh, I still haven't had a chance to investigate. I won't mark this as "in progress", though, so that the stale bot keeps bugging me about it.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

zbeekman commented 5 years ago

Begone, stale bot.