sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Incorrect shape of coindexed multidimensional array component #511

Closed rouson closed 4 years ago

rouson commented 6 years ago

A future pull request will add a unit test that exposes this bug in a more complete way than the small reproducer below.

Defect/Bug Report

When compiled with GCC 6.4, 7.3, and 8.0.1, OpenCoarrays returns the incorrect shape of a coindexed variable even in single-image execution.

Observed Behavior

$ cat wrong-coarray-shape.f90
program main
  implicit none
  type foo
    logical, allocatable :: x(:,:)[:]
  end type
  type(foo) :: bar
  allocate(bar%x(2,1)[*])
  print *,shape(bar%x) , shape(bar%x(:,:)[1])
end program
$ caf wrong-coarray-shape.f90 
$ cafrun -n 1 ./a.out
           2           1           2           0

Expected Behavior

$ cafrun -n 1 ./a.out
           2           1           2           1

Steps to Reproduce

To reproduce this problem, the shape argument must

This appears to be an OpenCoarrays bug and is unrelated to the compiler's shape intrinsic function. Any reference to a coindexed variable yields an array with incorrect extents. For example, if an array that meets the above criteria is assigned to a non-coarray allocatable array, the latter array acquires the wrong shape through automatic (re)allocation.
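The automatic (re)allocation case mentioned above can be sketched roughly as follows. This is only an illustration, not the unit test referred to in this issue; the program name and the `local_copy` variable are hypothetical names introduced here. Under the Fortran standard the assignment should give `local_copy` the extents 2 and 1, so any other result would indicate the defect.

```fortran
program shape_via_assignment
  ! Sketch: a non-coarray allocatable array takes its shape from a
  ! coindexed component through automatic (re)allocation on assignment.
  implicit none
  type foo
    logical, allocatable :: x(:,:)[:]
  end type
  type(foo) :: bar
  logical, allocatable :: local_copy(:,:)  ! hypothetical name, not from the report

  allocate(bar%x(2,1)[*])
  bar%x = .true.
  sync all

  ! local_copy is unallocated here, so the assignment allocates it
  ! with the shape of the right-hand side.
  local_copy = bar%x(:,:)[1]

  print *, "local shape:  ", shape(bar%x)
  print *, "copied shape: ", shape(local_copy)
end program
```

It can be built and run the same way as the reproducer, with caf and then cafrun -n 1 ./a.out.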

rouson commented 6 years ago

@gutmann I'm tagging you so you'll get updates on this issue. Notice that this issue occurs with all versions tested, including 6.4.0. Either the error crept in after 6.3.0, or we luckily circumvented it, or we had an undetected, silent failure.

rouson commented 6 years ago

@scrasmussen The send-get/alloc_comp_multidim_shape.F90 unit test provides a more comprehensive test of the feature required to close this issue.

scrasmussen commented 6 years ago

Just an update on where I am: I think shape isn't working because of an array-indexing issue; get_data in mpi_caf.c is probably fetching the wrong memory (it seems to be just slightly off). In the following example, both shape and indexing work with bar%x but break with bar%x(:,:)[i]. The example shows the three behaviors I was getting: non-deterministic values returned from indexing, a seg fault, and infinite printing. All of them point to array indexing going out of bounds, probably off by one.

@vehre were you seeing any strange array indexing behavior with your fixes?

Anyway I'll work on the indexing but wanted to give an update because this bug might pop up in other issues.

OpenCoarrays Version: 3d485eafdeffe35e99717211feb0441ad033b312
Fortran Compiler: GCC with gcc-8-branch version: 8.1.1 20180507
MPI library being used: MPICH 3.3b1

Compiled and ran the following program with

caf -g -O0 index-bug.F90 -o runMe.exe
cafrun -np 1 ./runMe.exe

program main
  implicit none
  type foo
    integer, allocatable :: x(:,:)[:]
  end type
  integer, allocatable :: air(:,:)
  type(foo) :: bar
  logical :: infinite_print, seg_fault

  allocate(bar%x(2,1)[*])
  allocate(air(2,1))

  if (this_image() == 1) then
    bar%x(1,1)[1] = 4
    bar%x(2,1)[1] = 7
  end if
  sync all

  infinite_print = .FALSE. !.TRUE.                                                                                             
  seg_fault      = .FALSE. ! .TRUE.                                                                                            
  if (infinite_print) then
    print* , "==========="
    print *,this_image(), "has", bar%x(:,:)[1]
  else if (seg_fault) then
    !! set seg_fault to .TRUE. and infinite_print to .FALSE., then uncomment the two lines below to get a seg fault
    !! NEXT TWO LINES COMMENTED OUT TO GET INFINITE LOOP                                                                       
    ! air = bar%x(:,:)[1]                                                                                                      
    ! print *,this_image(), "has", bar%x(:,:)[1]                                                                               
  else  ! NON DETERMINISTIC VALUE IN bar%x(2,1)                                                                                
    air = bar%x(:,:)[1]
    print *,this_image(), "has", air(1,1), air(2,1)
  end if
end program

zbeekman commented 6 years ago

@scrasmussen are you building using OpenCoarrays from https://github.com/sourceryinstitute/OpenCoarrays/pull/528/commits/3d485eafdeffe35e99717211feb0441ad033b312 or from https://github.com/sourceryinstitute/OpenCoarrays/pull/531/commits/7d6d24ff30d0a3fa20ccc30e105865ca99d949dd ? Master will not work with GCC >= 8, at least not until we merge Andre's PR, but we need to clean it up to work with GFortran 7.1 - 7.3 first.

scrasmussen commented 6 years ago

@zbeekman Yeah, sorry about the misleading info. I'm using the 3d485eafdeffe35e99717211feb0441ad033b312 commit; I just put the wrong one in my previous message.

rouson commented 6 years ago

This [alloc_comp_multidim_shape] test now passes with a patched GCC 8.1.0. Wow! Great work, @scrasmussen. I'm closing this issue.

@gutmann This fixes one issue that was blocking Coarray ICAR, but my tests with a patched GCC 8.1.0 lead to a runtime error in the OpenCoarrays send_by_ref function so we should attempt to isolate the remaining issue. I'll tag you when I reopen a related issue that I just closed. There have been a number of improvements to send_by_ref lately so hopefully the issue is not too difficult to find and fix.

zbeekman commented 6 years ago

We may also have @neok-m4700 to thank in PR #531. I know he has been troubleshooting a lot of cobounds/codim issues recently, for which we are very grateful! Props to @scrasmussen too for all of his great work!

scrasmussen commented 6 years ago

Credit where credit is due: @vehre's changes fixed the [alloc_comp_multidim_shape] test. Thanks for that!

I'm reopening this issue because any allocate(bar%x(N,1)[*]) with N > 1 gives the wrong answer. The problem occurs when there is a dimension of size 1 and any preceding dimension has a size greater than 1. I'll continue to look into this issue.
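A rough sketch of a check for the reopened case, comparing the shape of the coindexed component against the allocated extents for a few values of N. The program name and the upper bound of 3 are arbitrary choices made for illustration, and the sketch assumes the "wrong answer" shows up in the reported shape; it does not reproduce any output from an actual run.

```fortran
program unit_trailing_extent
  ! Sketch: check shape(bar%x(:,:)[1]) for allocate(bar%x(n,1)[*]), n = 1..3.
  implicit none
  type foo
    integer, allocatable :: x(:,:)[:]
  end type
  type(foo) :: bar
  integer :: n

  do n = 1, 3                      ! upper bound chosen arbitrarily
    allocate(bar%x(n,1)[*])
    bar%x = n
    sync all
    if (any(shape(bar%x(:,:)[1]) /= [n, 1])) then
      print *, "n =", n, " wrong shape:", shape(bar%x(:,:)[1])
    else
      print *, "n =", n, " shape ok"
    end if
    sync all
    deallocate(bar%x)
  end do
end program
```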

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

rouson commented 4 years ago

@afanfa I just put code online here demonstrating what we ultimately need to work once this issue gets fixed. If the code executes correctly, it prints "Test passed." Currently, the Intel 18 compiler compiles the code correctly.

$ ifort -coarray=shared -coarray-num-images=8 intel-18-works.f90 
$ ./a.out
 Test passed
$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.3.222 Build 20180410
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

Sadly, this code generates an internal compiler error in gfortran 9.2.0, which means there's definitely a compiler bug. However, I can probably write a version that's not too different that at least compiles.

afanfa commented 4 years ago

The PR made by @neok-m4700 is not enough to fix this problem. In fact, the test code provided by @rouson generates an internal compiler error with the current gcc-trunk (10.0.1).