sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Defect: "sync all" Doesn't Sync #693

Closed evansste12432 closed 4 years ago

evansste12432 commented 4 years ago

System information including:

To help us debug your issue please explain:

What you were trying to do (and why)

I have written a short Fortran program which uses coarrays and the sync all statement. The sync all statement doesn't appear to work as expected.

What happened (include command output, screenshots, logs, etc.)

I compiled the following program and ran it on three images.

program sync_issue
    implicit none

    integer :: n, noe

    noe = 3
    do n = 1,noe
        print *, 'red ', n, ' ', this_image()
        sync all
        print *, 'green ', n, ' ', this_image()
        sync all
    end do
end program sync_issue

It produces the following result on the screen:

 red            1             1
 green            1             1
 red            2             1
 green            2             1
 red            3             1
 green            3             1
 red            1             2
 green            1             2
 red            2             2
 green            2             2
 red            3             2
 green            3             2
 red            1             3
 green            1             3
 red            2             3
 green            2             3
 red            3             3
 green            3             3

What you expected to happen

I expected it to print something like the following:

red            1             1
red            1             2
red            1             3
green            1             1
green            1             2
green            1             3
red            2             2
red            2             1
red            2             3
green            2             3
green            2             2
green            2             1
red            3             1
red            3             3
red            3             2
green            3             3
green            3             1
green            3             2

Step-by-step reproduction instructions to reproduce the error/bug

Compile the short program with caf sync_issue.f08 -o sync_issue. Then run it with cafrun -np 3 sync_issue.

evansste12432 commented 4 years ago

Apparently, this isn't a bug. The problem arises because output is buffered when using the print statement.

To solve the problem, place the following statement after your print statements: call execute_command_line(''). This causes the buffered output to be released so that you can immediately see your printed results on the screen.

So, ultimately, my revised program now looks like this:

program sync_issue
    implicit none

    integer :: n, noe

    noe = 3
    do n = 1,noe
        print *, 'red ', n, ' ', this_image()
        call execute_command_line('')
        sync all

        print *, 'green ', n, ' ', this_image()
        call execute_command_line('')
        sync all
    end do
end program sync_issue

It now wonderfully works as expected.

shahmoradi commented 4 years ago

How about

flush(output_unit)

where output_unit is from

use iso_fortran_env, only: output_unit

instead of

call execute_command_line('')

or is the flush() statement not applicable to stdout?

evansste12432 commented 4 years ago

Thanks, shahmoradi.

flush() is applicable to stdout, and may very well work. I learned about it while trying to get this program to function properly. It's the suggestion that kept popping up, but I could never get it to work.

In all of my searches, no one mentioned the need for iso_fortran_binding, which, apparently, is why I couldn't get flush() to work.

I don't have iso_fortran_binding installed, so I still haven't been able to confirm. You're probably right that it'll work.

Once I discovered that call execute_command_line('') works, I stopped looking for other solutions.

Thanks for shedding more light on flush(). It's always helpful to understand the proper approach.

rouson commented 4 years ago

@evansste12432 iso_fortran_binding was a typo. You want

use iso_fortran_env, only : output_unit

(On a historical note, I recently heard that the Oracle Fortran compiler used to have something called iso_fortran_binding, but it was never part of the Fortran standard. I don't recall the purpose -- I think it had something to do with linking to legacy Fortran code that used some Oracle extension to the language.)

evansste12432 commented 4 years ago

Thanks for clearing that up, @rouson. In that case, flush() still didn't work.

I've been using iso_fortran_env as I try to work with teams. I'm assuming it was installed, by default, when I installed OpenCoarrays 2.8.0, since I have team functionality. Am I wrong about that?

If I'm right, then the flush() approach still doesn't seem to work.

rouson commented 4 years ago

@evansste12432 iso_fortran_env is part of the Fortran standard and is installed automatically whenever gfortran is installed. And yes, if you are using team_type, then you have iso_fortran_env.
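
A quick way to confirm that the intrinsic module is available is to compile a program that uses it. The following is only a minimal sketch; the program name check_iso_fortran_env is hypothetical, and the values printed depend on the compiler.

```fortran
program check_iso_fortran_env
    ! Hypothetical test program: if this compiles, iso_fortran_env is available.
    use, intrinsic :: iso_fortran_env, only : output_unit, error_unit, compiler_version
    implicit none

    ! Unit numbers are processor-dependent, so no particular values are assumed.
    write (output_unit, '(a,i0)') 'output_unit = ', output_unit
    write (output_unit, '(a,i0)') 'error_unit  = ', error_unit
    write (output_unit, '(a,a)')  'compiler    = ', compiler_version()
    flush (output_unit)
end program check_iso_fortran_env
```

No extra installation step should be needed; the module ships with the compiler.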

shahmoradi commented 4 years ago

@evansste12432: @rouson is right, and my apologies for the typo. I often use the iso_fortran_env and iso_c_binding intrinsic modules, and I got confused for a moment. These two are intrinsic and should be available by default in at least the Intel, Cray, and GFortran compilers, as far as I am aware (perhaps also in the PGI, IBM, NAG, ... compilers). The statement flush (output_unit) has worked in the applications that I have worked with so far. I'd be surprised if it were otherwise.

evansste12432 commented 4 years ago

@shahmoradi, I don't know why it isn't working for me. If the following program produces correct results for you, then there must be something wrong with my system:

program sync_issue
    use iso_fortran_env, only : output_unit
    implicit none

    integer :: n, noe

    noe = 3
    do n = 1,noe
        print *, 'red ', n, ' ', this_image()
        flush(output_unit)
        sync all

        print *, 'green ', n, ' ', this_image()
        flush(output_unit)
        sync all
    end do
end program sync_issue

shahmoradi commented 4 years ago

Hi @evansste12432, I tested your code and can confirm the behavior. This was a very interesting post, and the result was counter-intuitive to me, so I brought it up on the Intel forum to get a response from that community as well, including the Fortran committee chair Steve Lionel:

https://software.intel.com/en-us/forums/intel-fortran-compiler/topic/837101#comment-1948312

I'd suggest you join the discussion and report any observations you have had there too. In sum, according to Steve, the observed behavior with flush() is correct. I am not quite convinced yet, and I am still waiting for his further clarification in the discussion on the Intel forum.
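
One way to make the on-screen order deterministic, regardless of how each image's standard output is merged at the terminal, is to let a single image do all the printing. The following is a minimal sketch of that idea, not something proposed in this thread; the program name ordered_output, the coarray buffer line, and the helper print_in_image_order are hypothetical. It assumes that sync all orders the images' execution but not the arrival of their separately buffered output on the screen.

```fortran
program ordered_output
    use, intrinsic :: iso_fortran_env, only : output_unit
    implicit none

    integer, parameter :: noe = 3
    character(len=64) :: line[*]   ! one message buffer per image (hypothetical name)
    integer :: n

    do n = 1, noe
        ! Each image formats its message into its own buffer instead of printing it.
        write (line, '(a,i0,1x,i0)') 'red ', n, this_image()
        call print_in_image_order()

        write (line, '(a,i0,1x,i0)') 'green ', n, this_image()
        call print_in_image_order()
    end do

contains

    subroutine print_in_image_order()
        character(len=64) :: local
        integer :: img

        sync all   ! every image has filled its buffer
        if (this_image() == 1) then
            ! Image 1 fetches each buffer in turn and is the only image that prints.
            do img = 1, num_images()
                local = line[img]
                write (output_unit, '(a)') trim(local)
            end do
            flush (output_unit)
        end if
        sync all   ! image 1 has finished reading; the buffers may be overwritten
    end subroutine print_in_image_order

end program ordered_output
```

Because only one image writes to standard output, the lines appear in the order image 1 writes them. The cost is that every message travels through image 1, which is acceptable for diagnostics but not for large volumes of output.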

FLUSHing the standard output to impose order in Coarray applications
This question was brought up by someone on the OpenCoarrays forum, and I thought it deserved an in-depth discussion with the Intel community as well. The Fortran standard, as interpreted by Metcalf et al. in their book "Modern Fortran Explained", states that:

evansste12432 commented 4 years ago

Hello @shahmoradi.

While previously searching for solutions to this problem, I came across the following link:

https://software.intel.com/en-us/forums/intel-fortran-compiler/topic/531146

You'll notice that this issue has indeed been discussed in the same Intel forum. At that time, flush 6 was suggested, and it seemed to work for one person involved in the discussion, though, as previously stated, I could never get it to work.

During the conversation, the developer explains more, which may shed more light on the issue. It was the last post in the conversation, which makes me think the explanation may have put all concerns to rest.

Wrong behavior of sync all for coarray application
Here is a short coarray code which gives a wrong output.

! This program gives a wrong output as we would expect the line with the '*' character to be the last line to be printed.
! Compilation: ifort -coarray main.f90; ./a.out
Program Main
    implicit none
    write(*,"('[Main]: Before sync all: This_Image() = ',i3,' / ',g0)") This_Image(), Num_Images()
    sync all
    if ( This_Image() == 1 ) write(*,"('[Main]: **************************************** < SYNCHRONIZATION')")
End Program

This code gives the following output: