sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Defect: Test 11, 12, 13, 14, 29 and 76 fail during install on GNU/Linux #703

Closed: alexcarr1721 closed this issue 4 years ago

alexcarr1721 commented 4 years ago

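System information including:

  • OpenCoarrays Version: 2.8.0
  • Fortran Compiler: GNU Fortran (GCC) 8.2.0
  • C compiler used for building lib: gcc (GCC) 8.2.0
  • Installation method: cmake, make, make install
  • All flags & options passed to the installer: $ cmake ../OpenCoarrays-2.8.0 -DCMAKE_INSTALL_PREFIX=$HOME/local/opencoarrays/2.8.0
  • Output of uname -a: Linux i21a-s1.ufhpc 3.10.0-957.35.2.el7.x86_64 #1 SMP Wed Sep 18 05:51:28 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
  • MPI library being used: openmpi-4.0.1
  • Machine architecture and number of physical cores: Varies, using shared resources: https://help.rc.ufl.edu/doc/Available_Node_Features
  • Version of CMake: 3.15.6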

To help us debug your issue please explain:

What you were trying to do (and why)

I am trying to install OpenCoarrays under my home directory on shared supercomputing resources (link to wiki: https://help.rc.ufl.edu/doc/UFRC_Help_and_Documentation) in order to compile and run coarray Fortran programs. I was able to install OpenCoarrays and to compile and run a simple hello world program, but I receive the following warnings:

$ caf main.f90
$ cafrun -n 4 ./a.out
 Hello from Image            1
 Hello from Image            2
 Hello from Image            3
 Hello from Image            4
[1584305420.225155] [i21a-s1:17820:0]          mpool.c:38   UCX  WARN  object 0x1589ac0 was not returned to mpool ucp_am_bufs
[1584305420.225172] [i21a-s1:17820:0]          mpool.c:38   UCX  WARN  object 0x158bb40 was not returned to mpool ucp_am_bufs
[1584305420.225192] [i21a-s1:17820:0]          mpool.c:38   UCX  WARN  object 0x2af00b3b8860 was not returned to mpool mm_recv_desc
[1584305420.224957] [i21a-s1:17822:0]          mpool.c:38   UCX  WARN  object 0x228c8c0 was not returned to mpool ucp_am_bufs
[1584305420.224991] [i21a-s1:17822:0]          mpool.c:38   UCX  WARN  object 0x22909c0 was not returned to mpool ucp_am_bufs
[1584305420.225000] [i21a-s1:17822:0]          mpool.c:38   UCX  WARN  object 0x2292a40 was not returned to mpool ucp_am_bufs
[1584305420.225428] [i21a-s1:17823:0]          mpool.c:38   UCX  WARN  object 0x1d5b900 was not returned to mpool ucp_am_bufs
[1584305420.225445] [i21a-s1:17823:0]          mpool.c:38   UCX  WARN  object 0x1d5d980 was not returned to mpool ucp_am_bufs
[1584305420.225449] [i21a-s1:17823:0]          mpool.c:38   UCX  WARN  object 0x1d5fa00 was not returned to mpool ucp_am_bufs
[1584305420.225854] [i21a-s1:17821:0]          mpool.c:38   UCX  WARN  object 0xba3ac0 was not returned to mpool ucp_am_bufs
[1584305420.225871] [i21a-s1:17821:0]          mpool.c:38   UCX  WARN  object 0xba5b40 was not returned to mpool ucp_am_bufs
[1584305420.225875] [i21a-s1:17821:0]          mpool.c:38   UCX  WARN  object 0xba7bc0 was not returned to mpool ucp_am_bufs

Here is the code:

program main
   implicit none

   if ( this_image() .eq. 1 ) then
      print *, "Hello from Image ", this_image()
   else if ( num_images() .ne. 1 ) then
      sync images ( this_image() - 1 )
      print *, "Hello from Image ", this_image()
   end if
   if ( this_image() .lt. num_images() ) sync images ( this_image() + 1)

end program main

After inspecting the output of ctest --verbose --extra-verbose, I find these warnings throughout the tests, and several tests failed.

What happened (include command output, screenshots, logs, etc.)

During make test:

The following tests FAILED:
     11 - async_comp_alloc_2 (Failed)
     12 - comp_allocated_1 (Failed)
     13 - comp_allocated_2 (Failed)
     14 - alloc_comp_get_convert_nums (Failed)
     29 - alloc_comp_send_convert_nums (Failed)
     76 - issue-515-mimic-mpi-gatherv (Failed)
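To look at only these six tests instead of the full verbose run, CTest's -R option selects tests whose names match a regular expression, and --output-on-failure prints the output of any test that fails. A small sketch that builds that regular expression from the test names; failed_tests_regex is a hypothetical helper, not part of OpenCoarrays or CTest:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join CTest test names into one anchored regular
# expression for "ctest -R". Names are not escaped, so this assumes they
# contain no regex metacharacters (true for the six names listed above).
failed_tests_regex() {
  local IFS='|'   # "$*" joins the arguments with the first character of IFS
  printf '^(%s)$\n' "$*"
}

# Example (run from the build directory; not executed here):
#   ctest --output-on-failure -R "$(failed_tests_regex \
#     async_comp_alloc_2 comp_allocated_1 comp_allocated_2 \
#     alloc_comp_get_convert_nums alloc_comp_send_convert_nums \
#     issue-515-mimic-mpi-gatherv)"
```

Anchoring with ^ and $ keeps, for example, comp_allocated_1 from also matching a longer test name that merely contains it.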

After running,

$ export CTEST_OUTPUT_ON_FAILURE=1
$ ctest --verbose --extra-verbose

The log file contained many warnings such as:

1: [1584155901.367986] [login4:29566:0]          mpool.c:38   UCX  WARN  object 0x24a3180 was not returned to mpool ucp_am_bufs

Although many tests passed despite this warning, it might indicate a bigger issue. The entire log file of the ctest command is ctest_out.txt (https://github.com/sourceryinstitute/OpenCoarrays/files/4335033/ctest_out.txt).
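To see which UCX memory pools the warnings refer to, the log can be summarized by pool name. A minimal sketch, assuming only the line format shown in the output above ("UCX  WARN  object ... was not returned to mpool <name>"); summarize_ucx_mpool_warnings is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count UCX "was not returned to mpool" warnings in a
# log file, grouped by pool name, most frequent first.
summarize_ucx_mpool_warnings() {
  local log=$1
  # Keep only UCX warning lines (one or more spaces between UCX and WARN),
  # then extract the pool name that follows "returned to mpool".
  grep 'UCX  *WARN' "$log" |
    sed -n 's/.*was not returned to mpool \([A-Za-z0-9_]*\).*/\1/p' |
    sort | uniq -c | sort -rn
}

# Example (not executed here):
#   summarize_ucx_mpool_warnings ctest_out.txt
```

For the twelve warning lines from the hello world run above, it would report 11 for ucp_am_bufs and 1 for mm_recv_desc.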

What you expected to happen

All tests pass.

Step-by-step reproduction instructions to reproduce the error/bug

$ wget https://github.com/sourceryinstitute/OpenCoarrays/releases/download/2.8.0/OpenCoarrays-2.8.0.tar.gz
$ tar -xvf OpenCoarrays-2.8.0.tar.gz
$ mkdir opencoarrays-build
$ cd opencoarrays-build
$ export FC=/path/to/gfortran
$ export CC=/path/to/gcc
$ cmake /path/to/OpenCoarrays/source \
  -DCMAKE_INSTALL_PREFIX=/path/to/desired/installation/location
$ make
$ make test 
$ make install

Questions

I am not quite sure why the tests fail or whether it is even an issue. For instance, will it prevent Fortran code more complicated than a hello world program from compiling and running? Also, is there a way to install OpenCoarrays so that the UCX warnings are eliminated?
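On the last question: UCX takes its log verbosity from the UCX_LOG_LEVEL environment variable, and its default level (warn) is what lets these WARN lines through. A minimal sketch, assuming the launcher passes the environment on to every image (ranks on other nodes may need it forwarded explicitly, e.g. with Open MPI's mpirun -x option); run_without_ucx_warnings is a hypothetical helper. This only silences the messages and does nothing about the failing tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with UCX logging limited to errors,
# so "UCX WARN ... was not returned to mpool" lines are not printed.
# Assumes the launcher propagates the environment to every image.
run_without_ucx_warnings() {
  UCX_LOG_LEVEL=error "$@"
}

# Example (not executed here):
#   run_without_ucx_warnings cafrun -n 4 ./a.out
```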

rouson commented 4 years ago

@alexcarr1721 thanks for the report. I will attempt to reproduce it. Please also let me know whether installing via a package manager is an option for you and whether upgrading to GCC 8.3.0 is an option. I've used OpenCoarrays extensively with GCC 8.3.0 and have found that all tests pass.

afanfa commented 4 years ago

Hello Alex, what happens if you try to run with srun instead of cafrun?

On Sun, Mar 15, 2020 at 3:17 PM Alex Carr notifications@github.com wrote:

  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.8.0

  • Fortran Compiler: GNU Fortran (GCC) 8.2.0

  • C compiler used for building lib: gcc (GCC) 8.2.0

  • Installation method: cmake, make, make install

  • All flags & options passed to the installer:

  • $ cmake ../OpenCoarrays-2.8.0 -DCMAKE_INSTALL_PREFIX=$HOME/local/opencoarrays/2.8.0

  • Output of uname -a: Linux i21a-s1.ufhpc 3.10.0-957.35.2.el7.x86_64 #1 SMP Wed Sep 18 05:51:28 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

  • MPI library being used: openmpi-4.0.1

  • Machine architecture and number of physical cores: Varies, using shared resources: https://help.rc.ufl.edu/doc/Available_Node_Features

  • Version of CMake: 3.15.6


--

Alessandro Fanfarillo

alexcarr1721 commented 4 years ago

Hi @afanfa, the same warnings appear:

$ srun ./a.out
 Hello from Image            1
 Hello from Image            2
 Hello from Image            3
 Hello from Image            4
[1584309583.988148] [i21a-s1:26863:0]          mpool.c:38   UCX  WARN  object 0xefc140 was not returned to mpool ucp_am_bufs
[1584309583.988158] [i21a-s1:26863:0]          mpool.c:38   UCX  WARN  object 0xefe1c0 was not returned to mpool ucp_am_bufs
[1584309583.988160] [i21a-s1:26863:0]          mpool.c:38   UCX  WARN  object 0xf00240 was not returned to mpool ucp_am_bufs
[1584309584.030460] [i21a-s1:26864:0]          mpool.c:38   UCX  WARN  object 0xb8fe80 was not returned to mpool ucp_am_bufs
[1584309584.030470] [i21a-s1:26864:0]          mpool.c:38   UCX  WARN  object 0xb93f80 was not returned to mpool ucp_am_bufs
[1584309584.030473] [i21a-s1:26864:0]          mpool.c:38   UCX  WARN  object 0xb96000 was not returned to mpool ucp_am_bufs
[1584309584.072493] [i21a-s1:26865:0]          mpool.c:38   UCX  WARN  object 0x1930080 was not returned to mpool ucp_am_bufs
[1584309584.072502] [i21a-s1:26865:0]          mpool.c:38   UCX  WARN  object 0x1934180 was not returned to mpool ucp_am_bufs
[1584309584.072505] [i21a-s1:26865:0]          mpool.c:38   UCX  WARN  object 0x1936200 was not returned to mpool ucp_am_bufs
[1584309584.081279] [i21a-s1:26862:0]          mpool.c:38   UCX  WARN  object 0x1c6d740 was not returned to mpool ucp_am_bufs
[1584309584.081289] [i21a-s1:26862:0]          mpool.c:38   UCX  WARN  object 0x1c6f7c0 was not returned to mpool ucp_am_bufs
[1584309584.081307] [i21a-s1:26862:0]          mpool.c:38   UCX  WARN  object 0x2ad4a3fc9760 was not returned to mpool mm_recv_desc

I also tried the flags --mpi=pmix_v3 and --ntasks=4 --cpus-per-task=1, with the same result.

@rouson (1) As far as I know, the only package manager on the system is yum, and I do not have access to it. I am currently installing Homebrew in my home directory and will attempt to install OpenCoarrays with it. (2) I will try updating to GCC 8.3.0 and see if that works.

Thanks for the quick responses!

rouson commented 4 years ago

@alexcarr1721 I just used the OpenCoarrays installer to build GCC 8.2.0, MPICH 3.2, and the current HEAD of the OpenCoarrays master branch (which I believe is only one commit ahead of OpenCoarrays 2.8.0) inside a Lubuntu Linux 19.10 virtual machine. All tests pass for me so I'm going to close this issue, but feel free to reopen it if steps similar to the ones below don't work for you or are unsuitable for some other reason. The commands I used are the following:

git clone git@github.com:sourceryinstitute/opencoarrays
cd opencoarrays/
./install.sh -p gcc -I 8.2.0 -j 8 -i $HOME/Desktop/software/gnu/8.2.0
export PATH=$HOME/Desktop/software/gnu/8.2.0/bin:$PATH
export LD_LIBRARY_PATH=$HOME/Desktop/software/gnu/8.2.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/Desktop/software/gnu/8.2.0/lib64:$LD_LIBRARY_PATH
./install.sh -p mpich -I 3.2 -j 8 -i $HOME/Desktop/software/mpich/3.2/gnu/8.2.0
export PATH=$HOME/Desktop/software/mpich/3.2/gnu/8.2.0/bin:$PATH
./install.sh -j 8 -i $HOME/Desktop/software/opencoarrays/2.8.0-9643/gnu/8.2.0 -y -f $(which gfortran) -c $(which gcc) -C $(which g++)
source $HOME/Desktop/software/opencoarrays/2.8.0-9643/gnu/8.2.0/setup.sh
cd prerequisites/builds/opencoarrays/2.8.0/
ctest

The biggest differences with the above approach are the following:

  1. It tends to be a little easier to use OpenCoarrays with MPICH than with OpenMPI.
  2. Using the OpenCoarrays installer (install.sh) means that a few simple checks happen, such as ensuring that the compiler invoked by mpifort matches the compiler being used to build OpenCoarrays.
  3. Sourcing the setup.sh script written by install.sh ensures that the leading entries of the user's PATH and LD_LIBRARY_PATH match the toolchain that the installer used during the installation.
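The second and third points can also be spot-checked by hand. A minimal sketch, assuming bash and an MPI compiler wrapper with a "show" option (-show for MPICH's mpifort; Open MPI's wrappers use --showme); first_path_entry_is and wrapped_compiler_matches are hypothetical helpers, not part of install.sh or setup.sh:

```shell
#!/usr/bin/env bash
# Hypothetical check for point 3: succeed if DIR is the first entry of the
# colon-separated search-path variable named VAR (e.g. PATH, LD_LIBRARY_PATH).
first_path_entry_is() {
  local var_name=$1 dir=$2
  local value=${!var_name}         # bash indirect expansion
  [ "${value%%:*}" = "$dir" ]      # text before the first colon
}

# Hypothetical check for point 2: succeed if the compiler an MPI wrapper
# invokes resolves to the same executable as FC. The third argument is the
# wrapper's "show" flag (default -show, as in MPICH; --showme for Open MPI).
wrapped_compiler_matches() {
  local wrapper=$1 fc=$2 show_flag=${3:--show}
  local wrapped resolved_wrapped resolved_fc
  # First word of the command line the wrapper would run.
  wrapped=$("$wrapper" "$show_flag" | awk '{print $1; exit}')
  [ -n "$wrapped" ] || return 1
  resolved_wrapped=$(command -v "$wrapped") || return 1
  resolved_fc=$(command -v "$fc") || return 1
  [ "$resolved_wrapped" = "$resolved_fc" ]
}

# Examples (not executed here; /path/to/toolchain/bin is a placeholder):
#   first_path_entry_is PATH /path/to/toolchain/bin && echo "PATH ok"
#   wrapped_compiler_matches mpifort gfortran && echo "mpifort wraps this gfortran"
```

Comparing resolved paths is strict: a gfortran reached through a differently named symlink would be reported as a mismatch even if it is the same binary.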
rouson commented 4 years ago

@alexcarr1721 I don't know if you get notified when a comment that tags you is edited so I'm tagging again after editing my last comment.

alexcarr1721 commented 4 years ago

@rouson I can confirm that your solution worked for me. After installation, all of the tests pass except test 82:

99% tests passed, 1 tests failed out of 86

Total Test time (real) =  30.04 sec

The following tests FAILED:
         82 - shellcheck:test-script.cmake.sh (Failed)

My hello world code now runs as expected:

$ caf main.f90
$ cafrun -n 4 ./a.out
 Hello from Image            1
 Hello from Image            2
 Hello from Image            3
 Hello from Image            4

Thanks for the help.

rouson commented 4 years ago

@alexcarr1721 great! Thanks for letting me know. Most likely, the same approach will also work for installing newer versions of gfortran, MPICH, and OpenCoarrays. I generally recommend using the most up-to-date version that works for your code. The most recently released versions of gfortran are 8.4 and 9.3, both of which were released this month.