Closed alexcarr1721 closed 4 years ago
@alexcarr1721 thanks for the report. I will attempt to reproduce it. Please also let me know whether installing via a package manager is an option for you and whether upgrading to GCC 8.3.0 is an option. I've used OpenCoarrays extensively with GCC 8.3.0 and have found that all tests pass.
Hello Alex, what happens if you try to run with srun instead of cafrun?
On Sun, Mar 15, 2020 at 3:17 PM Alex Carr notifications@github.com wrote:
- I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.
System information including:
OpenCoarrays Version: 2.8.0
Fortran Compiler: GNU Fortran (GCC) 8.2.0
C compiler used for building lib: gcc (GCC) 8.2.0
Installation method: cmake, make, make install
All flags & options passed to the installer:
$ cmake ../OpenCoarrays-2.8.0 -DCMAKE_INSTALL_PREFIX=$HOME/local/opencoarrays/2.8.0
Output of uname -a: Linux i21a-s1.ufhpc 3.10.0-957.35.2.el7.x86_64 #1 SMP Wed Sep 18 05:51:28 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
MPI library being used: openmpi-4.0.1
Machine architecture and number of physical cores: Varies, using shared resources: https://help.rc.ufl.edu/doc/Available_Node_Features
Version of CMake: 3.15.6
To help us debug your issue please explain:
What you were trying to do (and why)
I am trying to install OpenCoarrays under my home directory on shared supercomputing resources (Link to wiki: https://help.rc.ufl.edu/doc/UFRC_Help_and_Documentation) in order to compile and run coarray fortran programs. I was able to install and compile/run a simple hello world program, but I receive the following warnings:
$ caf main.f90
$ cafrun -n 4 ./a.out
Hello from Image 1
Hello from Image 2
Hello from Image 3
Hello from Image 4
[1584305420.225155] [i21a-s1:17820:0] mpool.c:38 UCX WARN object 0x1589ac0 was not returned to mpool ucp_am_bufs
[1584305420.225172] [i21a-s1:17820:0] mpool.c:38 UCX WARN object 0x158bb40 was not returned to mpool ucp_am_bufs
[1584305420.225192] [i21a-s1:17820:0] mpool.c:38 UCX WARN object 0x2af00b3b8860 was not returned to mpool mm_recv_desc
[1584305420.224957] [i21a-s1:17822:0] mpool.c:38 UCX WARN object 0x228c8c0 was not returned to mpool ucp_am_bufs
[1584305420.224991] [i21a-s1:17822:0] mpool.c:38 UCX WARN object 0x22909c0 was not returned to mpool ucp_am_bufs
[1584305420.225000] [i21a-s1:17822:0] mpool.c:38 UCX WARN object 0x2292a40 was not returned to mpool ucp_am_bufs
[1584305420.225428] [i21a-s1:17823:0] mpool.c:38 UCX WARN object 0x1d5b900 was not returned to mpool ucp_am_bufs
[1584305420.225445] [i21a-s1:17823:0] mpool.c:38 UCX WARN object 0x1d5d980 was not returned to mpool ucp_am_bufs
[1584305420.225449] [i21a-s1:17823:0] mpool.c:38 UCX WARN object 0x1d5fa00 was not returned to mpool ucp_am_bufs
[1584305420.225854] [i21a-s1:17821:0] mpool.c:38 UCX WARN object 0xba3ac0 was not returned to mpool ucp_am_bufs
[1584305420.225871] [i21a-s1:17821:0] mpool.c:38 UCX WARN object 0xba5b40 was not returned to mpool ucp_am_bufs
[1584305420.225875] [i21a-s1:17821:0] mpool.c:38 UCX WARN object 0xba7bc0 was not returned to mpool ucp_am_bufs
Here is the code:
program main
  implicit none

  ! Image 1 prints first; every other image waits for its predecessor.
  if ( this_image() .eq. 1 ) then
    print *, "Hello from Image ", this_image()
  else if ( num_images() .ne. 1 ) then
    sync images ( this_image() - 1 )
    print *, "Hello from Image ", this_image()
  end if
  ! Release the next image, if there is one.
  if ( this_image() .lt. num_images() ) sync images ( this_image() + 1 )

end program main
After inspecting the output of
$ ctest --verbose --extra-verbose
I find these warnings throughout the tests, and several tests failed.
What happened (include command output, screenshots, logs, etc.)
During make test:
The following tests FAILED:
11 - async_comp_alloc_2 (Failed)
12 - comp_allocated_1 (Failed)
13 - comp_allocated_2 (Failed)
14 - alloc_comp_get_convert_nums (Failed)
29 - alloc_comp_send_convert_nums (Failed)
76 - issue-515-mimic-mpi-gatherv (Failed)
After running,
$ export CTEST_OUTPUT_ON_FAILURE=1
$ ctest --verbose --extra-verbose
The log file contained many warnings such as:
1: [1584155901.367986] [login4:29566:0] mpool.c:38 UCX WARN object 0x24a3180 was not returned to mpool ucp_am_bufs
Although many tests successfully passed with this warning, it might be an indication of a bigger issue. The entire log file of the ctest command is: ctest_out.txt https://github.com/sourceryinstitute/OpenCoarrays/files/4335033/ctest_out.txt
What you expected to happen
All tests passed
Step-by-step reproduction instructions to reproduce the error/bug
$ wget https://github.com/sourceryinstitute/OpenCoarrays/releases/download/2.8.0/OpenCoarrays-2.8.0.tar.gz
$ tar -xvf OpenCoarrays-2.8.0.tar.gz
$ mkdir opencoarrays-build
$ cd opencoarrays-build
$ export FC=/path/to/gfortran
$ export CC=/path/to/gcc
$ cmake /path/to/OpenCoarrays/source \
    -DCMAKE_INSTALL_PREFIX=/path/to/desired/installation/location
$ make
$ make test
$ make install
Questions
I am not quite sure why the tests fail and if it is even an issue. For instance will it prevent more complicated fortran code than just a hello world program from compiling and running? Also is there a way to install OpenCoarrays such that the UCX warnings are eliminated?
--
Alessandro Fanfarillo
Hi @afanfa , The same warnings appear:
$ srun ./a.out
Hello from Image 1
Hello from Image 2
Hello from Image 3
Hello from Image 4
[1584309583.988148] [i21a-s1:26863:0] mpool.c:38 UCX WARN object 0xefc140 was not returned to mpool ucp_am_bufs
[1584309583.988158] [i21a-s1:26863:0] mpool.c:38 UCX WARN object 0xefe1c0 was not returned to mpool ucp_am_bufs
[1584309583.988160] [i21a-s1:26863:0] mpool.c:38 UCX WARN object 0xf00240 was not returned to mpool ucp_am_bufs
[1584309584.030460] [i21a-s1:26864:0] mpool.c:38 UCX WARN object 0xb8fe80 was not returned to mpool ucp_am_bufs
[1584309584.030470] [i21a-s1:26864:0] mpool.c:38 UCX WARN object 0xb93f80 was not returned to mpool ucp_am_bufs
[1584309584.030473] [i21a-s1:26864:0] mpool.c:38 UCX WARN object 0xb96000 was not returned to mpool ucp_am_bufs
[1584309584.072493] [i21a-s1:26865:0] mpool.c:38 UCX WARN object 0x1930080 was not returned to mpool ucp_am_bufs
[1584309584.072502] [i21a-s1:26865:0] mpool.c:38 UCX WARN object 0x1934180 was not returned to mpool ucp_am_bufs
[1584309584.072505] [i21a-s1:26865:0] mpool.c:38 UCX WARN object 0x1936200 was not returned to mpool ucp_am_bufs
[1584309584.081279] [i21a-s1:26862:0] mpool.c:38 UCX WARN object 0x1c6d740 was not returned to mpool ucp_am_bufs
[1584309584.081289] [i21a-s1:26862:0] mpool.c:38 UCX WARN object 0x1c6f7c0 was not returned to mpool ucp_am_bufs
[1584309584.081307] [i21a-s1:26862:0] mpool.c:38 UCX WARN object 0x2ad4a3fc9760 was not returned to mpool mm_recv_desc
I also ran with the flags --mpi=pmix_v3, and --ntasks=4 --cpus-per-task=1 with the same result.
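For anyone comparing runs like the ones above, the warnings can be tallied per memory pool with a few lines of shell. This is only a sketch: the function name ucx_warnings_by_pool is made up for illustration, it assumes the pool name is the last word on each warning line (as in the output shown here), and the file name ctest_out.txt is the log attached to the issue.

```shell
#!/bin/sh
# Hypothetical helper: count "was not returned to mpool" warnings per pool.
# Assumes the pool name is the last whitespace-separated field of each line.
ucx_warnings_by_pool() {
  awk '/UCX +WARN/ && /was not returned to mpool/ { count[$NF]++ }
       END { for (pool in count) print pool, count[pool] }' "$1" | sort
}

# Example use against a saved log, if one is present in the current directory.
if [ -f ctest_out.txt ]; then
  ucx_warnings_by_pool ctest_out.txt
fi
```

A summary like this makes it easier to see whether a change of launcher or MPI library alters the set of pools that report leaks, rather than comparing raw addresses by eye.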
@rouson 1) As far as I am aware, the only package manager on the system is yum and I do not have access to it. I am currently installing homebrew in my home directory and will attempt to install with this package manager. 2) I will try updating to GCC 8.3.0 and see if that works.
Thanks for the quick responses!
@alexcarr1721 I just used the OpenCoarrays installer to build GCC 8.2.0, MPICH 3.2, and the current HEAD of the OpenCoarrays master branch (which I believe is only one commit ahead of OpenCoarrays 2.8.0) inside a Lubuntu Linux 19.10 virtual machine. All tests pass for me so I'm going to close this issue, but feel free to reopen it if steps similar to the ones below don't work for you or are unsuitable for some other reason. The commands I used are the following:
git clone git@github.com:sourceryinstitute/opencoarrays
cd opencoarrays/
./install.sh -p gcc -I 8.2.0 -j 8 -i $HOME/Desktop/software/gnu/8.2.0
export PATH=$HOME/Desktop/software/gnu/8.2.0/bin:$PATH
export LD_LIBRARY_PATH=$HOME/Desktop/software/gnu/8.2.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/Desktop/software/gnu/8.2.0/lib64:$LD_LIBRARY_PATH
./install.sh -p mpich -I 3.2 -j 8 -i $HOME/Desktop/software/mpich/3.2/gnu/8.2.0
export PATH=$HOME/Desktop/software/mpich/3.2/gnu/8.2.0/bin:$PATH
./install.sh -j 8 -i $HOME/Desktop/software/opencoarrays/2.8.0-9643/gnu/8.2.0 -y -f $(which gfortran) -c $(which gcc) -C $(which g++)
source $HOME/Desktop/software/opencoarrays/2.8.0-9643/gnu/8.2.0/setup.sh
cd prerequisites/builds/opencoarrays/2.8.0/
ctest
The biggest differences with the above approach are the following:
- Using the installer (install.sh) means a few simple checks happen, such as ensuring that the compiler invoked by mpifort matches the compiler being used to build OpenCoarrays.
- Sourcing the setup.sh script that install.sh generates ensures that the relevant leading parts of the user's PATH and LD_LIBRARY_PATH match the tool chain that the installer used during the installation.
@alexcarr1721 I don't know if you get notified when a comment that tags you is edited, so I'm tagging you again after editing my last comment.
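The two checks described above can be approximated by hand when building without the installer. The sketch below is not what install.sh or setup.sh actually do; the helper names leads_with and same_compiler are hypothetical, and it assumes that the MPI wrapper passes --version through to the underlying compiler, which is the usual behavior of the MPICH and Open MPI wrappers.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the first entry of a colon-separated
# search path (such as PATH or LD_LIBRARY_PATH) equals the expected directory.
leads_with() {
  search_path=$1
  expected=$2
  first=${search_path%%:*}
  [ "$first" = "$expected" ]
}

# Hypothetical helper: succeeds when two compiler commands report the same
# first line for --version (assumes the MPI wrapper forwards --version).
same_compiler() {
  a=$("$1" --version 2>/dev/null | head -n 1)
  b=$("$2" --version 2>/dev/null | head -n 1)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use with the prefix from the commands above.
if leads_with "$PATH" "$HOME/Desktop/software/mpich/3.2/gnu/8.2.0/bin"; then
  echo "PATH leads with the MPICH build"
else
  echo "PATH does not lead with the MPICH build"
fi
if same_compiler mpifort gfortran; then
  echo "mpifort wraps the gfortran found in PATH"
else
  echo "mpifort and gfortran differ, or one of them is missing"
fi
```

A mismatch reported by either check suggests that the build is mixing tool chains, for example a system MPI compiled with a different GCC than the one used for OpenCoarrays.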
@rouson I can confirm that your solution has worked for me. After installation all of the tests pass except for #82:
99% tests passed, 1 tests failed out of 86
Total Test time (real) = 30.04 sec
The following tests FAILED:
82 - shellcheck:test-script.cmake.sh (Failed)
My hello world code now runs as expected:
$ caf main.f90
$ cafrun -n 4 ./a.out
Hello from Image 1
Hello from Image 2
Hello from Image 3
Hello from Image 4
Thanks for the help.
@alexcarr1721 great! Thanks for letting me know. Most likely, the same approach will also work for installing any newer versions of gfortran, MPICH, and OpenCoarrays. I generally recommend using the most up-to-date version that works for your code. The most recently released versions of gfortran are 8.4 and 9.3, both of which were released this month.