Open rouson opened 5 years ago
I'm seeing the following failures with OpenMPI and OC 2.5.0:
96% tests passed, 3 tests failed out of 78
Total Test time (real) = 44.76 sec
The following tests FAILED:
14 - alloc_comp_get_convert_nums (Failed)
23 - alloc_comp_send_convert_nums (Failed)
69 - issue-515-mimic-mpi-gatherv (Failed)
Errors while running CTest
Hopefully it's the same problem we're seeing here... we'll see what happens with the "bottling" of the latest 2.5.0 release of OpenCoarrays.
@rouson I can't reproduce this with a fresh install of OpenCoarrays from Homebrew. I'm going to close this. If you have issues that are persisting, you can re-open or we can investigate together.
It seems the original install of OpenMPI was the problem: since the OpenCoarrays install couldn't see OpenMPI, it loaded the default. MPICH is loaded with OpenCoarrays, and it passes through to OpenMPI if that exists on the system. OpenMPI in turn wraps gfortran with mpifort. Any link in the chain can break the process, and all of these have settings and flags to fine-tune the program to the task at hand. Since MPICH is the default, most are unaware that MPICH will run successfully in series to itself. I am new to OpenMPI, and the last time I did Fortran programming was on a monitor that could submit a batch job to a mainframe: you would get these cards, sort and verify the order of each line of code, and then feed them to the beast.
My problem is a simple one, but searching through documentation for the needle in the haystack is getting the best of me. I have two Ethernet ports on this computer, so I think that if I link them together I can at least tell whether osc ucx is working.
@cprich01 it's not clear to me what the actual issue is that you are facing. I suggest opening a new bug report unless you are facing exactly the same problem on macOS as described above.
Defect/Bug Report
uname -a: Darwin localhost 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64
Observed Behavior
The error occurs intermittently (non-deterministically).
Installing using the OpenCoarrays installer eliminates the problem -- presumably because the installer installs MPICH instead of OpenMPI.
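Since the failures appear to depend on which MPI implementation is picked up, it may help to confirm which launcher is actually first on PATH before re-running the tests. The following is only a minimal sketch: `mpi_flavor` is a hypothetical helper name (not part of OpenCoarrays), and it assumes the `mpirun --version` banner mentions "Open MPI" for OpenMPI and "HYDRA" or "MPICH" for MPICH.

```shell
#!/bin/sh
# Classify an MPI launcher's version banner.
# mpi_flavor is a hypothetical helper for illustration only.
mpi_flavor() {
  case "$1" in
    *"Open MPI"*)    echo "openmpi" ;;  # assumed OpenMPI banner text
    *HYDRA*|*MPICH*) echo "mpich"   ;;  # assumed MPICH (Hydra) banner text
    *)               echo "unknown" ;;
  esac
}

# Report which mpirun is first on PATH and which flavor it looks like.
if command -v mpirun >/dev/null 2>&1; then
  banner=$(mpirun --version 2>&1 || true)
  echo "mpirun: $(command -v mpirun) ($(mpi_flavor "$banner"))"
else
  echo "no mpirun found on PATH"
fi
```

To see the compiler the wrapper would invoke, `mpifort --showme` (OpenMPI) or `mpifort -show` (MPICH) prints the underlying command line without compiling anything.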