ulfm-devel / ompi

Open MPI main development repository
https://www.open-mpi.org

Various error messages in otherwise "normal" error scenarios #13

Closed abouteiller closed 6 years ago

abouteiller commented 7 years ago

Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


On Mon, Dec 5, 2016 at 3:39 AM, Jan Stengler <jan.stengler@t-online.de> wrote:

Hi George,

I was able to build and run MPI programs successfully with ULFM2. However, I am seeing some strange behavior under Ubuntu (14.04 & 16.04). A process always prints output like the following:

[lu245280:28590] Wrote -1, expected 4, errno = 3

This message comes from our shared memory support, reporting that a write through CMA failed. I expect it is the result of a process dying while another process on the same node is still trying to communicate with it. The more pending messages there are, the more of this output you will get, so it is expected to be non-deterministic. Harmless, but certainly annoying. We'll focus on cleaning this up before the final release.
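For reference, errno 3 on Linux is ESRCH ("No such process"), which fits the explanation above: the target of the write no longer exists. The sketch below is not the CMA code path in Open MPI. It is a small stand-in that assumes a POSIX system and uses `os.kill` with signal 0 to probe a child that has already exited and been reaped; the helper name `errno_after_peer_death` is made up for illustration.

```python
import errno
import os


def errno_after_peer_death() -> int:
    """Fork a child that exits at once, reap it, then probe its pid.

    Returns the errno raised when addressing the dead process,
    or 0 if the probe unexpectedly succeeds.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately to simulate a failed peer.
        os._exit(1)

    # Parent: reap the child so its pid no longer names a process.
    os.waitpid(pid, 0)

    try:
        # Signal 0 delivers nothing; it only checks that the target exists.
        os.kill(pid, 0)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    code = errno_after_peer_death()
    # Print the numeric value, its symbolic name, and the system message.
    print(code, errno.errorcode.get(code), os.strerror(code))
```

The same errno shows up in the "Wrote -1, expected 4, errno = 3" line because the write call itself returned -1 after the peer had gone away.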

How often it is printed is random. Under Mac OS I do not get that output. Otherwise the program runs as expected and terminates successfully, but it also prints (on both Ubuntu and Mac):


Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.

Do you know why I receive this output and how I can suppress these messages?

This comes from an overzealous runtime, which at the end of the execution complains if any return code is non-zero. We'll fix it in the future to keep only the return code of the last set of processes.
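The difference between the two policies can be sketched as follows. This is only an illustration, not the runtime's actual code: it assumes exit codes are grouped into "sets" of processes in launch order, and the function names `status_any_nonzero` and `status_last_set_only` are hypothetical.

```python
from typing import List


def status_any_nonzero(process_sets: List[List[int]]) -> int:
    """Behavior described above: any non-zero exit code marks the job failed."""
    for codes in process_sets:
        for code in codes:
            if code != 0:
                return code
    return 0


def status_last_set_only(process_sets: List[List[int]]) -> int:
    """Proposed behavior: only the last set of processes decides the status."""
    if not process_sets:
        return 0
    for code in process_sets[-1]:
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    # Assumed scenario: one process in the first set died with code 1,
    # while every process in the last set finished cleanly.
    sets = [[0, 1, 0], [0, 0]]
    print(status_any_nonzero(sets), status_last_set_only(sets))
```

Under the first policy a single failed process is enough to trigger the "Primary job terminated normally, but 1 process returned a non-zero exit code" message, even when the application handled that failure and finished its work.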

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


Resolved in 08c6f2d6e