pmodels / casper

Process-based Asynchronous Progress Model for MPI Communication
https://pmodels.github.io/casper-www/

Unexpected error class=17 (MPI_ERR_INTERN: internal error), expected=6 #22

Closed jeffhammond closed 5 years ago

jeffhammond commented 7 years ago

This test fails with Open-MPI on Linux. Is the test overzealous or is Open-MPI RMA buggy w.r.t. error codes?

testing mpiexec=mpiexec -np 4 CSP_NG=0 win_errhan ...
CASPER Configuration:  
    RMA_ERR_CHECK    (enabled) 
    CSP_VERBOSE      = err||conf_g||
    CSP_NG           = 0
    CSP_ASYNC_CONFIG = on

Unexpected error class=17 (MPI_ERR_INTERN: internal error), expected=6
In win_errhan_fnc: error class does not match
Unexpected error class=17 (MPI_ERR_INTERN: internal error), expected=6
Return error class does not match

https://travis-ci.org/pmodels/casper/jobs/220763732
https://s3.amazonaws.com/archive.travis-ci.org/jobs/220763732/log.txt

minsii commented 7 years ago

This looks like an Open MPI issue: CSP_NG is 0, which means Casper does nothing here. I will confirm.

minsii commented 7 years ago

Confirmed: this happens because Open MPI does not return the MPI_ERR_RANK error code when the user passes an invalid rank (e.g., win_lock(rank = -2)). The MPI standard does not require an implementation to return that code, though. I think we should simply allow this kind of failure in the win_errhan and comm_errhan tests when CSP_NG=0.
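A rough sketch of what "allowing" this failure could look like in the test's comparison logic. This is not the actual Casper test code: the helper name `errclass_acceptable` and the `SKETCH_*` constants are hypothetical, and the numeric values 6 and 17 are simply copied from the log above (a real test would use `MPI_ERR_RANK` and `MPI_ERR_INTERN` from `<mpi.h>`, whose values differ between MPI implementations).

```c
#include <stdbool.h>

/* Assumed values, copied from the log in this issue: expected=6 (the
 * class the test wants, MPI_ERR_RANK per the discussion) and class=17
 * (MPI_ERR_INTERN). Real code should use the <mpi.h> constants. */
enum {
    SKETCH_ERR_RANK = 6,
    SKETCH_ERR_INTERN = 17
};

/* Hypothetical helper: decide whether an observed error class passes.
 * csp_ng is the number of ghost processes (the CSP_NG setting). */
static bool errclass_acceptable(int observed, int expected, int csp_ng)
{
    /* The normal case: the library reported exactly what we expected. */
    if (observed == expected)
        return true;

    /* With CSP_NG=0 the error comes from the underlying MPI library,
     * so tolerate its generic internal-error class as well. */
    if (csp_ng == 0 && observed == SKETCH_ERR_INTERN)
        return true;

    return false;
}
```

With this shape, the strict check still applies whenever CSP_NG is nonzero, so errors that Casper itself is expected to classify are not masked.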

jeffhammond commented 7 years ago

I reported the bug and submitted a PR to fix it.

jeffhammond commented 5 years ago

Open-MPI merged my bugfix last year...