trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Teuchos: TeuchosParser_Parser_UnitTests_MPI_1 succeeds on amd64, but fails on ppc64el #1698

Closed nschloe closed 7 years ago

nschloe commented 7 years ago

Until recently, all tests that succeeded on amd64 also succeeded on ppc64el, but now TeuchosParser_Parser_UnitTests_MPI_1 fails on ppc64el. Full details here.

 75/980 Test  #75: TeuchosParser_Parser_UnitTests_MPI_1 ................................................................***Failed    2.09 sec
--------------------------------------------------------------------------
[[45987,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: bos01-ppc64el-023

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name bos01-ppc64el-023 and rank 0!

***
*** Unit test suite ...
***

Sorting tests by group name then by the order they were added ... (time = 3.78e-06)

Running unit tests ...

0. Parser_finite_automaton_UnitTest ... 

 p=0: *** Caught standard std::exception of type 'std::logic_error' :

  /<<PKGBUILDDIR>>/packages/teuchos/parser/src/Teuchos_FiniteAutomaton.cpp:87:

  Throw number = 1

  Throw test that evaluated to true: !(symbol < get_ncols(fa.table))

  Error!
 [FAILED]  (0.000373 sec) Parser_finite_automaton_UnitTest

ibaned commented 7 years ago

Hmm... I'm responsible for this code, but I might not be able to figure this out without access to a machine of this architecture... @nschloe is there a cloud service that could provide one?

@trilinos/framework do we have ppc64el machines?

ibaned commented 7 years ago

I wasn't familiar with the architecture codes, but I guess POWER8 falls into this category? If so, I should have a machine I can test on.

ibaned commented 7 years ago

I've replicated the issue, will now debug it.

ibaned commented 7 years ago

It's a subtle issue: plain char is unsigned on this architecture, whereas it is signed on most others. The fix is in PR #1700, and I'll kick off the merge process for it right away.
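
For readers unfamiliar with this class of bug, here is a minimal sketch of how the signedness of plain char can trip a range check like the one in the log (`symbol < get_ncols(fa.table)`). It is not the actual Teuchos code or the change made in PR #1700. The helper names `read_back_sentinel`, `passes_range_check`, and `plain_char_is_signed` are hypothetical, and the idea that a -1 sentinel is involved is only an assumption for illustration.

```cpp
#include <type_traits>

// Hypothetical helper, not taken from Teuchos: store the sentinel -1
// ("no symbol") in a char-like slot, then widen it back to int.
// With a signed slot the value stays -1; with an unsigned slot it becomes 255.
template <typename CharT>
constexpr int read_back_sentinel() {
  return static_cast<int>(static_cast<CharT>(-1));
}

// Hypothetical range check in the spirit of `symbol < get_ncols(fa.table)`.
// A negative value is recognised as the sentinel and accepted; anything
// else must be a valid column index below ncols.
template <typename CharT>
constexpr bool passes_range_check(int ncols) {
  return read_back_sentinel<CharT>() < 0 ||
         read_back_sentinel<CharT>() < ncols;
}

// Reports how the current platform treats plain `char`:
// true where it is signed, false where it is unsigned.
constexpr bool plain_char_is_signed() {
  return std::is_signed<char>::value;
}
```

Under these assumptions, `passes_range_check<signed char>(128)` is true while `passes_range_check<unsigned char>(128)` is false, which mirrors a test that passes on amd64 and throws on ppc64el. Storing such values in `int` or an explicit `signed char` removes the dependence on the platform default.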

ibaned commented 7 years ago

Fix is in develop branch now.

ibaned commented 7 years ago

I'll close this assuming that it fixes the problem (confirmed on our POWER8 machine), but please reopen it if it doesn't actually fix it.

nschloe commented 7 years ago

Thanks!