ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

Remote.GetRemote SST tests unstable in Frontier #4257

Open vicentebolea opened 1 month ago

vicentebolea commented 1 month ago

Describe the bug

The Remote.GetRemote SST tests are unstable on Frontier.

To Reproduce

See the failed tests for this build on CDash: https://open.cdash.org/viewTest.php?onlyfailed&buildid=9773810

| Test | Status | Time | Details | History | Summary |
|---|---|---|---|---|---|
| [Remote.BPWriteReadADIOS2stdio.GetRemote](https://open.cdash.org/test/1647777835) | [Failed](https://open.cdash.org/test/1647777835) | 60 ms | Completed (SEGFAULT) | Unstable | [Unstable](https://open.cdash.org/testSummary.php?project=85&name=Remote.BPWriteReadADIOS2stdio.GetRemote&date=2024-07-19) |
| [Remote.BPWriteMemorySelectionRead.GetRemote](https://open.cdash.org/test/1647777838) | [Failed](https://open.cdash.org/test/1647777838) | 60 ms | Completed (SEGFAULT) | Unstable | [Unstable](https://open.cdash.org/testSummary.php?project=85&name=Remote.BPWriteMemorySelectionRead.GetRemote&date=2024-07-19) |

Desktop

Frontier

Additional context

The tests pass in most builds but fail intermittently in some.

vicentebolea commented 1 month ago

@eisenhauer

eisenhauer commented 1 month ago

If this is defaulting to RDMA, we may need some changes from @franzpoeschel for this to be stable. You might force it to use TCP rather than RDMA. (One way to do that is to keep the build from finding libfabric.)

franzpoeschel commented 1 month ago

That's good timing, then: I started cleaning up my branch yesterday to make it mergeable.