Open bartlettroscoe opened 4 months ago
@achauphan and @sebrowne, did the frameworks monitoring of the randomly failing tests not pick up this random test failure?
I only decided to run this query after one of the PRs I was reviewing showed this failure. But this test had already failed/segfaulted randomly five other times since the end of last month (and no one bothered to post an issue for this?).
Thanks for the notification. That's troubling... I haven't seen that test fail in recent memory, and haven't known it to exhibit non-deterministic or random behavior. I'll look into it and try to resolve what's happening.
@bartlettroscoe I looked back through the history of the tool’s messages and it has not flagged that test at all. Remember, all we’re currently flagging are tests that failed, then passed on the same SHA1.
EDIT: It flagged it from last week, but not prior to that.
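For illustration, the "failed, then passed on the same SHA1" rule could be sketched roughly as below. This is not the monitoring tool's actual code: the `TestRun` record, its field names, and `flag_fail_then_pass` are all hypothetical, and the input is assumed to be in chronological order.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one test execution. The field names are
// illustrative and are not the schema of the real monitoring tool.
struct TestRun {
  std::string test_name;
  std::string sha1;  // commit the build ran against
  bool passed;
};

// Returns the names of tests that failed and later passed on the same SHA1.
// Assumes `runs` is sorted chronologically.
std::set<std::string> flag_fail_then_pass(const std::vector<TestRun>& runs) {
  std::set<std::pair<std::string, std::string>> seen_failure;  // (test, sha1)
  std::set<std::string> flagged;
  for (const TestRun& run : runs) {
    const auto key = std::make_pair(run.test_name, run.sha1);
    if (!run.passed) {
      seen_failure.insert(key);
    } else if (seen_failure.count(key) > 0) {
      // A pass after an earlier failure on the same commit.
      flagged.insert(run.test_name);
    }
  }
  return flagged;
}
```

Under this rule, a test that fails once on a given SHA1 and is never rerun on that SHA1 is not flagged, even if the failure was random.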
> @bartlettroscoe I looked back through the history of the tool’s messages and it has not flagged that test at all. Remember, all we’re currently flagging are tests that failed, then passed on the same SHA1.
Not surprising. The current screening approach will miss a lot of actual random failures.
> EDIT: It flagged it from last week, but not prior to that.
The next step is to run a query looking for that same test failure with similar output where that test is the only test failing in that build. That was the case with these particular test failures. You could write an automated tool to do this.
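A minimal sketch of that filter, assuming the per-build failure lists have already been fetched into memory. `BuildResult` and `builds_where_only_failure` are hypothetical names, and the "similar output" comparison is left out here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of one build: its identifier and the names of the
// tests that failed in it.
struct BuildResult {
  std::string build_id;
  std::vector<std::string> failed_tests;
};

// Returns the ids of builds in which `test_name` was the only failing test.
std::vector<std::string> builds_where_only_failure(
    const std::vector<BuildResult>& builds, const std::string& test_name) {
  std::vector<std::string> hits;
  for (const BuildResult& build : builds) {
    if (build.failed_tests.size() == 1 &&
        build.failed_tests.front() == test_name) {
      hits.push_back(build.build_id);
    }
  }
  return hits;
}
```

A build where many tests fail together is more likely a real breakage, so restricting to single-failure builds is one way to narrow the results to candidate random failures.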
I've identified some undefined behavior associated with using something similar to `&vec[0]` on an empty vector, which can dereference a null pointer. Disappointingly, that often doesn't cause a seg-fault, but it can.
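To illustrate the pattern (this is not the stk code itself; `sum_via_pointer` is a hypothetical stand-in), `std::vector::data()` is well defined on an empty vector, whereas `vec[0]` is not:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function that needs a raw pointer to the vector's storage.
int sum_via_pointer(const std::vector<int>& vec) {
  // const int* p = &vec[0];  // undefined behavior when vec is empty
  const int* p = vec.data();  // valid even when vec is empty; the returned
                              // pointer must not be dereferenced in that case
  int total = 0;
  for (std::size_t i = 0; i < vec.size(); ++i) {
    total += p[i];  // only reached when vec has at least one element
  }
  return total;
}
```

Because the loop body never runs for an empty vector, the pointer returned by `data()` is never dereferenced in that case.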
In any case I will try to get a stk update into trilinos as soon as I can.
This should be addressed by #13288. That pull request turned this test off. An upcoming stk update will fix the actual undefined behavior that is causing the test to be flaky.
CC: @alanw0, @sebrowne, @achauphan
Next Action Status