Closed: lucbv closed this issue 4 years ago.
Another option is to only test deterministic algorithms. That's not ideal, since it's not what we will use in practice. However, it would still do what the test advertises, i.e., exercise the parameter list interpreter.
For a given GPU architecture, should we expect deterministic behavior, e.g., from aggregation?
@jhux2 To be honest, I'm not sure what the exact state of affairs is right now, since @brian-kelley is working on the coloring algorithms in KK. If we use a parallel coloring algorithm, it is unlikely that we will get the same coloring twice, and hence the same aggregates. Also, we do not have a fully deterministic aggregation stack: some phases are not yet implemented deterministically. Finally, I do not think that we have actually tested the behavior of the deterministic algorithms all that much.
@lucbv I was only improving the non-deterministic parallel and the sequential (host code) dist-2 colorings. The sequential version (including the device -> host -> device deep copies) is now faster than any of the old versions by far, so maybe we could just use that for these tests. I still have to get that checked in, though. I was waiting to solve a bug in one of the phases: large-ish distributed problems (8 ranks, 250^3 brick3D) crash randomly in aggregation, and until that is fixed I can't prove it's not a bug in my new coloring.
Since there seems to be demand, I think it would be doable to implement a reasonably fast, parallel, deterministic dist-2 coloring in terms of a triangular structure-only SPGEMM (the one used for triangle counting) followed by a deterministic dist-1 coloring. I also still have your PDF from 2018 with the dependency-list algorithm, with tiebreaks using degree and LID.
Btw, my 2c is to not disable this test on GPU.
Can this be closed?
Question
@trilinos/muelu @jhux2 @csiefer2 @cgcgcg
Christian did some work a month or two ago on the ParameterListInterpreter tests to collect output from all ranks and to clean up some of the logic. However, we kicked the can down the road regarding that test's behavior on GPU. The main issue is that the output generated on GPU is not the same as the output generated on CPU, which is natural since the algorithms on that hardware are quite different.
Here comes the question: what do we want to do on GPU?
Would anyone like to offer their wisdom on this issue?