Closed · mhoemmen closed this issue 8 years ago
@jhux2 @csiefer2 @ambrad
Adding @aprokop, since this will affect reuse of smoother components across multigrid solves.
I'm hoping that fixing this would let MueLu not need a special case, but not actually require MueLu to change its code.
From @ambrad's #651 post:
@trilinos/ifpack2, @mhoemmen , @srajama1
RILUK's symbolic-phase information cannot be reused right now. Users must call initialize() before compute(), always. See, e.g., the logic surrounding
in MueLu_Ifpack2Smoother_def.hpp. This issue is focused on making RILUK have a proper symbolic/numeric/apply division of labor and so a reuse capability.
Oops, sorry about the duplicate. I'm working on finishing up a commit now that makes substantial progress on this issue. I'll bring it in as a PR so we can review it.
> Oops, sorry about the duplicate.
No worries!
> I'm working on finishing up a commit now that makes substantial progress on this issue. I'll bring it in as a PR so we can review it.
Awesome! Looking forward to it!
@mhoemmen @jhux2, what do you think:
https://github.com/ambrad/Trilinos/commit/bb2bf9c234664dbb05635f9c0f3b6a9c04a2d569
I'm getting a nice speedup on a 32-rank problem, both with a direct use of RILUK and with an indirect use through AdditiveSchwarz.
I left a comment on the commit. Hopefully it will get routed to @ambrad but this is just a note if it doesn't. It looks good.
Thanks, @srajama1. What is the warning?
Btw, I'm going to check if it's worth saving A_local as a member variable. To avoid keeping memory around in the no-reuse case, I might store to Alocal only after the first reuse call. That way, single-use preconditioning's memory footprint won't change.
I was wondering why `A_localcrs = A_local_crs;` was there, and whether it was to silence a warning. Yes, it would be good to keep the memory footprint unchanged in the no-reuse case.
@srajama1 , I see. I removed the tmp variable and just use A_localcrs directly now.
I've also updated with the Alocal cached version. Performance testing shows it's another small speedup. As we discussed, it will cache only after the first reuse, so the no-reuse case's memory footprint is unaffected. Updated commit here:
https://github.com/ambrad/Trilinos/commit/785808bcf3f8cac615822a33db3f16fac712dafc
It occurred to me that I might be able to call initAllValues with Llocal instead of L_localcrs, obviating the copy from L_localcrs to Llocal except the first time. I think that should work immediately or with only minimal modification. I'm thinking that L_localcrs is only needed to create the ILU(k) graph. If that's true, L_localcrs should not be a member variable (but Llocal should be); its memory can be released after the graph is created. I'll try this tomorrow. Could lead to a very fast refresh of numbers.
OK, the method I described in my previous comment works. I've redone the commit to use that approach. Summary of the changes:

- A_localcrs is no longer a member variable (except when the experimental Basker code is used); Alocal takes its place.
- makeLocalFilter is called only in initialize(), initializing Alocal.
- initAllValues(*Alocal) is always called in compute() and never in initialize().
- The key point is that there is no longer a large intermediate copy of the matrix in reuse calls; the copy happens only in the original call (as before), and only transiently in initialize(). In reuse calls, Alocal loads L and U directly. Note that Alocal is just a wrapper around A_, so it's not very large, unlike A_localcrs.
- The ordering consistency check is done just once, in initialize() (and not in initAllValues()).

All of this yields still a little more speedup. Here's the new commit:
https://github.com/ambrad/Trilinos/commit/b8636539e5a1e7d61b20fe2b39f2f5d22da59c42
@ambrad This reminds me to link to #558; @pwxy posted some test cases that started failing when I introduced LocalSparseTriangularSolver. We need to check current status; my fix for Tpetra::Crs{Graph,Matrix}::globalAssemble may have even fixed those.
Pushed the fix and so closing.
@trilinos/ifpack2
@ambrad pointed out to me that Ifpack2::RILUK::initialize does setup that depends on the matrix's values. This is incorrect. The initialize() method is only supposed to do setup that depends on the graph.
This may relate to #558.
Easiest work-around is to move the entire contents of initialize() to the top of compute(). It's perfectly correct for initialize() to do nothing. For example, some sparse LU factorizations with pivoting may combine the symbolic and numeric factorization phases into one phase.