trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.2k stars 563 forks source link

Ifpack2::RILUK::initialize incorrectly does setup that depends on the matrix's values #570

Closed mhoemmen closed 8 years ago

mhoemmen commented 8 years ago

@trilinos/ifpack2

@ambrad pointed out to me that Ifpack2::RILUK::initialize does setup that depends on the matrix's values. This is incorrect. The initialize() method is only supposed to do setup that depends on the graph.

This may relate to #558.

Easiest work-around is to move the entire contents of initialize() to the top of compute(). It's perfectly correct for initialize() to do nothing. For example, some sparse LU factorizations with pivoting may combine the symbolic and numeric factorization phases into one phase.

mhoemmen commented 8 years ago

@jhux2 @csiefer2 @ambrad

jhux2 commented 8 years ago

Adding @aprokop, since this will affect reuse of smoother components across multigrid solves.

mhoemmen commented 8 years ago

I'm hoping that fixing this would let MueLu not need a special case, but not actually require MueLu to change its code.

mhoemmen commented 8 years ago

From @ambrad's #651 post:

@trilinos/ifpack2, @mhoemmen , @srajama1

RILUK's symbolic-phase information cannot be reused right now. Users must call initialize() before compute(), always. See, e.g., the logic surrounding

define IFPACK2_HAS_PROPER_REUSE

in MueLu_Ifpack2Smoother_def.hpp. This issue is focused on making RILUK have a proper symbolic/numeric/apply division of labor and so a reuse capability.

ambrad commented 8 years ago

Oops, sorry about the duplicate. I'm working on finishing up a commit now that makes substantial progress on this issue. I'll bring it in as a PR so we can review it.

mhoemmen commented 8 years ago

Oops, sorry about the duplicate.

No worries!

I'm working on finishing up a commit now that makes substantial progress on this issue. I'll bring it in as a PR so we can review it.

Awesome! Looking forward to it!

ambrad commented 8 years ago

@mhoemmen @jhux2, what do you think:

https://github.com/ambrad/Trilinos/commit/bb2bf9c234664dbb05635f9c0f3b6a9c04a2d569

I'm getting nice speedup on a 32-rank problem, both with a direct use of RILUK and and also with an indirect use through AdditiveSchwarz.

srajama1 commented 8 years ago

I left a comment on the commit. Hopefully it will get routed to @ambrad but this is just a note if it doesn't. It looks good.

ambrad commented 8 years ago

Thanks @srajama1 . What is the warning?

Btw, I'm going to check if it's worth saving A_local as a member variable. To avoid keeping memory around in the no-reuse case, I might store to Alocal only after the first reuse call. That way, single-use preconditioning's memory footprint won't change.

srajama1 commented 8 years ago

I was wondering why "A_localcrs = A_local_crs;" was there ? Whether it was for satisfying a warning. Yes, it might be good to keep the memory footprint for no-reuse.

ambrad commented 8 years ago

@srajama1 , I see. I removed the tmp variable and just use A_localcrs directly now.

I've also updated with the Alocal cached version. Performance testing shows it's a another small speedup. As we discussed, it will cache only after the first reuse, so the no-reuse case's memory footprint is unaffected. Updated commit here:

https://github.com/ambrad/Trilinos/commit/785808bcf3f8cac615822a33db3f16fac712dafc

ambrad commented 8 years ago

It occurred to me that I might be able to call initAllValues with Llocal instead of L_localcrs, obviating the copy from L_localcrs to Llocal except the first time. I think that should work immediately or with only minimal modification. I'm thinking that L_localcrs is only needed to create the ILU(k) graph. If that's true, L_localcrs should not be a member variable (but Llocal should be); its memory can be released after the graph is created. I'll try this tomorrow. Could lead to a very fast refresh of numbers.

ambrad commented 8 years ago

OK, the method I described in my previous comment works. I've redone the commit to use that approach. Now A_localcrs is no longer a member variable (except if the experimental Basker code is being used); Alocal takes its place. makeLocalFilter is called just in initialize(), initializing Alocal. initAllValues(*Alocal) is always called in compute() and never in initialize(). The key here is that there is no longer a large intermediate copy of the matrix in reuse calls, but only in the original call (as previously), and only then transiently in initialize(). In reuse calls, Alocal loads L and U directly. Don't forget Alocal is just a wrapper around A_, so it's not very large, unlike A_localcrs was. The ordering consistency check is done just once in initialize() (and not in initiAllValues()). All of this gives still a little more speedup. Here's the new commit:

https://github.com/ambrad/Trilinos/commit/b8636539e5a1e7d61b20fe2b39f2f5d22da59c42

mhoemmen commented 8 years ago

@ambrad This reminds me to link to #558; @pwxy posted some test cases that started failing when I introduced LocalSparseTriangularSolver. We need to check current status; my fix for Tpetra::Crs{Graph,Matrix}::globalAssemble may have even fixed those.

ambrad commented 8 years ago

Pushed the fix and so closing.