trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

problem with Ifpack2::AdditiveSchwarz and sparse direct subdomain solves #460

Closed jhux2 closed 7 years ago

jhux2 commented 8 years ago

A user has reported that Ifpack2's additive Schwarz with sparse direct solves from Amesos2 does not work properly as a smoother in MueLu. After an initial look, I discovered that the Ifpack2 test covering this case had been accidentally disabled (I don't know when). I enabled it locally and found that it fails.

brian-kelley commented 7 years ago

@mhoemmen It would be tricky because that matrix file is 24 MB. That is the ASCII Matrix Market format though, so there's definitely potential for storing it more efficiently in some binary format...

brian-kelley commented 7 years ago

@dridzal
1: Yes, because I had reordering completely off. I'll see what happens if I turn it on - then it's always RCM reordering by default I think.
2: The way I had it configured, setting overlap=0 doesn't seem to have changed memory at all...
Edit: enabling RCM didn't cause a change either.
Edit 2: Nope, that's wrong, the additive Schwarz parameters aren't getting read at all. So what I had before is actually overlap=0 and no reordering.
3: Yes, that was this issue and it's fixed now.

jhux commented 7 years ago

Please remove me from this email distribution list

Quoted notification email from brian-kelley (sent Friday, June 16, 2017, 3:24 PM):

Alright, so I did fix the SparseDirectSolver test completely, but I wanted to get your input before checking it in (see my fork, branch SchwarzFix460: https://github.com/brian-kelley/Trilinos/commits/SchwarzFix460).

There's definitely some weirdness with the way Amesos2 validates solver input. I had to create new Tpetra::Maps that are "locally replicated" and have as many global rows as the real B/Y have local rows, then make a shallow copy into new multivectors (one map and one new MV per process). Then I also had to run the solver one column at a time; passing the whole MV at once still didn't work. All of this is done when I see that the "subdomain solver name" parameter is "AMESOS2".

So I have a fix but it's pretty kludgy and there's probably a better way.
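To make the "one column at a time" part of that workaround concrete, here is a minimal sketch in plain C++. It does not use the Tpetra or Amesos2 APIs: `DenseMultiVector`, `ColumnSolver`, and `solveColumnByColumn` are hypothetical stand-ins invented for illustration, and the actual fix on the branch may look quite different.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a multivector: numRows x numCols, column-major storage.
struct DenseMultiVector {
    std::size_t numRows;
    std::size_t numCols;
    std::vector<double> values;  // entry (i, j) lives at values[i + j * numRows]
};

// Hypothetical single-vector solver: reads b (length n) and writes x (length n).
using ColumnSolver = std::function<void(const double* b, double* x, std::size_t n)>;

// Calls the single-vector solver once per column instead of handing it
// the whole multivector in one call.
inline void solveColumnByColumn(const ColumnSolver& solve,
                                const DenseMultiVector& B,
                                DenseMultiVector& X)
{
    if (B.numRows != X.numRows || B.numCols != X.numCols) {
        throw std::invalid_argument("B and X must have the same shape");
    }
    for (std::size_t j = 0; j < B.numCols; ++j) {
        const double* b = B.values.data() + j * B.numRows;  // column j of B
        double* x = X.values.data() + j * X.numRows;        // column j of X
        solve(b, x, B.numRows);
    }
}
```

The column pointers are views into the existing storage, so nothing is copied per column; that mirrors the shallow-copy idea described above, under the assumption of contiguous column-major data.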

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/trilinos/Trilinos/issues/460#issuecomment-309125441, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AI48NfaMZon5vAXfz9ko4NtY2Jv8ic6hks5sEuRZgaJpZM4I9TUT.

mhoemmen commented 7 years ago

I just removed the @-mention that mentioned your name. You can unsubscribe from individual issues if you like. If you don't know how to do this, we'll be happy to point you to resources.

mhoemmen commented 7 years ago

@brian-kelley wrote:

It would be tricky because that matrix file is 24 MB.

Can @dridzal write us a test that generates the matrix on the fly? He should have an interest in a performance regression test.

mhoemmen commented 7 years ago

I've heard other folks complain about RCM not working....

dridzal commented 7 years ago

I will provide a challenging test in the next week or so. The test will generate the matrix on the fly, and it will be possible to vary its size by changing the input mesh. I'll just need some help from you with accessing the performance statistics (memory use). The test will be hosted in ROL, in a rol/example/PDE-OPT subdirectory.
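As a rough illustration of what "generate the matrix on the fly, with the size controlled by the mesh" can look like, here is a small sketch that builds a 2D five-point Laplacian in CSR form for an n-by-n grid. This is only a stand-in: the PDE-OPT test described above is not shown in this thread, and `CsrMatrix` and `makeLaplacian2D` are hypothetical names, not ROL or Tpetra types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (not a Trilinos type).
struct CsrMatrix {
    std::size_t numRows = 0;
    std::vector<std::size_t> rowPtr;  // size numRows + 1
    std::vector<std::size_t> colInd;  // column index of each stored entry
    std::vector<double> values;       // value of each stored entry
};

// Builds the five-point finite-difference Laplacian on an n-by-n grid.
// The matrix has n*n rows, so changing n changes the problem size.
inline CsrMatrix makeLaplacian2D(std::size_t n)
{
    CsrMatrix A;
    A.numRows = n * n;
    A.rowPtr.reserve(A.numRows + 1);
    A.rowPtr.push_back(0);

    auto push = [&A](std::size_t col, double val) {
        A.colInd.push_back(col);
        A.values.push_back(val);
    };

    for (std::size_t i = 0; i < n; ++i) {      // grid row
        for (std::size_t j = 0; j < n; ++j) {  // grid column
            const std::size_t row = i * n + j;
            // Entries are pushed in ascending column order.
            if (i > 0)     push(row - n, -1.0);  // neighbor in previous grid row
            if (j > 0)     push(row - 1, -1.0);  // left neighbor
            push(row, 4.0);                      // diagonal
            if (j + 1 < n) push(row + 1, -1.0);  // right neighbor
            if (i + 1 < n) push(row + n, -1.0);  // neighbor in next grid row
            A.rowPtr.push_back(A.colInd.size());
        }
    }
    return A;
}
```

A generator like this avoids checking a 24 MB Matrix Market file into the repository, and the same test can be scaled up for performance runs by increasing n.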

brian-kelley commented 7 years ago

@dridzal Measuring memory (Linux/macOS; I don't know about Win32, Cygwin, or MinGW):

#ifdef __APPLE__
#include <malloc/malloc.h>
#else
#include <malloc.h>
#endif

and then

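// Returns the number of bytes currently allocated through malloc.
inline size_t getCurrentAlloc()
{
#ifdef __APPLE__
    return mstats().bytes_used;
#else
    // glibc: mallinfo() fields are int; mallinfo2() (glibc 2.33+) has size_t fields.
    return mallinfo().uordblks;
#endif
}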

to get the total currently allocated bytes. I measured this at the beginning and end of the sections and then took the difference.

brian-kelley commented 7 years ago

@dridzal I wanted to run one more test to get the full picture. Total memory for AdditiveSchwarz::compute() (MB):

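| Processes | RCM, overlap 0 | RCM, overlap 1 | No reorder, overlap 0 | No reorder, overlap 1 |
|-----------|----------------|----------------|-----------------------|-----------------------|
| 1         | 8.71           | 8.71           | 7.70                  | 7.70                  |
| 2         | 11.69          | 14.29          | 11.62                 | 14.30                 |
| 4         | 12.36          | 10.16          | 12.34                 | 10.17                 |
| 8         | 7.75           | 27.70          | 7.71                  | 27.68                 |

srajama1 commented 7 years ago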

@brian-kelley: If memory is the important thing, RCM is not a great ordering for direct solvers. Depending on which solver you use in Trilinos, you should let its native ordering take over.