idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Preallocation of matrix memory #7901

Open fdkong opened 8 years ago

fdkong commented 8 years ago

Description of the enhancement or error report

Preallocation of matrix memory is crucial for good performance on large problems. There are more details at https://www.mcs.anl.gov/petsc/documentation/faq.html

There are two issues currently in MOOSE.

(1) The first one is minor. At the libMesh level, extra memory is allocated by default. It is minor because it can easily be fixed by turning on the flag need_full_sparsity_pattern in dof_map.C. A fix would help us save memory.

Default (extra memory allocated):

| Number of cores | Nonzeros | Allocated nonzeros | Extra allocated nonzeros | Total compute time (s) |
| --- | --- | --- | --- | --- |
| 1 | 580879 | 580879 | 0% | 1.3240e+01 |
| 2 | 580879 | 584701 | 0.6% | 9.1879e+00 |
| 4 | 580879 | 590824 | 1.7% | 7.1261e+00 |
| 8 | 580879 | 597389 | 2.8% | 3.9194e+00 |
| 16 | 580879 | 605107 | 4.1% | 3.8965e+00 |
| 32 | 580879 | 616517 | 6.1% | 6.2853e+00 |
| 61 | 580879 | 631003 | 8.6% | 1.1507e+01 |

No extra memory allocated:

| Number of cores | Nonzeros | Allocated nonzeros | Extra allocated nonzeros | Total compute time (s) |
| --- | --- | --- | --- | --- |
| 1 | 580879 | 580879 | 0% | 1.2837e+01 |
| 2 | 580879 | 580879 | 0% | 9.1238e+00 |
| 4 | 580879 | 580879 | 0% | 5.6990e+00 |
| 8 | 580879 | 580879 | 0% | 3.8375e+00 |
| 16 | 580879 | 580879 | 0% | 4.1732e+00 |
| 32 | 580879 | 580879 | 0% | 6.2350e+00 |
| 61 | 580879 | 580879 | 0% | 1.1965e+01 |

These data can be reproduced using the standard example at moose/examples/ex01_inputfile.

We can clearly see that the extra allocated memory grows roughly linearly with the number of cores. The reason libMesh allocates extra memory is to save the time spent computing the nonzero pattern. However, computing the nonzero pattern takes only a small portion of the total compute time, so we usually do not see much gain from this approach in the total compute time (compare the last column).

My proposal is to give users an option to control this behavior.

(2) The second is a serious issue because it kills scalability. The preallocation is incorrect for many applications, and users do not even know that anything is going wrong. libMesh computes a sparsity pattern based on the connectivity of the mesh graph, but applications can have extra nonzeros for various reasons. For example, contact mechanics problems have one extra layer of connectivity at the contact surface.

It is REALLY expensive for PETSc to frequently reallocate memory for inserting extra nonzero entries.

| Number of cores | Total compute time (s) | MatAssembly time (s) | Proportion of total |
| --- | --- | --- | --- |
| 1 | 2.6444e+02 | 2.1471e-01 | 0% |
| 2 | 1.5932e+02 | 7.3836e+00 | 4.6% |
| 4 | 1.1719e+02 | 2.1318e+01 | 18% |
| 8 | 9.3948e+01 | 3.5704e+01 | 38% |

The time spent in MatAssembly grows rapidly with the number of cores because the preallocation is incorrect.

I am going to try to fix this issue for contact mechanics problems, but I have no idea right now for other applications.

We should enable preallocation error messages by default so that users know when they are doing something wrong: MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE).
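
For reference, this is roughly how that option is set on a PETSc matrix; a minimal sketch using the PETSc C API, where A is assumed to be the already-created Jacobian matrix:

```cpp
#include <petscmat.h>

// Minimal sketch: make any insertion that falls outside the preallocated
// nonzero pattern a hard error instead of a silent, expensive reallocation.
PetscErrorCode enableNewNonzeroErrors(Mat A)
{
  PetscErrorCode ierr;
  // Error out (rather than malloc) when MatSetValues hits an entry that was
  // not preallocated, surfacing a bad sparsity pattern immediately.
  ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);
  return 0;
}
```

Where to wire this in, and whether it should be the default or opt-in, is exactly what is being debated below.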

fdkong commented 8 years ago

@permcody, @friedmud, @jwpeterson, @bwspenc, any comments?

bwspenc commented 8 years ago

I agree that we need to straighten things out, and it sounds like we should be able to see some big performance gains when we get this right for contact problems, which I'm excited about (have you figured out what the problem is yet?). We can't turn on the preallocation error messages by default until we can run all of our test suite problems without hitting that error, though, which I imagine will be no small task.

jwpeterson commented 8 years ago

@fdkong thanks for getting these numbers. It is interesting that the over-allocating case runs a bit more quickly, so there is definitely a trade-off present in the two approaches. As we discussed, I think this should be an (advanced) option which users can select, probably on the [Problem] block.

As for your second issue, I believe this is related to @roystgnr's ghosting functor work... it is a way of telling libmesh that certain dofs are coupled when they otherwise would not be, and this information should be used in creating the sparsity pattern.
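
To make that concrete, here is a rough sketch of what such a functor could look like. It uses the GhostingFunctor interface from current libMesh (which may differ from the libMesh version discussed in this thread), and elementsAcrossContact is a hypothetical helper standing in for whatever the geometric search actually provides:

```cpp
#include "libmesh/ghosting_functor.h"
#include "libmesh/mesh_base.h"
#include "libmesh/elem.h"

#include <vector>

using namespace libMesh;

// Sketch only: declare extra couplings across a contact interface so that the
// sparsity pattern (and dof ghosting) includes entries that libMesh's element
// connectivity alone would miss.
class ContactCouplingFunctor : public GhostingFunctor
{
public:
  virtual void operator()(const MeshBase::const_element_iterator & range_begin,
                          const MeshBase::const_element_iterator & range_end,
                          processor_id_type /*p*/,
                          map_type & coupled_elements) override
  {
    for (auto it = range_begin; it != range_end; ++it)
      for (const Elem * other : elementsAcrossContact(*it))
        // A null CouplingMatrix means "couple every variable to every variable".
        coupled_elements.emplace(other, nullptr);
  }

private:
  // Hypothetical helper: would query the geometric search / penetration
  // locator for the elements on the other side of the contact surface.
  std::vector<const Elem *> elementsAcrossContact(const Elem * /*elem*/) const
  {
    return {};
  }
};
```

In today's libMesh such a functor would be attached with DofMap::add_coupling_functor so that the sparsity pattern computation picks up the extra entries.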

fdkong commented 8 years ago

I agree that we need to straighten things out, and it sounds like we should be able to see some big performance gains when we get this right for contact problems, which I'm excited about (have you figured out what the problem is yet?).

I have not figured it out yet, and am still working on it. I think that for the second issue libMesh works fine; there is something wrong in the function NonlinearSystem::augmentSparsity on the MOOSE side.

We can't turn on the preallocation error messages by default until we can run all of our test suite problems without hitting that error, though, which I imagine will be no small task.

This is a huge task. I am not sure if we are going to fix all problems, @permcody? My first goal is to fix this issue for contact mechanics problems because I know something about this application and I want to make fieldSplit preconditioner work.

friedmud commented 8 years ago

I think this should be put on hold until someone wants to do it using the GhostingFunctor stuff.

That is the right way to do it. @roystgnr even built in special handling for the fact that not every variable on ghosted dofs is coupled to every other one (something our stuff has always ignored).


friedmud commented 8 years ago

BTW: need_full_sparsity_pattern will actually use far MORE RAM!

The SparsityPattern object itself takes a pretty huge amount of memory (and a lot of time) to build.

Rerun your first stuff on a large problem and use the memory_logger and you'll see the big spike in the beginning when the SparsityPattern object is built.

The over-allocation is preferable to that. That memory spike could cause your problem to die in memory constrained cases.

fdkong commented 8 years ago

I think this should be put on hold until someone wants to do it using the GhostingFunctor stuff.

For both issues, right?

I think we could still resolve the second issue without using GhostingFunctors, unless the geometric search is rewritten to use them.

That is the right way to do it. @roystgnr even built in special handling for the fact that not every variable on ghosted dofs is coupled to every other one (something our stuff has always ignored).

friedmud commented 8 years ago

@fdkong Yes... the geometric search system will feed the GhostingFunctors the right information. Then the sparsity pattern should be based on the GhostingFunctors.

friedmud commented 8 years ago

@fdkong did you delete a message or something? I got an email about another message but I don't see it.

It looks like you were asking why setting need_full_sparsity_pattern results in increased memory use and a slowdown:

  1. The "slowdown" is just that it takes time to build the full sparsity pattern. That can be quite a LOT of time on very large problems.
  2. The memory increase is due to the creation of the SparsityPattern object. When need_full_sparsity_pattern = true, a SparsityPattern object gets built and completely filled with entries (which takes memory and time). With need_full_sparsity_pattern = false that's not the case... only the number of nonzeros per row is tallied up. See: https://github.com/libMesh/libmesh/blob/master/src/base/sparsity_pattern.C

friedmud commented 8 years ago

In particular: check this: https://github.com/libMesh/libmesh/blob/master/src/base/sparsity_pattern.C#L346

fdkong commented 8 years ago

@fdkong Yes... the geometric search system will feed the GhostingFunctors the right information. Then the sparsity pattern should be based on the GhostingFunctors.

@friedmud, are GhostingFunctors still under development, or are they already mature?

fdkong commented 8 years ago

It looks like you were asking why setting need_full_sparsity_pattern results in increased memory use and a slowdown:

The "slowdown" is just that it takes time to build the full sparsity pattern. That can be quite a LOT of time on very large problems.

The memory increase is due to the creation of the SparsityPattern object. When need_full_sparsity_pattern = true a SparsityPattern object gets built and completely filled with entries (which takes memory and time).
With need_full_sparsity_pattern = false that's not the case... only the number of nonzeros per row is tallied up.
See: https://github.com/libMesh/libmesh/blob/master/src/base/sparsity_pattern.C

@friedmud, I understand the concern here. The SparsityPattern takes about as much memory as the matrix itself. But I do not need the SparsityPattern because PETSc does not need it. What PETSc requires is how many nonzeros we have on each row, not their exact locations.

I would like to see a third option. What I need is two correct vectors, std::vector<dof_id_type> & n_nz and std::vector<dof_id_type> & n_oz, without over-allocation and without building the full SparsityPattern. Right now those two vectors contain the over-allocation unless need_full_sparsity_pattern = true.
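
For context, per-row counts are all that PETSc's AIJ preallocation routines consume. A minimal sketch, assuming the counts have already been converted to PetscInt and using illustrative names rather than MOOSE's actual code path:

```cpp
#include <petscmat.h>
#include <vector>

// Sketch: preallocate a parallel AIJ matrix from per-row nonzero counts only.
// n_nz[i] = nonzeros in the diagonal (on-processor) block of local row i,
// n_oz[i] = nonzeros in the off-diagonal (off-processor) block of local row i.
void preallocateFromRowCounts(Mat A,
                              const std::vector<PetscInt> & n_nz,
                              const std::vector<PetscInt> & n_oz)
{
  // The scalar "0" arguments are ignored because explicit per-row arrays are given.
  MatMPIAIJSetPreallocation(A, 0, n_nz.data(), 0, n_oz.data());
}
```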

permcody commented 8 years ago

GhostingFunctors were started at this year's tiger team. They aren't even available in our version of libMesh yet.

permcody commented 8 years ago

I plan to meet with @friedmud in Boston at the beginning of November to start hammering out a design for an interface to these new GhostingFunctors. There are several applications for them including this one. We just have to decide what that interface should look like to a MOOSE user.

fdkong commented 8 years ago

I plan to meet with @friedmud in Boston at the beginning of November to start hammering out a design for an interface to these new GhostingFunctors. There are several applications for them including this one. We just have to decide what that interface should look like to a MOOSE user.

@permcody, may I just go ahead and fix the preallocation issue for contact mechanics problems, or do I have to wait for your new interfaces?

I do not think it is necessary to wait. It should be easy to migrate things from the old approach to the new interface once it is ready to go.

permcody commented 8 years ago

If you can fix it without the new interface, great! Don't let this hold you back.


lindsayad commented 3 years ago

I'm adding this to the Technical Assistance project because I think this is something that can "very quietly" (as @fdkong said, there is nothing telling the user when they have new mallocs) affect applications in a big way, as @fdkong illustrated in his timings. It would be awesome to get to a point where we turn on the nonzero-reallocation error in the framework library, but we probably don't have time to do that without more directed funding. I think a good shorter-term step is to add an application-level option to turn on nonzero-reallocation errors, so that those application developers/users can be confident that their code is running as quickly as possible with respect to memory allocation (e.g. no reallocation). This has actually been a real issue in my work on global AD indexing. For example, I had this situation:

Global AD indexing only adds into the Jacobian during kernel evaluation if there is a non-zero dependence. Then we close/assemble the matrix before we evaluate nodal BCs; when we assemble, the matrix shrinks, eliminating all the unused off-diagonal memory. Then I had a naive NodalBC::offDiagJacobian trying to set matrix values for all jvars in the coupling matrix even if the value was zero, which led to new nonzero mallocs. I only caught this by fairly random chance: I was testing BISON with the new global AD indexing and ran into a very small exodiff. I tried turning on the new nonzero-reallocation error and voila, it revealed this situation.
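
A minimal sketch of the kind of guard that avoids those post-assembly mallocs, with illustrative names rather than MOOSE's actual NodalBC code:

```cpp
#include <petscmat.h>

// Skip off-diagonal insertions whose contribution is identically zero so that
// no structurally new entry is created after the matrix has been assembled
// (each new entry would otherwise trigger an expensive reallocation).
void addOffDiagEntry(Mat jac, PetscInt row, PetscInt col, PetscScalar value)
{
  if (value == 0.0)
    return;
  MatSetValues(jac, 1, &row, 1, &col, &value, ADD_VALUES);
}
```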

lindsayad commented 1 year ago

@recuero this will be the primary issue I reference for future work on dynamic sparsity patterns

gambka commented 11 months ago

I was pointed to this issue based upon behavior I'm seeing with a BISON problem I'm trying to run. The problem has 402120 nonlinear dofs of both SECOND LAGRANGE and FIRST SCALAR variables, and I'm running on 8 procs. I've run two versions of the problem, one with thermal and mechanical contact (non-mortar) and one without. In the case with contact, the very first Jacobian computation is as follows:

Time Step 1, time = 1, dt = 1
 0 Nonlinear |R| = 1.317569e+09

    Computing Jacobian.................................................. [7205.27 s] [  733 MB]

However, if I remove thermal and mechanical contact from the problem I get the following:

Time Step 1, time = 1, dt = 1
 0 Nonlinear |R| = 1.317569e+09
Currently Executing
    Computing Jacobian                                                                   [ 10.21 s] [  265 MB]

All subsequent Jacobian computations in both problems take ~10 seconds. The time spent in that initial Jacobian computation is a significant portion of the total compute time for the problem with thermal and mechanical contact.

lindsayad commented 11 months ago

I suspect you would run into this issue with mortar as well. However, it would not be in the initial Jacobian evaluation. I had to look twice to verify that you were not running with mortar.

This is a Technical Area support issue that we plan to tackle this year. There is an accompanying PETSc issue that I created at https://gitlab.com/petsc/petsc/-/issues/852, which discusses the needs of contact.

gambka commented 11 months ago

I suspect you would run into this issue with mortar as well. However, it would not be in the initial Jacobian evaluation. I had to look twice to verify that you were not running with mortar.

The geometric representation I'm using cannot use mortar. Thank you for looking into the issue this year.

lindsayad commented 2 months ago

Looking into the skipped test at https://github.inl.gov/ncrc/bison/pull/6072, we have the sparsity pattern wrong even before we apply constraints for the very first Jacobian evaluation. Rough. What's interesting is that I have to run with at least two procs to see the new nonzero error; in serial I do not see it.