idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

An idea to make MOOSE applications run 10 times faster #9014

Closed YaqiWang closed 6 years ago

YaqiWang commented 7 years ago

Description of the enhancement or error report

Our current residual and Jacobian evaluation is significantly slower (perhaps 10 times) than a hand-coded evaluation, mainly due to the use of numerical quadrature. It requires us to evaluate the shape functions, their gradients, variable values, JxW, etc. at every single quadrature point. While this provides ultimate flexibility for multiphysics simulations, there are cases where all of these evaluations are unnecessary, such as 2D first-order triangular meshes with constant properties over elements. In such a case the reference mass matrix is constant and the stiffness matrix is a constant function of the element aspect ratios. The residual evaluation for the Diffusion kernel is then simply a matrix-vector product. It is also possible, instead of doing multiphysics coupling on a quadrature-point basis, to do the coupling on an element basis, i.e. couple material properties to element-averaged variable values.
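For concreteness, on a first-order (P1) triangle the consistent mass matrix has the closed form M_ij = A/12 * (1 + delta_ij) for element area A, so a reaction-type residual is a plain matrix-vector product with no quadrature loop at all. A minimal sketch (our own code with an invented name, not MOOSE or libMesh API):

```cpp
#include <array>
#include <cassert>

// Sketch only: quadrature-free mass-matrix product on a P1 triangle.
// M_ij = A/12 * (1 + delta_ij), so M*u needs nothing evaluated at
// quadrature points.
std::array<double, 3> massProductP1Tri(double area,
                                       const std::array<double, 3> & u)
{
  std::array<double, 3> r{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      r[i] += area / 12.0 * (i == j ? 2.0 : 1.0) * u[j];
  return r;
}
```

For a constant field u = 1 this gives A/3 in every entry, which is the expected row sum of the P1 mass matrix.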

So I propose to implement a new type of kernel, possibly named ElementKernel, which inherits from KernelBase:

  1. Move all quadrature-related members in KernelBase into Kernel;
  2. Disable all quadrature-point-related functions like coupledValue in ElementKernel and only allow something like coupledNodal (which obtains the shape function coefficients), coupledAveragedValue (which provides the averaged variable value), etc.;
  3. Implement ElementKernel::initialSetup(), where the necessary pieces are evaluated, like the reference mass matrix, the coefficient matrices for the stiffness matrix, etc.;
  4. Change Assembly and MooseVariable to skip most of the evaluations when the element only has ElementKernels and Materials that do not couple variables at quadrature points.
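A minimal skeleton of the proposed split might look like the following. Everything here is illustrative only: the class layout, member names, and the simplified DenseVector stand-in are ours, not actual MOOSE code.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch of the proposed KernelBase/ElementKernel split;
// none of this is real MOOSE code.
struct SimpleDenseVector { std::vector<double> v; };

struct KernelBase
{
  virtual ~KernelBase() = default;
  virtual void computeResidual() = 0;
};

struct ElementKernel : public KernelBase
{
  // Precomputed in initialSetup(): element matrices built once per
  // element rather than re-integrated every residual evaluation.
  std::vector<std::vector<double>> _mass; // would come from massMatrix()
  SimpleDenseVector _nodal_u;             // shape-function coefficients
  SimpleDenseVector _residual;

  void computeResidual() override
  {
    // The residual is a plain matrix-vector product: no quadrature loop.
    const std::size_t n = _nodal_u.v.size();
    _residual.v.assign(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
        _residual.v[i] += _mass[i][j] * _nodal_u.v[j];
  }
};
```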

As an example of converting Reaction to ElementReaction, the code in ElementReaction::computeResidual would be

massMatrix() * _nodal_u;

Or similarly for ElementDiffusion

stiffnessMatrix() * _nodal_u;

Or ElementKernel could provide further optimized functions like

DenseVector<Real> massProduct(const DenseVector<Real> & u);
DenseVector<Real> stiffnessProduct(const DenseVector<Real> & u);

We can add more pieces, for example

DenseVector<Real> convectionProduct(const DenseVector<Real> & u);

etc.
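A stiffnessProduct of the kind suggested above is also a closed-form computation on a P1 triangle. With b_i = y_j - y_k and c_i = x_k - x_j (i, j, k cyclic), the stiffness matrix is K = (b b^T + c c^T) / (4A), a pure function of the vertex coordinates. The sketch below is our own code with an invented name, not a MOOSE or libMesh API:

```cpp
#include <array>
#include <cassert>

// Sketch: quadrature-free stiffness-matrix product on a P1 triangle.
// K = (b b^T + c c^T) / (4A) with b_i = y_j - y_k, c_i = x_k - x_j.
std::array<double, 3> stiffnessProductP1Tri(const std::array<double, 3> & x,
                                            const std::array<double, 3> & y,
                                            const std::array<double, 3> & u)
{
  const std::array<double, 3> b{y[1] - y[2], y[2] - y[0], y[0] - y[1]};
  const std::array<double, 3> c{x[2] - x[1], x[0] - x[2], x[1] - x[0]};
  const double area = 0.5 * ((x[1] - x[0]) * (y[2] - y[0]) -
                             (x[2] - x[0]) * (y[1] - y[0]));
  std::array<double, 3> r{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      r[i] += (b[i] * b[j] + c[i] * c[j]) / (4.0 * area) * u[j];
  return r;
}
```

A quick sanity check: a constant field lies in the null space of K, so K * (1,1,1) should be exactly zero.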

Rationale for the enhancement or information for reproducing the error

This can potentially make simulations run 10 times faster. Reducing the accuracy of the coupling is typically acceptable, and the required conditions can be satisfied in single-physics calculations. To make this fully functional, we may need to do the same for BoundaryCondition, DGKernel, and AuxKernel, or we can provide an ElementInterface. We may also need a coupledBoundary (to get the coefficients on a boundary element).

Identified impact

(i.e. Internal object changes, limited interface changes, public API change, or a list of specific applications impacted) It will be a new system, so it has no impact on our current capabilities. Nobody will ever complain that our application is slow.

YaqiWang commented 7 years ago

Well, I agree this is revolutionary. But is it worth trying?

friedmud commented 7 years ago

As you might suspect: I'm against this. Way too specialized for the framework. It also duplicates entirely too much functionality... adding thousands of lines of code that will be rarely used.

I suppose I didn't even need to post :)

andrsd commented 7 years ago

@YaqiWang Why do you think the speedup will be ~10x? Did you try to implement this or is that just a SWAG?

permcody commented 7 years ago

It is time to start considering framework-level changes for optimization. However, this particular design would require an enormous amount of code duplication to be feasible, so this isn't the right idea. We can do better, but it might require help on the libMesh side too. If you could prove a 10x speedup for specific cases like the one you laid out, we could detect that case and run a different code path without changing all of the objects on top. I don't really want an explosion of special cases though, so we'd have to be smart about this.
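The "detect that case and run a different code path" idea could be as simple as a per-element dispatch. The sketch below is entirely hypothetical (the types, fields, and function names are ours, not MOOSE code):

```cpp
#include <cassert>

// Hypothetical sketch of case detection for a fast residual path:
// only the narrow case described above (first-order triangles with
// constant coefficients) qualifies; everything else keeps the fully
// general quadrature-based evaluation.
struct ElemInfo
{
  bool first_order_tri;       // is this a P1 triangle?
  bool constant_coefficients; // are properties constant over the element?
};

enum class ResidualPath { QuadratureFree, QuadratureLoop };

ResidualPath chooseResidualPath(const ElemInfo & e)
{
  if (e.first_order_tri && e.constant_coefficients)
    return ResidualPath::QuadratureFree;
  return ResidualPath::QuadratureLoop;
}
```

Keeping the check in one place like this is what would prevent the "explosion of special cases" the design otherwise risks.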

I honestly think we need to continue shaking out the distributed mesh stuff in Rattlesnake first, then we'll start looking at some of these other optimizations. Never hurts to start the discussion.

@rwcarlsen is already working on profiling and finding ways to speedup the framework as a whole.

rwcarlsen commented 7 years ago

I've taken a detailed look at a fair number of simulation profiles (BISON, module tests, etc.) and I'm a bit skeptical about 10X speedup. I'd be very interested in any demonstration of such a drastic improvement.

YaqiWang commented 7 years ago

So far it is a SWAG, hehe. I need a numerical proof with a simple case. Hopefully I can provide it soon.

YaqiWang commented 7 years ago

For a simple 2D diffusion-reaction calculation with one variable, the speedup on the residual and Jacobian evaluation is above a factor of 3. With a large number of variables or in 3D, I expect the gain to be larger. 10 times faster is the ideal ;-) Solver time is not counted, though. But you get my idea. Actually those small functions could be pushed into libMesh::Elem; then the change in MOOSE would be minor. We would need to implement those small functions for each element type.
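The correctness claim behind a benchmark like this can be checked directly: on a P1 triangle, the 3-point edge-midpoint quadrature rule is exact for the degree-2 mass integrand, so the quadrature loop and the closed-form A/12 matrix give identical products, and the loop is pure overhead for this case. A sketch (our own code, not the actual benchmark):

```cpp
#include <array>
#include <cassert>

// Mass-matrix product via a 3-point quadrature loop on the reference
// triangle (edge midpoints, weights 1/6, Jacobian |J| = 2A).
std::array<double, 3> massProductQuadrature(double area,
                                            const std::array<double, 3> & u)
{
  // Shape values phi_i at the three edge midpoints of the reference
  // triangle: phi_1 = 1 - xi - eta, phi_2 = xi, phi_3 = eta.
  const double phi[3][3] = {{0.5, 0.0, 0.5},
                            {0.5, 0.5, 0.0},
                            {0.0, 0.5, 0.5}};
  std::array<double, 3> r{};
  for (int q = 0; q < 3; ++q) // quadrature loop
  {
    double uq = 0.0;          // interpolate u at the quadrature point
    for (int j = 0; j < 3; ++j)
      uq += phi[j][q] * u[j];
    for (int i = 0; i < 3; ++i)
      r[i] += (2.0 * area / 6.0) * phi[i][q] * uq;
  }
  return r;
}

// Same product from the closed-form matrix M_ij = A/12 * (1 + delta_ij).
std::array<double, 3> massProductClosedForm(double area,
                                            const std::array<double, 3> & u)
{
  std::array<double, 3> r{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      r[i] += area / 12.0 * (i == j ? 2.0 : 1.0) * u[j];
  return r;
}
```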

rwcarlsen commented 7 years ago

I think a better, more scalable (to more use cases) approach - also in the category of big framework changes - would be a dependency value "pull" approach where, when a value is needed at a particular mesh location, a function is called to calculate/retrieve it (possibly from some sort of cache if it was already retrieved once before). This would work for materials, and most other things. Then we would automatically get super fine-grained only-calculate-what-you-need functionality.
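The "pull" approach described here amounts to lazy evaluation with memoization: a value at a mesh location is computed on first request and cached after that. A minimal sketch (all names invented here, not a proposed MOOSE interface):

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Sketch of a pull-based value: compute on demand, cache by location id,
// so only what is actually needed ever gets evaluated.
class PulledValue
{
public:
  explicit PulledValue(std::function<double(int)> compute)
    : _compute(std::move(compute)) {}

  // Return the value at location `id`, computing it at most once.
  double at(int id)
  {
    auto it = _cache.find(id);
    if (it != _cache.end())
      return it->second;
    const double v = _compute(id);
    _cache.emplace(id, v);
    return v;
  }

private:
  std::function<double(int)> _compute;
  std::unordered_map<int, double> _cache;
};
```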

friedmud commented 7 years ago


At the cost of greatly increasing and complicating the code users have to write.

We CANNOT fall into the trap of trying to make everyone happy with MOOSE. MOOSE has been successful (and will continue to be) because we enable a certain class of users with powerful, straightforward code. No amount of added code burden on those users is worth (possibly) speeding up one user's hyper-specific use-case.

I've been down this road before. SIERRA at Sandia fell into this trap. They tried to optimize for so many different kinds of physics (Implicit FE, FV, Explicit, Meshless and more) that they ended up with a HUGE codebase with an incredible amount of duplication of effort, verbose code and large issues trying to enable advanced features like error indication, mesh adaptivity and code coupling.

Let's keep MOOSE focused on what it does best (and make it do THAT better as we go along).

friedmud commented 6 years ago

This idea is really not happening, but we are growing more capability for specialized solves as we go; see the new ActuallyExplicitEuler, for instance. We now have the ability to compute a "mass matrix" and a "stiffness matrix" and manually call linear solvers...
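To illustrate the kind of specialized solve this enables (our own sketch, not the actual ActuallyExplicitEuler implementation): with a lumped (diagonal) mass matrix M_L, an explicit Euler step u_{n+1} = u_n + dt * M_L^{-1} (f - K u_n) needs no linear solver at all, just a per-entry division.

```cpp
#include <cassert>
#include <vector>

// Sketch: one explicit Euler step with a lumped mass matrix. Because
// M_L is diagonal, "solving" M_L * du = dt * (f - K u) is a division.
std::vector<double> explicitEulerStep(const std::vector<double> & lumped_mass,
                                      const std::vector<std::vector<double>> & K,
                                      const std::vector<double> & f,
                                      const std::vector<double> & u,
                                      double dt)
{
  const std::size_t n = u.size();
  std::vector<double> u_next(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    double Ku = 0.0;
    for (std::size_t j = 0; j < n; ++j)
      Ku += K[i][j] * u[j];
    u_next[i] = u[i] + dt * (f[i] - Ku) / lumped_mass[i];
  }
  return u_next;
}
```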