ArrayKernel and ArrayBoundaryCondition

YaqiWang commented 8 years ago

Description of the enhancement or error report

Currently, a Kernel or BoundaryCondition is designed for an individual variable, which is indicated by its variable parameter. In radiation transport, the extra independent variables of energy and streaming direction can make the number of variables quite large (potentially above 100K), which results in too many kernels and boundary conditions being added. These kernels and BCs could add a huge memory overhead. If we can make a kernel and a BC operate on a vector of variables simultaneously, we can reduce their number substantially.

Rationale for the enhancement or information for reproducing the error

This is needed for simulations with a huge number of variables.

Identified impact

Radiation transport can benefit from this capability. A few questions I have in mind at the moment:

Should this VectorKernel be an independent system or derived from Kernel? Similarly for VectorBoundaryCondition.
What other implications do we need to consider for simulations with a huge number of variables? For instance, should the variable itself be vectored? Will Jacobian assembly be an issue?

lw4992 commented 8 years ago

An Action can provide a solution for this issue, though native MOOSE support would be better. #3719

permcody commented 8 years ago

I'm not sure I fully understood what you were after in #3719. Your original description mentioned duplicate code, but that was never an issue. It is true that without the Action system, you could have a lot of repetition in your input file.

This particular PR is for creating a single MooseObject that works on a whole array of different variables. This situation shows up a lot in neutronics and in chemical reaction networks.

YaqiWang commented 8 years ago

Here I am talking about more than 100K kernels. If your number of kernels is below that, but large enough that the input becomes tedious and error-prone with MOOSE's simple input syntax, an Action is the way to go.

friedmud commented 8 years ago

I implemented something like this for my recent MOC work. My MOCKernel objects apply to all energy groups simultaneously. I capitalize on the "variable groups" capability in libMesh to only do global->local mapping for the first variable in a variable group... then I do direct indexing into the PETSc vector using that local dof + a group offset (going directly to the C array from VecGetArray()).

It's all insanely fast... and I would love to get something like it into MOOSE for normal Kernels (and enable things like VectorKernels). Getting the interface correct is the hard part.

This is the perfect thing to work on during the tiger team. I can show you what I've done and we can hammer out an API and get this implemented.
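For readers following along, here is a minimal C++ sketch of the indexing idea described above. It is not the actual MOCKernel code: `VariableGroup`, `first_dof`, and `addToGroup` are hypothetical names, a plain `std::vector<double>` stands in for the array returned by `VecGetArray()`, and the sketch assumes that the variables of a group are stored contiguously at each node, so that a single lookup plus an offset reaches every variable in the group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a variable group (not the libMesh class).
// Assumption: at each node the group's variables occupy consecutive local dofs.
struct VariableGroup
{
  std::size_t n_vars;                 // number of variables in the group
  std::vector<std::size_t> first_dof; // local dof of the group's first variable, per node
};

// Add values[g] to variable g of the group at the given node.
// Only one dof lookup is done; the other variables are reached by offset.
inline void addToGroup(std::vector<double> & local,
                       const VariableGroup & group,
                       std::size_t node,
                       const std::vector<double> & values)
{
  assert(values.size() == group.n_vars);
  const std::size_t base = group.first_dof[node]; // the single lookup
  assert(base + group.n_vars <= local.size());

  double * raw = local.data(); // stands in for the C array from VecGetArray()
  for (std::size_t g = 0; g < group.n_vars; ++g)
    raw[base + g] += values[g]; // local dof + group offset
}
```

With 3 variables per group and two nodes whose first dofs are 0 and 3, calling `addToGroup(local, group, 1, {1.0, 2.0, 3.0})` touches only entries 3, 4, and 5 of `local`.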

YaqiWang commented 8 years ago

When is our next tiger team? I cannot wait to have this done. Once this is ready, I will need some time to convert my SN kernels and try running problems that would otherwise need millions of kernels.

friedmud commented 8 years ago

@yaqiwang I still don't understand "millions" of Kernels. You should only have like 100 groups times 128 angles times maybe 5 or so... which is about 64,000.

How many angles / groups are you trying to run?

YaqiWang commented 8 years ago

Could be 300 groups, each with 300 angles, with 10 kernels on average for each variable, so the total is close to 1M (300 × 300 × 10 = 900,000). Here the numbers of groups and angles could be a little conservative.

YaqiWang commented 8 years ago

@friedmud I'd like to work on this because we want to demonstrate the capability of solving problems with a large number of groups. Can you point me to where I should look to get started?

permcody commented 8 years ago

@jwpeterson is planning on working this. Let's chat about it tomorrow.

YaqiWang commented 8 years ago

That is fantastic! Feel free to grab me for the chat.

friedmud commented 8 years ago

I want to table this work until the tiger team. This needs to be designed... and then redesigned.

I would like to show you guys what I'm currently doing and have some discussion.

Maybe we need a label for issues we want to work on during the tiger team?

friedmud commented 8 years ago

Let me note something here: this capability is quite distinct from supporting "vector-valued" finite elements. You would think they're similar... but they're really not. Vector-valued finite elements are quite a lot more complicated.

The capability I'm envisioning here is for applying to many variables of the same type at the same time.

friedmud commented 8 years ago

Here's the way it works in my MOCKernels (more or less, with a bit of paraphrasing and leaving out details that aren't relevant here):

A full residual vector is allocated for each thread.
Before the residual is computed, the "local form" of the PetscVector is cached for each thread.
On each element, Problem::reinit(Elem *) computes the first index of each variable group (I call it an offset).
MOCKernels loop over the number of variables and contribute directly to the residual by doing: _residual_cache[group_offset + group_var_num] += stuff, where group_var_num is the position the variable has within the variable group.
After the element loop, each thread's copy of the residual vector is summed into the true residual vector.

I think that with some tweaking this model could work well for VectorKernels as well.
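A rough, self-contained C++ sketch of the accumulation pattern listed above, offered only as an illustration. The names `Residual`, `contribute`, and `sumThreadCopies` are hypothetical, plain `std::vector<double>` objects stand in for the per-thread PETSc residual copies, and the `stuff` values are placeholders for whatever a real kernel would compute. The final summation is shown serially; threading is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Residual = std::vector<double>;

// Kernel step: one call contributes to every variable of a group.
// group_offset is the first index of the group on the current element;
// group_var_num is the variable's position within the group.
inline void contribute(Residual & residual_cache,
                       std::size_t group_offset,
                       const std::vector<double> & stuff)
{
  assert(group_offset + stuff.size() <= residual_cache.size());
  for (std::size_t group_var_num = 0; group_var_num < stuff.size(); ++group_var_num)
    residual_cache[group_offset + group_var_num] += stuff[group_var_num];
}

// After the element loop: sum each thread's copy into the true residual.
inline void sumThreadCopies(Residual & residual,
                            const std::vector<Residual> & thread_caches)
{
  for (const Residual & cache : thread_caches)
  {
    assert(cache.size() == residual.size());
    for (std::size_t i = 0; i < residual.size(); ++i)
      residual[i] += cache[i];
  }
}
```

Because each thread writes only to its own full-length copy, no locking is needed during the element loop; the cost is one extra residual vector per thread plus the final summation.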

lindsayad commented 6 years ago

There was a fair bit of chatter on this topic over the holidays. I know we've talked about this being dead; is it mostly because the Jacobian assembly is difficult? I guess we don't really have any MOOSE charge numbers that fit this topic except maybe @permcody's LDRD which is being used for mortar, etc.? Any other funding?

permcody commented 6 years ago

Let's chat about this tomorrow when Derek returns. We'll see if there's a way to pay for it if we decide it's feasible. We probably won't have a clear picture of the budget until the federal budget gets passed.

YaqiWang commented 6 years ago

You can talk with Mark. We are interested in this.

YaqiWang commented 6 years ago

I was thinking that we can borrow something from the scalar variable, whose order is actually its number of components. We can enhance a field variable with something like a degree, on top of family and order, to support multiple components.

idaholab / moose