Open dschwen opened 2 years ago
As @dschwen notes, I strongly support this. I think it could significantly speed up some of our "complicated material" models, where the stress update is a significant portion of the overall simulation time.
I put up a basic prototype in hugary1995/moose/batch_material, with which we can test the performance of vectorized material.
I was planning to work on it this week. I'll test my phase field use case.
It's not quite ready yet. I still need to implement a parallel gather in finalize. It'll only work in serial until I add that.
It would be nice if generic programming could be used to avoid passing these type strings in.
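One way to picture the generic-programming idea is a gatherer whose input types are template parameters, so the compiler knows them and no type-name strings are needed at runtime. This is only a sketch under assumptions: `BatchGatherer`, its methods, and the simplified `Real` and `RankTwoTensor` stand-ins below are hypothetical and are not taken from the prototype or from MOOSE.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the MOOSE types (not the real classes).
using Real = double;
struct RankTwoTensor
{
  std::array<Real, 9> v{};
};

// Hypothetical gatherer: the input types are template parameters, so the
// types are fixed at compile time instead of being passed as strings.
template <typename... Inputs>
class BatchGatherer
{
public:
  // Append the inputs of one quadrature point to the batch.
  void add(const Inputs &... values) { _data.emplace_back(values...); }

  // Number of quadrature points gathered so far.
  std::size_t size() const { return _data.size(); }

  // Typed access to the I-th input of the i-th gathered point.
  template <std::size_t I>
  const auto & get(std::size_t i) const
  {
    return std::get<I>(_data[i]);
  }

private:
  std::vector<std::tuple<Inputs...>> _data;
};
```

With this shape, `BatchGatherer<Real, RankTwoTensor>` would accept exactly one `Real` and one `RankTwoTensor` per point, and a mismatch would be a compile error rather than a runtime string lookup failure.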
Okay, this should be usable now.
Basically, replace the original material
# Regular version
# [Materials]
#   [compute_stress]
#     type = ComputeLagrangianLinearElasticStress
#     large_kinematics = true
#   []
# []
with the vectorized version
# Vectorized version
[UserObjects]
  [compute_stress_vectorized]
    type = VectorizedMaterialFake
    material = compute_stress
    execute_on = 'INITIAL LINEAR'
  []
[]
[Materials]
  [compute_stress]
    type = ComputeLagrangianLinearElasticStressVectorized
    large_kinematics = true
    vectorized_material = compute_stress_vectorized
  []
[]
In this example, faked GPU calls are made in VectorizedMaterialFake. ComputeLagrangianLinearElasticStressVectorized is then essentially the same class as ComputeLagrangianLinearElasticStress, but all actual computations are outsourced to VectorizedMaterialFake, which is where the GPU calls happen.
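The split between the user object and the proxy material can be sketched roughly as follows. This is a minimal, self-contained illustration, not the prototype's code: `BatchStressUO`, `batchedStress`, and the scalar stress law are hypothetical, and a plain loop stands in for the GPU call.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Real = double;

// Hypothetical batched model: one call evaluates every gathered point.
// A real backend would hand the whole array to an accelerator here.
inline std::vector<Real>
batchedStress(const std::vector<Real> & strain, Real modulus)
{
  std::vector<Real> stress(strain.size());
  for (std::size_t i = 0; i < strain.size(); ++i)
    stress[i] = modulus * strain[i];
  return stress;
}

// Hypothetical stand-in for the user object: it gathers inputs per
// (element, quadrature point) and computes everything in finalize().
class BatchStressUO
{
public:
  explicit BatchStressUO(Real modulus) : _modulus(modulus) {}

  // Reset all gathered data before a new sweep over the elements.
  void initialize()
  {
    _index.clear();
    _strain.clear();
    _stress.clear();
  }

  // Called once per quadrature point while looping over elements.
  void gather(std::size_t elem, std::size_t qp, Real strain)
  {
    _index[{elem, qp}] = _strain.size();
    _strain.push_back(strain);
  }

  // One batched evaluation for all gathered points.
  void finalize() { _stress = batchedStress(_strain, _modulus); }

  // What the proxy material would call instead of computing stress itself.
  Real stress(std::size_t elem, std::size_t qp) const
  {
    return _stress[_index.at({elem, qp})];
  }

private:
  Real _modulus;
  std::map<std::pair<std::size_t, std::size_t>, std::size_t> _index;
  std::vector<Real> _strain;
  std::vector<Real> _stress;
};
```

The proxy material keeps the usual per-quadrature-point interface but only performs a lookup, so the expensive work is done in a single batched call.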
Have fun testing, guys.
I made a few strong assumptions in this prototype:
- [...] _mesh.maxElemId().
- [...] Real, RankTwoTensor, or RankFourTensor. Although extending the support should be trivial.
- [...] meshChanged().
Reason
Expensive material models provided by external codes make use of vectorized evaluation on accelerator devices (e.g. GPUs). For these models to operate with optimal efficiency we need to provide large batches of quadrature point data in one go (as opposed to consecutive calls of the material model for each quadrature point).
Design
An initial proposed design would be an ElementUserObject to gather required data for all quadrature points. This UO would then call the external vectorized property computation during finalize. The resulting data would be made available to a proxy material class that would provide a MOOSE way of accessing the computed properties.
Impact
Added capability. This feature has been requested by @reverendbedford for a future vectorization of NEML (or rather its successor library) as well as by me for fast evaluation of neural-network-based thermodynamic free energies (potentially using the now available torch library, or some custom CUDA code).