gridap / Gridap.jl

Grid-based approximation of partial differential equations in Julia

Number of function calls #702

Closed rcontinuum closed 2 years ago

rcontinuum commented 2 years ago

Hello, this is more about me trying to understand your code a bit, with solid mechanics as an application in mind. Maybe it is a bit too long to ask on Gitter? In general, Gridap is impressive work.

The following code is motivated by your hyperelasticity tutorial and computes the residual vector and the Jacobian matrix for one 4-node bilinear element (QUAD4) with two dofs per node (displacements) and a 2x2 Gauss integration rule. The stress function P(), the 1st Piola-Kirchhoff stress, and the tangent operator function Lv() are just dummy functions. The code does not try to solve a BVP. I want to figure out how often the stress function P() and the tangent operator function Lv() are called for one computation of the residual vector and the Jacobian matrix. The two global variables Pcnt and Lvcnt are used for that purpose. Background: I used the debugger in VS Code, set a breakpoint in the stress function of the hyperelasticity tutorial and observed "many" function calls.

# compute residual and jacobian
using Gridap
#println("----- BEGIN -----")
Pcnt = 0
Lvcnt = 0

# deformation gradient
F(∇u) = one(∇u) + ∇u

# Piola stress (1.PK)
function P(∇u)
    global Pcnt += 1
    #println(Pcnt, " in P() : ∇u=", ∇u)
    return 2.0*F(∇u)
end

function res(u, v)
    return ∫(P ∘ ∇(u) ⊙ ∇(v)) * dΩ
end
# tangent operator
function Lv(∇du, ∇u)
    global Lvcnt += 1
    #println(Lvcnt, " in Lv() : ∇du=", ∇du, " ∇u=", ∇u)
    return 4.0*(F(∇u)⋅∇du)
end

function jac(u, du, v) 
    return ∫(Lv ∘ (∇(du), ∇(u)) ⊙ ∇(v)) * dΩ
end

# one QUAD4 element with 2x2(?) Gauss quadrature 
domain = (0.0, 1.0, 0.0, 1.0)
partition = (1, 1)
order = 1     # bi-linear QUAD4
model = CartesianDiscreteModel(domain, partition)
reffe = ReferenceFE(lagrangian, VectorValue{2, Float64}, order) 
V = TestFESpace(model, reffe, conformity=:H1) 
Ω = Triangulation(model)
degree = 2*order   # 2x2(?) Gauss quadrature
dΩ = Measure(Ω, degree)
U = TrialFESpace(V)
#FE problem
op = FEOperator(res, jac, U, V)
x0 = zeros(Float64, num_free_dofs(V))
x0[7] = 0.2     # node 4 x-dspl
x0[8] = 0.1     # node 4 y-dspl
u0 = FEFunction(U, x0)

v = get_fe_basis(V)          # test FE basis (not used below)
du = get_trial_fe_basis(U)   # trial FE basis (not used below)

println("----- RESIDUAL VECTOR -----")
rvec = Gridap.Algebra.residual(op, u0)
println("#P() function calls= ", Pcnt)
#println("rvec=", rvec)

println("----- JACOBIAN MATRIX -----")
jmat = jacobian(op, u0)
println("#Lv() function calls= ", Lvcnt)
#println("jmat=", jmat)
#println("----- END -----")

The output is (I'm using Gridap#master)

----- RESIDUAL VECTOR -----
#P() function calls= 92
----- JACOBIAN MATRIX -----
#Lv() function calls= 634

These are a lot of function calls to P() and Lv(). Am I doing something wrong and misunderstanding the usage of Gridap?

I mean, a classical implementation of such an "element" would call P() once at each Gauss point, i.e. 4 function calls for a 2x2 Gauss rule, and it would likewise call a function C4(), which returns the 4th-order tangent operator, once at each Gauss point, i.e. also 4 function calls. Then one loops over the 4 element nodes and forms some "contractions" between P, C4 and the gradients of the shape functions at the Gauss points to compute the residual vector and the Jacobian matrix. The shape function gradients (w.r.t. the undeformed configuration) are constant "matrices" and can be initialized once. Thus, overall one would need 4 calls to P() and 4 calls to C4(). Remark: with C4() and ∇du one can define Lv(). A sketch of what I mean is given below.
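
To make the comparison concrete, here is a minimal sketch of such a "classical" element routine in plain Julia. This is not Gridap code; the names constitutive and element_residual_and_stiffness are made up, and the constitutive law is the same dummy as above, just returning P and C4 together.

using LinearAlgebra

# Dummy constitutive law, chosen to match P(∇u) = 2F and Lv(∇du,∇u) = 4(F⋅∇du) above.
# It returns the 1st Piola-Kirchhoff stress P and a 4th-order tangent C4 with
# C4[i,j,k,l]*∇du[k,l] = 4*(F*∇du)[i,j].
function constitutive(F)
    P = 2.0 * F
    C4 = zeros(2, 2, 2, 2)
    for i in 1:2, j in 1:2, k in 1:2
        C4[i, j, k, j] = 4.0 * F[i, k]
    end
    return P, C4
end

# xe[:,a]    = displacement vector of node a (2x4 matrix)
# ∇N[a][:,g] = gradient of shape function a at Gauss point g (precomputed, constant)
# w[g]       = quadrature weight times Jacobian determinant at Gauss point g
function element_residual_and_stiffness(xe, ∇N, w)
    nnodes = 4
    r = zeros(2*nnodes)
    K = zeros(2*nnodes, 2*nnodes)
    for g in 1:length(w)                                     # 2x2 Gauss: 4 iterations
        ∇u = sum(xe[:, a] * ∇N[a][:, g]' for a in 1:nnodes)  # displacement gradient
        P, C4 = constitutive(I + ∇u)                         # one constitutive call per Gauss point
        for a in 1:nnodes, i in 1:2
            ia = 2*(a - 1) + i
            r[ia] += w[g] * dot(P[i, :], ∇N[a][:, g])        # r_ai = ∫ P_iJ N_a,J dΩ
            for b in 1:nnodes, k in 1:2
                kb = 2*(b - 1) + k
                # K_ai,bk = ∫ N_a,J C4[i,J,k,L] N_b,L dΩ
                K[ia, kb] += w[g] * (∇N[a][:, g]' * C4[i, :, k, :] * ∇N[b][:, g])
            end
        end
    end
    return r, K
end

With the 2x2 rule this routine makes exactly 4 calls to constitutive() per element, which is the count I had in mind.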

Think of an FE² scheme, where one solves an FE "microstructure subproblem" at each Gauss point. Then this many function calls to P() and Lv() would "hurt". Maybe I am doing or interpreting something wrong? Hints and comments are appreciated.

Cheers

fverdugo commented 2 years ago

Hi @rcontinuum !

Gridap internally sets up caches and scratch data structures for storing intermediate results, avoiding heap allocations within the assembly loop, etc. Setting these up usually calls the functions in your weak form. We also do some safety checks that may call these functions as well, but they can be (and will be, in the future) deactivated in release mode (i.e., when starting Julia with --check-bounds=no).

Note however that this is (or should be, in theory) just an overhead, i.e., a constant number of calls independent of the number of cells in your mesh. For large meshes with thousands of cells it should be negligible. You can refine your mesh and check how the number of calls scales with the number of cells. In any case, I agree that the 634 calls to Lv are a bit suspicious, but the scaling test will tell.
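
For instance, a minimal sketch of such a scaling check, reusing res, jac, reffe, order and the counters from your script above (the mesh sizes are just illustrative):

for n in (1, 10, 100)
    global Pcnt = 0
    global Lvcnt = 0
    model = CartesianDiscreteModel((0.0, 1.0, 0.0, 1.0), (n, n))
    Ω = Triangulation(model)
    global dΩ = Measure(Ω, 2*order)   # res and jac capture the global dΩ
    V = TestFESpace(model, reffe, conformity=:H1)
    U = TrialFESpace(V)
    op = FEOperator(res, jac, U, V)
    u0 = FEFunction(U, zeros(num_free_dofs(V)))
    Gridap.Algebra.residual(op, u0)
    jacobian(op, u0)
    println("cells = ", n^2, "  P()/cell = ", Pcnt/n^2, "  Lv()/cell = ", Lvcnt/n^2)
end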

Apart from this, there is more room for optimization. Gridap allows for the simultaneous integration of the Jacobian and the residual (some work can usually be reused between them). One can even use lower-level API calls to define custom weak-form kernels instead of the ones automatically generated by Gridap. In addition, one could completely bypass these extra calls in the setup parts of the code.
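
Regarding the simultaneous integration, a minimal sketch of the entry point, assuming the combined residual_and_jacobian function of the operator interface is available in your Gridap version (how much work is actually shared depends on how the operator and the weak form are defined):

# assumed API: build the residual vector and the Jacobian matrix in a single pass over
# the cells, so that cell-level intermediate results can in principle be shared
rvec, jmat = Gridap.Algebra.residual_and_jacobian(op, u0)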

rcontinuum commented 2 years ago

Hello @fverdugo,

I did the test as you suggested with an increasing number of cells. Here is the result:

 #cells      #P()    #P()/#cells       #Lv()   #Lv()/#cells
      1        92             92         634            634
    100       488           4.88        3802          38.02
  10000     40088         4.0088      320602        32.0602
1000000   4000088       4.000088    32000602      32.000602

Asymptotically, Gridap needs 4 function calls to P() per cell for the residual vector and 32 function calls to Lv() per cell for the Jacobian matrix. OK, there is some constant overhead for your caches, data structures etc., but 32 function calls to Lv() per cell seem a bit too much to me.

"Gridap allows for the simultaneous integration of the Jacobian and the residual (some work can usually be reused between them)."

Do you have a more complete example or a hint for that? That's the usual way one implements a user material in mechanics.

"One can even use lower-level API calls to define custom weak-form kernels instead of the ones automatically generated by Gridap. In addition, one could completely bypass these extra calls in the setup parts of the code."

Well, it takes some time to dig into your code ...

Thank you very much for your insights and have a nice weekend

@rcontinuum

fverdugo commented 2 years ago

One would expect 16 instead of 32, right?

rcontinuum commented 2 years ago

Well, to be honest, I don't know the minimum number of Lv() calls per cell. In my own toy FE code I use a function which gives me the 4th-order tangent operator $C4 = \frac{\partial P}{\partial F}$ (how do I write LaTeX formulas here?), and with that I need 4 calls to C4() for the element Jacobian matrix, one call at each Gauss point (2x2 Gauss). That's how I implement the Jacobian in FEM. To be specific, I write a constitutive function which returns the stress P and the 4th-order tensor C4. Based on that, an element function loops over all Gauss points of one element and computes the (element) residual vector and the (element) Jacobian ("stiffness") matrix, so I have one call to the constitutive function per Gauss point.

rcontinuum commented 2 years ago

Hello,

one question and one remark

  1. Would you be so kind as to tell me which Gridap functions actually do the cell-wise computations of the "cell" Jacobian and the cell residual vector, i.e. the 'real' 32 calls to Lv() and 4 calls to P()? That was my initial goal when setting breakpoints in P() and Lv(), but I got a bit lost in the debugger because of the many calls.

  2. After thinking a bit about the implementation, I don't think that with the current definition

    jac(u, du, v) = ∫(Lv ∘ (∇(du), ∇(u)) ⊙ ∇(v)) * dΩ

    it is possible to reduce the number of constitutive function calls to 4 per cell (in the 2x2 Gauss point case). One would need to write a specialized case (maybe you call it a kernel) such as

    jac(u, du, v) = ∫( ((C4∘(∇(u))) : ∇(du)) ⊙ ∇(v) ) * dΩ

    to kind of separate the constitutive calls C4∘(∇(u)) at the Gauss points from the finite element "approximations" ∇(du) and ∇(v), where one needs loops over the cell nodes (or dofs). Here, : would be a double contraction and C4() the 4th-order tangent operator (function) of the constitutive model. The current implementation has the nice advantage that it is more universal, at the price observed above.

fverdugo commented 2 years ago

"which Gridap functions actually do the cell-wise computations of the 'cell' Jacobian and the cell residual vector"

For Lv ∘ (∇(du), ∇(u)), it is probably this one:

https://github.com/gridap/Gridap.jl/blob/6d0da6ef46e7fe3445ae3260a3ecdc821201c87f/src/Fields/FieldArrays.jl#L626

∇(du) leads to a 3-dimensional array of size (nip,1,nldofs) at each cell (argument b in the linked function), with nip and nldofs being the number of integration points and local dofs in that cell, respectively. ∇(u) leads to a vector of size (nip,) (argument a in the linked function). The function Lv is stored in f.op in the linked function. For the QUAD4 example above, nip = 4 and nldofs = 8, so evaluating Lv over the (nip,1,nldofs) array amounts to 4 x 8 = 32 calls per cell, which matches the asymptotic count in your table; likewise, evaluating P over the (nip,) array gives the 4 calls per cell for the residual.

C4∘(∇(u))

In this version, you would need to call the function C4 once per integration point and cell. It would probably be called in this function:

https://github.com/gridap/Gridap.jl/blob/6d0da6ef46e7fe3445ae3260a3ecdc821201c87f/src/Arrays/Maps.jl#L166

fverdugo commented 2 years ago

BTW, if you want to learn more about the Gridap internals, you have these resources:

I think that reading them might be useful before looking at the source code.

rcontinuum commented 2 years ago

I had a look at Tutorial 13, but at a certain point I started digging into your code ... and got a bit lost.

Thank you for your hints! I'll close this issue.