Closed colejust closed 4 years ago
Modify the soil column input file that Will is currently working on to include an interface element and a structure.
I created a branch with the larger problem from a few months ago; the input files for the smaller problem should also be added to it.
So, I started getting into the timing of the NLSSI problem uploaded by @hoffwm.
The time spent in the solve is good (99.4%), with 22.7% in nonlinear and 69.1% in linear iterations.
Time Step 100, time = 0.125
dt = 0.00125
0 Nonlinear |R| = 6.140825e+04
0 Linear |R| = 6.140825e+04
1 Linear |R| = 6.759531e+02
1 Nonlinear |R| = 6.559596e+02
0 Linear |R| = 6.559596e+02
1 Linear |R| = 4.038713e+02
2 Linear |R| = 2.155511e+02
3 Linear |R| = 1.288639e+02
4 Linear |R| = 6.468141e+01
2 Nonlinear |R| = 6.604958e+01
0 Linear |R| = 6.604958e+01
1 Linear |R| = 4.500460e+01
2 Linear |R| = 2.617582e+01
3 Linear |R| = 1.528866e+01
4 Linear |R| = 9.298922e+00
5 Linear |R| = 4.773156e+00
6 Linear |R| = 1.859353e+00
7 Linear |R| = 8.645449e-01
3 Nonlinear |R| = 8.645454e-01
0 Linear |R| = 8.645454e-01
1 Linear |R| = 5.168710e-01
2 Linear |R| = 3.002107e-01
3 Linear |R| = 1.837310e-01
4 Linear |R| = 1.032151e-01
5 Linear |R| = 5.765629e-02
6 Linear |R| = 3.415646e-02
7 Linear |R| = 1.980174e-02
8 Linear |R| = 8.957552e-03
9 Linear |R| = 4.954015e-03
10 Linear |R| = 2.382404e-03
11 Linear |R| = 9.726401e-04
12 Linear |R| = 5.378149e-04
4 Nonlinear |R| = 5.378145e-04
Solve Converged!
If I use an FDP (finite difference preconditioner), the solve does the following. As expected, 95% of the time is in the nonlinear iteration.
Time Step 100, time = 0.125
dt = 0.00125
0 Nonlinear |R| = 6.140825e+04
0 Linear |R| = 6.140825e+04
1 Linear |R| = 2.720737e-03
1 Nonlinear |R| = 5.269268e+03
0 Linear |R| = 5.269268e+03
1 Linear |R| = 1.980607e-06
2 Nonlinear |R| = 6.140830e+02
0 Linear |R| = 6.140830e+02
1 Linear |R| = 3.646615e-05
3 Nonlinear |R| = 5.260277e+01
0 Linear |R| = 5.260277e+01
1 Linear |R| = 2.767232e-07
4 Nonlinear |R| = 1.600747e-01
0 Linear |R| = 1.600747e-01
1 Linear |R| = 5.984129e-11
5 Nonlinear |R| = 7.408599e-04
0 Linear |R| = 7.408599e-04
1 Linear |R| = 4.651665e-17
6 Nonlinear |R| = 1.694805e-05
Solve Converged!
What this tells me is that we need to implement the full Jacobians if possible; doing so could provide a significant speedup.
Solve with FDP and PJFNK:
Time Step 30, time = 0.0375
dt = 0.00125
0 Nonlinear |R| = 2.966040e+05
0 Linear |R| = 2.966040e+05
1 Linear |R| = 9.669859e+04
2 Linear |R| = 1.116926e+04
3 Linear |R| = 2.433818e+03
4 Linear |R| = 1.833252e+02
5 Linear |R| = 4.208464e+01
6 Linear |R| = 7.544688e+00
7 Linear |R| = 1.649546e+00
1 Nonlinear |R| = 2.860750e+03
0 Linear |R| = 2.860750e+03
1 Linear |R| = 8.032549e-01
2 Linear |R| = 1.013621e-04
2 Nonlinear |R| = 3.273228e+02
0 Linear |R| = 3.273228e+02
1 Linear |R| = 9.923694e-05
3 Nonlinear |R| = 8.082138e+00
0 Linear |R| = 8.082138e+00
1 Linear |R| = 6.466680e-08
4 Nonlinear |R| = 1.023385e-07
Solve Converged!
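To compare the logs above without counting lines by hand, here is a minimal Python sketch that tallies nonlinear and linear iterations for one time step. It assumes the console format shown in this thread ("<n> Nonlinear |R| = ..." and "<n> Linear |R| = ..."); the function name count_iterations is made up for illustration and is not part of MOOSE or MASTODON. Counted by hand, the first log has 4 nonlinear and 24 linear iterations, the FDP log has 6 and 6, and the third log has 4 and 11.

```python
import re

# Matches lines such as "3 Nonlinear |R| = 8.645454e-01"
# or "12 Linear |R| = 5.378149e-04".
LINE_RE = re.compile(r"^\s*(\d+)\s+(Nonlinear|Linear)\s+\|R\|\s*=\s*(\S+)")


def count_iterations(log_text):
    """Return (nonlinear_iterations, linear_iterations) for one time step."""
    nonlinear = 0
    linear = 0
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. "Time Step ...", "dt = ...", "Solve Converged!"
        index, kind = int(match.group(1)), match.group(2)
        # Lines with index 0 only report the starting residual, so skip them.
        if index == 0:
            continue
        if kind == "Nonlinear":
            nonlinear += 1
        else:
            linear += 1
    return nonlinear, linear


# First two nonlinear iterations of the FDP log from this thread.
sample = """\
0 Nonlinear |R| = 6.140825e+04
0 Linear |R| = 6.140825e+04
1 Linear |R| = 2.720737e-03
1 Nonlinear |R| = 5.269268e+03
0 Linear |R| = 5.269268e+03
1 Linear |R| = 1.980607e-06
2 Nonlinear |R| = 6.140830e+02
"""
print(count_iterations(sample))  # (2, 2)
```

Iteration counts alone do not give wall-clock cost, since each FDP nonlinear iteration is far more expensive, but they make it easier to see whether a Jacobian change reduces the linear work per step.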
I tried two different Jacobian matrices: the continuum Jacobian matrix, obtained just from theory for elastic-perfectly plastic curves, and the consistent Jacobian matrix, which takes into account the iterative numerical procedure. Both formulations are from "Computational Inelasticity" by J.C. Simo and T.J.R. Hughes (1998). I didn't see much improvement in performance. When I compared both matrices with the one obtained by FD, there was still some difference, so I am not sure whether something else needs to be accounted for to get the exact Jacobian for nonlinear problems.
I also checked the Jacobian implemented by Andy Wilkins in TensorMechanicsPlasticJ2 and that also does not exactly match with the FD matrix.
-- Swetha Veeraraghavan, PhD Computational Scientist, Seismic Research Group Idaho National Laboratory (208) 526 1581 cell: 608 320 5438
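The comparison against an FD matrix described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two-unknown residual below is a hypothetical toy system, not the plasticity model, and the helper names (residual, analytic_jacobian, fd_jacobian, max_abs_difference) are invented for this example. The idea is to perturb each unknown, difference the residual, and compare the result entry by entry with the hand-coded Jacobian.

```python
def residual(u):
    """Hypothetical two-unknown nonlinear residual, for illustration only."""
    x, y = u
    return [x * x + y - 3.0, x + y * y * y - 5.0]


def analytic_jacobian(u):
    """Hand-derived J[i][j] = dR_i/du_j for the residual above."""
    x, y = u
    return [[2.0 * x, 1.0], [1.0, 3.0 * y * y]]


def fd_jacobian(func, u, eps=1e-6):
    """Approximate the Jacobian column by column with central differences."""
    n = len(u)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u_plus = list(u)
        u_minus = list(u)
        u_plus[j] += eps
        u_minus[j] -= eps
        r_plus = func(u_plus)
        r_minus = func(u_minus)
        for i in range(n):
            jac[i][j] = (r_plus[i] - r_minus[i]) / (2.0 * eps)
    return jac


def max_abs_difference(a, b):
    """Largest entry-wise difference between two equally sized matrices."""
    return max(
        abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a[i]))
    )


u0 = [1.0, 2.0]
diff = max_abs_difference(analytic_jacobian(u0), fd_jacobian(residual, u0))
print(diff < 1e-6)  # True
```

For a smooth residual the two matrices should agree to within the FD truncation error. A difference that stays large as eps shrinks usually points at a missing term in the hand-coded Jacobian, while a residual with kinks (such as a yield surface) can make the FD matrix itself unreliable near the kink.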
Here are the profiling results for this problem. Nothing jumps out as dominating, so this will probably require a lot of incremental fixes to improve things: prof.pdf.
From this it appears that three boxes jump out: MooseVariable::computeElemValues, RankFourTensor::fillGeneralIsotropic, and libMesh::DofMap.
@permcody What type of funding do we have for optimization? This problem that I am showing is about 20 times slower than an equivalent run with Abaqus and I am guessing any optimization done for this will help others as well.
Not so quick, @colejust! Amdahl's Law bites you here. Those big boxes each account for numbers around 10%. If you could optimize those specific routines to the point where they take zero time you'd only see a speedup of 100/(100-10) or about a 10% speedup. If you are looking for speedup of 20x you won't even get close, not even in the ballpark of 2x.
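The Amdahl's Law arithmetic above can be checked with a short Python sketch. The function name amdahl_speedup is made up for illustration; the 0.10 fraction comes from the "around 10%" figure quoted for each box, and 0.30 is the assumption that all three boxes together cost about 30% of the run.

```python
def amdahl_speedup(fraction, local_speedup=float("inf")):
    """Overall speedup when `fraction` of the run time becomes
    `local_speedup` times faster (infinity means it takes zero time)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    remaining = (1.0 - fraction) + fraction / local_speedup
    if remaining == 0.0:
        return float("inf")  # everything was optimized away
    return 1.0 / remaining


# One box at roughly 10% of the run time, reduced to zero cost.
print(round(amdahl_speedup(0.10), 2))  # 1.11

# Assumption: all three boxes together, roughly 30%, reduced to zero cost.
print(round(amdahl_speedup(0.30), 2))  # 1.43
```

Even the optimistic three-box case stays well below 2x, which is why reducing the number of iterations matters more than tuning any single routine.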
The biggest wins will be reducing the number of iterations (implementing Jacobians), and using more advanced preconditioners. We do have plans to consolidate the number of loops, which will change the form of this graph and reduce the number of calls to many of the boxes in the lower half. We are hoping that will give us some of the bigger gains we are looking for.
Funding: We have a fair amount of funding in our two programmatic pots but we need to be careful to focus those efforts on framework optimizations that will benefit everyone (I believe that's what Andrew is implying here).
Correct, I was hoping we could start putting effort into these boxes to help everyone. A few percent here and there will start to add up and we need to do it.
For Mastodon we need to invest major time into figuring out Jacobians, ideally exactly. I think this problem is a good place to start; this is something I can do after I get the docs milestones done.
We just need to be able to measure our impact or we will be grilled. Every optimization should be benchmarked before and after. This is why @rwcarlsen put together that tool.
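As a rough illustration of the before-and-after idea, here is a minimal Python timing sketch. It is not the tool mentioned above, and the names best_time and relative_speedup are hypothetical; in practice the callables would wrap a run of the same input file with the old and new builds.

```python
import time


def best_time(func, repeats=5):
    """Return the smallest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def relative_speedup(before, after, repeats=5):
    """Ratio of best 'before' time to best 'after' time (above 1 is faster)."""
    # Guard against a zero denominator if the timer resolution is too coarse.
    after_time = max(best_time(after, repeats), 1e-12)
    return best_time(before, repeats) / after_time


# Placeholder workloads standing in for the old and new code paths.
def old_version():
    return sum([i * i for i in range(100000)])


def new_version():
    return sum(i * i for i in range(100000))


print(relative_speedup(old_version, new_version) > 0.0)  # True
```

Taking the best of several repeats reduces noise from other processes; the measured ratio will still vary between machines, so the same machine and input should be used for both measurements.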
@sveerara could you send your Jacobian math to @hoffwm, @aeslaughter, and me so we can go through it?
Sure, I will add the Jacobian code to the same branch and send over the pdfs.
Closing since we have profiled mastodon several times after this.
Need a small 2D problem that runs in seconds, with a soil block and a structure block as simple as possible, so we can look at the matrix and figure out run-time issues.