Improve plasticity performance in Tensor Mechanics

dschwen commented 8 years ago

Currently the plasticity steps in Tensor Mechanics take substantially longer than the plasticity calculations in Solid Mechanics. ComputeMultiPlasticityStress::plasticStep currently takes a large chunk of the total simulation time (about 81% of active time in the profile below). We need to bring TM to a level of performance comparable to SM (at least for isotropic systems, if that makes any difference).

Ping @WilkAndy and @cpritam (and @friedmud to keep him in the loop)

 Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
Solve                                                                                                      |
   ComputeResidualThread       1598       2.6804      0.001677    16.7978     0.010512    14.57    91.30    |
   computeDiracContributions() 1715       0.0017      0.000001    0.0017      0.000001    0.01     0.01     |
   compute_jacobian()          117        0.3910      0.003342    1.2410      0.010607    2.12     6.75     |
   compute_residual()          1598       0.1374      0.000086    16.9603     0.010613    0.75     92.18    |
   compute_user_objects()      3512       0.0021      0.000001    0.0021      0.000001    0.01     0.01     |
   plasticStep()               670302     14.9079     0.000022    14.9079     0.000022    81.03    81.03    |
   residual.close3()           1598       0.0128      0.000008    0.0128      0.000008    0.07     0.07     |
   residual.close4()           1598       0.0100      0.000006    0.0100      0.000006    0.05     0.05     |
   solve()                     20         0.1964      0.009822    18.4458     0.922290    1.07     100.26   |
   updateDisplacedMesh()       1716       0.0454      0.000026    0.0454      0.000026    0.25     0.25

cpritam commented 8 years ago

I will look into plasticStep(). In the NR solve of the constitutive model, both SM and TM should need the same number of iterations for similar tolerance levels. I can start by comparing that. Let me know if you have any suggestions.

WilkAndy commented 8 years ago

Hi @dschwen . I'm not actively working on this plasticity at present, and will not have time to do major stuff. However, your observation looks disastrous! Can you attach an input file? Perhaps the tolerances were not optimal?

a

dschwen commented 8 years ago

Check out https://github.com/idaholab/moose/pull/5632 for an input file.

On Sun, Sep 6, 2015, 4:11 PM Andy Wilkins notifications@github.com wrote:

Hi @dschwen https://github.com/dschwen . I'm not actively working on this plasticity at present, and will not have time to do major stuff. However, your observation looks disastrous! Can you attach an input file? Perhaps the tolerances were not optimal?

a

— Reply to this email directly or view it on GitHub https://github.com/idaholab/moose/issues/5667#issuecomment-138130209.

WilkAndy commented 8 years ago

OK, got it, thanks @dschwen . This is a real disappointment for me. I'll look when i get the chance, but that may be a few days (hopefully not weeks!)

a

WilkAndy commented 8 years ago

I found 30mins today, but no luck yet. By including the stuff below in the input file, i could see that the number of newton-raphson iterations in the return-map algorithm is just 1 (for a plastic step - obviously it's zero for an elastic step). This indicates that the return-map is working as planned, but for some reason it's super slow compared with the solid-mechanics return map.

Also, i put tangent_operator = elastic in the ComputeMultiPlasticityStress Material, and then tensor_mechanics and solid_mechanics produce very similar linear and nonlinear residuals. That doesn't speed up tensor_mechanics substantially, but i just wanted to mention it in case it helps you.

Hopefully i'll get time to look at this further sometime soon.

[AuxVariables]
  [./num_iters]
    order = CONSTANT
    family = MONOMIAL
  [../]
[]

[AuxKernels]
  [./num_iters]
    type = MaterialRealAux
    property = plastic_NR_iterations
    variable = num_iters
  [../]
[]

WilkAndy commented 8 years ago

By the way, what tool, or commands in moose code, would you use to do this profiling? i want to quickly see what functions/routines take the most time. They must be functions called from plasticStep.

permcody commented 8 years ago

I haven't done profiling on Linux machines for a while, but this question was recently asked on this list. The old way is using gprof, which ships with the standard GNU toolchain. You add an extra flag to your compile flags, then run the tool and analyze the results. It's several steps to get it all done. Daniel suggested using Google's "gperftools" but I haven't tried that out personally. On OS X it's very simple: you take an unmodified binary and run it with "Instruments" and it shows you a breakdown per file, per line of where you are spending the most time.

Whatever method you choose, it's well worth the effort invested to understand how to use these tools to analyze your code.

waxmanr commented 8 years ago

A note from someone who knows nothing about profiling:

I spent some time looking at gperftools (I'm on Linux) --- all the info I've found online and in the INSTALL file indicates there are a lot of unresolved issues for google's tools when using 64bit Linux. I got it to install after a few extra steps, but it still fails one test and I don't know enough about it to be able to tell if that's significant. There do seem to be other tools out there, but I have no idea if they do what you guys are doing (e.g., Valgrind, oprofile, Zoom, etc.). Or there might be a fix for gperftools that I didn't find/didn't know how to implement. Or maybe none of you are using Linux, so it's not really relevant.

permcody commented 8 years ago

Thanks Rachel, perhaps we'll try some new tools out on Linux and see if we can come up with some good recommendations for our users. I forgot to mention one other very obvious tool...

MOOSE itself contains a performance logging capability. If you look through our source code you'll see several references to "push" and "pops" of names of methods or significant logic pieces. If you just want to do quick high level performance logging you are welcome to sprinkle a few of these push/pop pairs in your source code to get logging information right in MOOSE.

One note of caution, if you put these within a hotspot that is called hundreds of thousands of times or more, it may adversely impact your performance.
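To make the push/pop idea concrete, here is a minimal standalone sketch of a scoped timer: construction plays the role of the "push" and destruction the "pop". It is not the MOOSE logging API; `PerfRegistry`, `ScopedTimer` and `timedSum` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical stand-in for a performance log; NOT the MOOSE API.
struct PerfEntry
{
  unsigned long n_calls = 0;
  double total_seconds = 0.0;
};

class PerfRegistry
{
public:
  // Creates the entry on first use; std::map keeps references valid afterwards.
  PerfEntry & entry(const std::string & name) { return _entries[name]; }

private:
  std::map<std::string, PerfEntry> _entries;
};

// "Push" happens in the constructor, "pop" in the destructor.
class ScopedTimer
{
public:
  ScopedTimer(PerfRegistry & registry, const std::string & name)
    : _entry(registry.entry(name)), _start(std::chrono::steady_clock::now())
  {
  }

  ~ScopedTimer()
  {
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - _start;
    _entry.n_calls += 1;
    _entry.total_seconds += elapsed.count();
  }

private:
  PerfEntry & _entry;
  std::chrono::steady_clock::time_point _start;
};

// Example routine instrumented with a single push/pop pair.
double timedSum(PerfRegistry & registry, int n)
{
  assert(n >= 0);
  ScopedTimer timer(registry, "timedSum");
  double sum = 0.0;
  for (int i = 1; i <= n; ++i)
    sum += 1.0 / i;
  return sum;
}
```

Every timed call in this sketch costs two clock reads plus a map lookup, which is why wrapping a routine that runs hundreds of thousands of times can distort the very timings being measured.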

WilkAndy commented 8 years ago

Thanks everyone, i'm doing this currently on osx. i tried "instruments" and got:

xcode-select: error: tool 'instruments' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance

WilkAndy commented 8 years ago

Can someone tell me what is the size of the linear system in SM ? In TM it is 8x8 : there are 6 symmetric components of plastic strain, 1 yield function, and 1 internal parameter (which is irrelevant here because no hardening, but it is still included). 6+1+1=8. If SM somehow reduces this because of isotropic J2, that may explain the time difference.

permcody commented 8 years ago

XCode can be downloaded from the Apple App Store.

WilkAndy commented 8 years ago

OK, thanks @permcody . i did remember someone suggesting Xcode caused moose to break. obviously i was wrong.

permcody commented 8 years ago

I've heard the same rumors. However I always have XCode installed.

WilkAndy commented 8 years ago

Sorry for providing such dribbly output from my investigations - i never know when i'll next get to work on this, and i want to keep you updated with my potentially useful findings in case it helps your investigations.

Within plasticStep, about 10% of time is spent allocating stuff, and i can get rid of this by re-coding; about 15% is spent in the line search, again calculating unnecessary stuff since line search is not needed in this example; 75% is spent in nrStep.

To be continued...
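As a rough back-of-envelope check on what those fractions imply, the sketch below computes the best-case gain from removing the allocation and line-search costs entirely while leaving nrStep untouched. The function names are hypothetical, and the inputs are simply the approximate fractions quoted above.

```cpp
#include <cassert>

// Fraction of the original plasticStep time that remains if the given
// fractions of work are removed entirely (an optimistic upper bound).
double remainingFraction(double allocation, double line_search)
{
  assert(allocation >= 0.0 && line_search >= 0.0 && allocation + line_search < 1.0);
  return 1.0 - allocation - line_search;
}

// Best-case speedup of plasticStep from removing those two costs.
double bestCaseSpeedup(double allocation, double line_search)
{
  return 1.0 / remainingFraction(allocation, line_search);
}
```

With 10% and 15% removed, 75% of the plasticStep time remains, a best-case speedup of about 1.33x, so any large gain has to come from nrStep.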

permcody commented 8 years ago

This is great! It helps to document as you go since you never know when somebody else will find a few minutes to dig in with the partial information supplied so far.

cpritam commented 8 years ago

Andy - SM uses the radial return mapping with one variable instead of the general return mapping using 8. Definitely TM will be slower. We are looking into your code to see if we can make it faster. To get the same efficiency we would have to implement radial return mapping as a specific case of return mapping.
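For reference, a minimal sketch of radial return for perfectly plastic J2 with isotropic elasticity, showing why only one scalar unknown is needed. It assumes the yield function f = q - yield_strength, with q the von Mises stress, and a plastic multiplier defined so that q drops by 3 G dpm. `Tensor`, `radialReturnJ2` and the other names are hypothetical; this is not the SM or TM implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Tensor = std::array<std::array<double, 3>, 3>;

// Mean (hydrostatic) part of a stress tensor.
double meanStress(const Tensor & t) { return (t[0][0] + t[1][1] + t[2][2]) / 3.0; }

// Deviatoric part: s = t - mean * I.
Tensor deviator(const Tensor & t)
{
  Tensor s = t;
  const double p = meanStress(t);
  for (int i = 0; i < 3; ++i)
    s[i][i] -= p;
  return s;
}

// Von Mises equivalent stress: q = sqrt(3/2 s:s).
double vonMises(const Tensor & t)
{
  const Tensor s = deviator(t);
  double ss = 0.0;
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      ss += s[i][j] * s[i][j];
  return std::sqrt(1.5 * ss);
}

struct RadialReturnResult
{
  Tensor stress; // stress after the return
  double dpm;    // plastic multiplier increment (the single unknown)
  bool plastic;  // false if the trial stress was already admissible
};

// Assumption: no hardening, shear modulus G, trial stress from an elastic predictor.
RadialReturnResult radialReturnJ2(const Tensor & trial, double yield_strength, double shear_modulus)
{
  assert(yield_strength > 0.0 && shear_modulus > 0.0);
  const double q_trial = vonMises(trial);
  if (q_trial <= yield_strength)
    return {trial, 0.0, false};

  // One scalar equation: q_trial - 3 G dpm = yield_strength.
  const double dpm = (q_trial - yield_strength) / (3.0 * shear_modulus);

  // Scale the deviator back onto the yield surface; the mean stress is unchanged.
  const double scale = yield_strength / q_trial;
  const double p = meanStress(trial);
  Tensor stress = deviator(trial);
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      stress[i][j] *= scale;
  for (int i = 0; i < 3; ++i)
    stress[i][i] += p;
  return {stress, dpm, true};
}
```

Because the returned deviator is just a scaled copy of the trial deviator, no 8x8 Jacobian has to be assembled or solved. Hardening would change the scalar equation (and may need a small Newton-Raphson loop of its own), which this sketch does not attempt.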

WilkAndy commented 8 years ago

OK, thanks @cpritam . The bulk of the time in nrStep is actually spent calculating the jacobian (and less time, but still noticeable, calculating the RHS). The solve of the 8x8 system using LAPACKgesv is comparatively quick.

In many ways i'm pleased with this so far - the code is actually doing the right thing, and most of the multi-surface generality is not actually costing much.

So perhaps we can short-circuit these jacobian+rhs calculations in this case, and leave all the code in ComputeMultiPlasticityStress untouched. i'm really not sure of the best architecture here, but possibly we could do it in MultiPlasticityLinearSystem::calculateJacobian and calculateRHS.

WilkAndy commented 8 years ago

I think the only way around this is to let each plasticity model be able to perform its own custom return-map, if it is coded. I'm pretty sure the multi-surface stuff costs very little overhead, so we can leave that alone. Perhaps in ComputeMultiPlasticityStress::nrStep we can optionally use a plastic model's own custom return-map if: (1) there is only 1 plastic model; (2) the plastic model has a custom return-map actually defined. In the case of multiple plastic models, the general scheme needs to be used, and it will evidently cost cpu time.

Comments?

cpritam commented 8 years ago

Andy - At least for the isotropic J2 plasticity we are planning to implement the radial return mapping by overriding nrStep. I hope that will make the computational time comparable with SM.

WilkAndy commented 8 years ago

Is anyone working on this currently? Last night i had an idea that i could implement some sort of "initial guess" at the solution to the return map. For things like J2 this would actually be the exact solution, and for more complicated models it could be an approximate solution. Then with careful coding, the nrStep (and line search, etc) could be bypassed if this initial guess was good enough. The advantage of doing this, rather than overriding nrStep itself is for complicated models we don't have to provide the exact solution, which might be quite hard.

One problem with the whole idea of UserObjects defining hardening is that the PlasticModel might find it difficult to provide a solution for arbitrary hardening. But i feel that's a price we have to pay for the beautiful flexibility of UserObjects.

a

dschwen commented 8 years ago

Rachel Waxman is working on radial return mapping. Check with her.

On Sun, Oct 11, 2015, 3:32 PM Stephanie Pitts notifications@github.com wrote:

I should be completing the implementation of isotropic J2 plasticity ( a direct translation of the combined creep + plasticity model from solid mechanics) which Pritam mentioned, but I have not had a chance to work with the code at all as of late. I'd be quite interested to see your "initial guess" code for the J2 case.

WilkAndy commented 8 years ago

Hey Rachel (@waxmanr) , what's the status of your work? i'm super slow on this stuff, because can only work during the evenings. i don't want to double-up on your work!

I've obviously thought of this stuff before, but had forgotten. You'll see that each TensorMechanicsPlasticXXXX UserObject can override a function called activeConstraints. One of the things that function can return is "returned_stress". That is, each TensorMechanicsPlasticXXXX can do its own Newton-Raphson, or any other algorithm, to provide a guess at the stress after return-mapping. (This was evidently useful in the multi-surface situation, but here we're just talking about one model - J2 - and we should be able to use the same approach.) I bet we have to re-write some stuff, but it looks as if this could be a good place to start.

a

WilkAndy commented 8 years ago

I just spent 30mins looking into this further, and am pretty confident we could tackle it this way. We can re-write ComputeMultiPlasticityStress::elasticStep to use a new function TensorMechanicsPlasticXXXX::returnMap. That function does the return-map for that particular plastic model (e.g. radial return for J2): it calculates the returned stress, the plastic-multiplier increment, the change in internal parameter, the new yield function value, and the new flow direction. (It'll default to returned_stress=stress, \dot{\lambda}=0, new_internal=internal, f=f(stress,intnl), so that by checking the return value of f we can easily see if the step was elastic.) Then ComputeMultiPlasticityStress will basically do the same as elasticStep does now, but if there is just one plasticModel and the residual<0.5 then it'll signal "successful return" and exit from computeQpStress. Then there will be basically no computational overhead over-and-above whatever the coder puts in TensorMechanicsPlasticXXXX::returnMap.

If there is more than one plasticModel, and only one f>0, the returned_stress can be used as an initial guess to the NR process.
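A minimal sketch of this idea, reduced to a scalar (1D) stress so that it stays self-contained. The class and function names (`PlasticModel1D`, `ExactReturn1D`, `quickReturn`) are hypothetical, and the acceptance test f <= f_tol is an assumption standing in for whatever residual check the real code would use.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// What a model-specific return map reports back (hypothetical layout).
struct ReturnMapOutput
{
  double returned_stress;
  double dpm;       // plastic multiplier increment
  double new_intnl; // internal parameter after the return
  double f;         // yield function at the returned state
};

// Base model: yield function f = |stress| - strength, no custom return.
class PlasticModel1D
{
public:
  explicit PlasticModel1D(double strength) : _strength(strength) { assert(strength > 0.0); }
  virtual ~PlasticModel1D() = default;

  double yieldFunction(double stress) const { return std::fabs(stress) - _strength; }

  // Default: hand the trial state straight back, so f <= 0 means "elastic"
  // and f > 0 means "the general scheme is still needed".
  virtual ReturnMapOutput returnMap(double trial_stress, double intnl, double /*modulus*/) const
  {
    return {trial_stress, 0.0, intnl, yieldFunction(trial_stress)};
  }

protected:
  double _strength;
};

// Model that knows its exact return (perfect plasticity, elastic modulus E).
class ExactReturn1D : public PlasticModel1D
{
public:
  using PlasticModel1D::PlasticModel1D;

  ReturnMapOutput returnMap(double trial_stress, double intnl, double modulus) const override
  {
    const double f_trial = yieldFunction(trial_stress);
    if (f_trial <= 0.0)
      return {trial_stress, 0.0, intnl, f_trial};
    const double sign = trial_stress > 0.0 ? 1.0 : -1.0;
    // |stress| drops by E * dpm, so dpm = f_trial / E puts it on the yield surface.
    return {sign * _strength, f_trial / modulus, intnl, 0.0};
  }
};

// True when the model's own return can be accepted without the general NR scheme.
bool quickReturn(const std::vector<std::unique_ptr<PlasticModel1D>> & models,
                 double trial_stress, double intnl, double modulus, double f_tol,
                 ReturnMapOutput & out)
{
  if (models.size() != 1)
    return false; // several models: fall back to the general scheme
  out = models[0]->returnMap(trial_stress, intnl, modulus);
  return out.f <= f_tol;
}
```

With the default `returnMap`, a plastic trial stress leaves f > 0 and `quickReturn` reports failure, so the caller would carry on with the general scheme, possibly using `out.returned_stress` as its starting guess.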

Tell me what you think. I'd really like to start coding this week, but we'll see if i have time,

a

waxmanr commented 8 years ago

Hi @WilkAndy: I have written a radial return map function (overriding nrStep and also the consistent tangent operator, if it's not set to elastic). So far, with the elastic flag set, it's reduced run time by about 40% for J2 (4x4x4 gen-mesh, 50 time steps). I have pushed my code to my repo, but not created a PR yet. We were waiting to see if any other changes could be made to speed things up more. I believe @cpritam was going to try to do some profiling to find out where it's still getting stuck.

Without the elastic flag set, it's reduced run time by just over half. I also tried changing a few vectors to member variables so they could be pre-sized, but that doesn't seem to have made much of a difference.

Here is the link to my current branch on this, if you want to take a look: https://github.com/waxmanr/moose/tree/returnMappingTM_5667/modules/tensor_mechanics

The code might need to be cleaned up a bit, sorry. If you have any ideas, I'd be happy to try to implement them. If we have the userObjects override the return map function, it'd just be accessed via _f[model]->returnMap(), yes?

This is somewhat unrelated, but since that was put on hold awaiting profiling, I'm also working on switching ComputeMultiPlasticityStress over from one parameter per model, to n parameters per model.

WilkAndy commented 8 years ago

That's really great progress, @waxmanr . I really encourage you to do a PR. I won't have time to review your code until tonight at the earliest (>12 hours from now). Now perhaps i don't have to do anything! But perhaps my method outlined above would be computationally faster as fewer function evaluations would be needed (which can be expensive if you're calculating eigenvalues and so on). Anyway modifying your code would be easier than starting from scratch, and you doing the return-map is the bulk of the work. I'm glad you did the tangent operator - i've found it makes a fair difference in real simulations.

Yes _f[model]->returnMap(blah, blah, blah)

Related to member variables - i know @dschwen was horrified at the number of arguments my functions took, and member variables are a way of tidying that. However, i'm quite resistant to that, as this code is so complicated i wanted to make it clear what function was modifying what variable, and member variables could lead to super subtle bugs.

Also, it'd be most excellent if you generalised to Q internal variables per model. That's something that i started coding but all the std::vector stuff got crazy. At the time, I couldn't actually think of any physical models that had more than one internal variable, so i ditched the coding. However, the theory is written out at

http://mooseframework.org/wiki/PhysicsModules/TensorMechanics/ReturnMap/

and i hope you're implementing that, otherwise the doco will have to be re-written! Since that time i've often wished i had coded the Q>=1 case, as i now think that it may have made the code a little more transparent (since it would have been clear what derivatives you're using, and Einstein contraction would have been clearer), and also i CAN think of situations where having Q>1 is useful.

So overall i think it's fantastic you're looking at that. Also, it's amazing you can understand the theory and the code. i think it's the most opaque code i've ever written, but i couldn't think of a better way of doing it.

a

WilkAndy commented 8 years ago

Hi @waxmanr , i had a quick look at your code, and yes, i think it does need refactoring, but at least you've done all the hard work, and stuff just needs rearranging.

Generally, i think that a lot of your additions to ComputeMultiPlasticityStress should not actually be in that class, and should be included in TensorMechanicsPlasticJ2, somehow. The problem is that ComputeMultiPlasticityStress is really meant to run for any plastic model (all of the TensorMechanicsPlasticXXXX models), and radial return makes no sense for anything except J2 plasticity. Similarly for things like including a yieldStrength in the base class TensorMechanicsPlasticModel - only J2 has the notion of "yieldStrength", so that should be in TensorMechanicsPlasticJ2 instead.

waxmanr commented 8 years ago

Yes, I agree it should be rearranged. I'm new to both C/C++ and return mapping (saw them both for the first time last month), so I mostly just wanted to get something working.

I will start moving things over to the userObjects today. Once I get that done and working, I'll create a PR so you can take a look.

Also, once I figure out some segfaults / out-of-bounds errors, I'll open an issue for the Q>=1 work. There are about five million vectors (and now vectors of vectors of tensors), so it'll probably take me a few to chase down the problem. Pritam wanted to implement the Gurson model, which I believe uses j2 + damage as a second parameter for porous materials. So that will be our test case (new userObjects) if I get the underlying setup to work.

I only have a couple weeks left on my internship, so I'm trying to get as much done as possible before I leave.

WilkAndy commented 8 years ago

OK cool @waxmanr . Here's how i reckon we should proceed:

(1) You write your TensorMechanicsPlasticJ2 with a "returnMap" function (write a dummy one into TensorMechanicsPlasticModel), and a "consistentTangentOperator" function. Call them different names if you like...

(2) I grab your code before you do a PR and attempt to implement my strategy outlined above. If it works, then you grab that code and do some testing on speed, etc, and then PR. If it fails then we re-assess.

Feel free to propose an alternative way of collaborating - it's just i will have a couple of hours here-and-there over the next few days, so this is good timing for me.

Segfaults/out-of-bounds: you poor thing. It's super complicated. I suggest we get this returnMap stuff completed before tackling that, especially given your lack of time, since getting something 100% done is better than lots of things uncompleted (again just my opinion - feel free to differ!)

Gurson - oh, i think i published stuff on that once, so used to be quite familiar with it. in principle i could have some input. you could always code it with just Q=1, and implement all the necessary derivatives and so on. at least it'd be a good start.

a

WilkAndy commented 8 years ago

Oh, maybe you just did (1), @waxmanr ?

waxmanr commented 8 years ago

I did (1) but used nrStep, not radialReturn. I could move everything over from earlier on if you want? The only plus that would offer would be not going through all the deactivate_due_to_ld, etc. Though I think with j2, it just skips those loops/if-statements because everything is false or size-1. nrStep is where the major difference takes place.

The problem is now that I've moved over nrStep to the j2 object, the speed advantage is gone. In fact, for longer tests, it's taking longer than the original code did. That could definitely be something I missed or messed up somewhere? Or are function calls via user objects expensive, time-wise?

WilkAndy commented 8 years ago

Thanks for (1). Is it OK for you to leave this for the next 12 hours or so, while i try to implement (2)? I hope to get on to this about 3 hours from now.

Yea, all that deactivate_due_to_ld, etc, shouldn't play any role here at all. I'm confident we'll get performance gains, and timing will be similar to SolidMechanics once it all pans out.

a

WilkAndy commented 8 years ago

OK @waxmanr , i've got your branch and am modifying it. Please don't "push -f" over the top of my stuff for the next few hours,

a

WilkAndy commented 8 years ago

I just had lots of fun doing about 8 hours of coding MOOSE !! Haven't had the chance to do that for ages. I've just created a PR for you, @waxmanr , which should bring my code into your branch.

The main contribution is ComputeMultiPlasticityStress.quickStep and MultiPlasticityRawComponentAssembler.returnMapAll. Now it should be pretty easy to implement your own return-map algorithm.

By the way, the one in J2 is not correct for hardening, oops - i suspect a Newton-Raphson is needed there.

My runtimes are 5.2s for SolidMechanics (LSH_mod.i) and 8.1s for TensorMechanics (j2_hard1_mod.i). I'm not sure whether it's going to be easy to get further gains from TensorMechanics, but we'll see.

@waxmanr , could you search the files for "TODO" - you'll see there are still missing things!

I doubt i'll have time to work on this for the rest of the week. Lots of things need tidying up, sorry.

a

WilkAndy commented 8 years ago

Hi @waxmanr , If you need to chat (using our voices i mean, not our fingers), there is a short time when we're both at work (viz in the next couple of hours). i can ring you if you can't ring me,

a

waxmanr commented 8 years ago

I did have a couple questions/comments, looking at the code. I ran your new routines and the timing is much improved.

First, the tangent operator: I fixed this, where you had a TODO note (CMPS, around line 237). I just changed your definition from E_ijkl to:

if (_tangent_operator_type == elastic)
  consistent_tangent_operator = E_ijkl;
else
  consistent_tangent_operator = _f[custom_model]->consistentTangentOperator(stress, intnl, E_ijkl);

where custom_model is an unsigned int passed to MPRCA::returnMapAll to keep track of which model successfully used the custom returnMap. There's likely a better way to do that, but it seemed easiest.

With that setup, I'm getting numbers that make sense when I switch the elastic flag on and off. (Before, that flag obviously made no difference, since it was explicitly defined as elastic, always.)

Does this make logical sense?

WilkAndy commented 8 years ago

I'm not sure about this consistent_tangent_operator (cto) stuff, @waxmanr . I had a quick look and what you did is fine if quickStep was called from computeQpStress. However, what if it were called from CMPS::returnMap? Then the cto might have contributions from surfaces that were active at some stage during the return process, and also you might not even need to calculate it if final_step=false. So in this case, if final_step=true you really should use the consistentTangentOperator function.

I propose that quickStep be refactored. It needs to know where it is called from, it needs to know if this is the final step, and it needs to update cumulative_pm if it is called from CMPS::returnMap. This latter thing means that returnMapAll has to have a pm and cumulative_pm argument (you call this dpm in your J2::returnMap, by the way) which it can set and increment appropriately, and TensorMechanicsPlasticXXXX::returnMap functions have to have a dpm argument which it sets.

So: quickStep should somehow implement the logic:

(1) if called from computeQpStress, do exactly what it does currently;

(2) if called from CMPS::returnMap, then do the returnMapAll and updating plastic_strain as is currently coded, setting pm and incrementing cumulative_pm. If successful_return=true and final_step=true then call CMPS::consistentTangentOperator to calculate the cto.
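The two cases can be sketched as a small decision function. This is only an illustration of the proposed logic: `Caller`, `QuickStepPlan` and `quickStepSketch` are hypothetical names, and case (1) is represented by a flag because the existing code path is not shown in this thread.

```cpp
#include <cassert>

// Where quickStep was called from (hypothetical enum).
enum class Caller
{
  ComputeQpStress,
  ReturnMap
};

// What the caller should do after the custom return map has run.
struct QuickStepPlan
{
  bool use_existing_path;   // case (1): behave exactly as currently coded
  bool compute_general_cto; // call the general consistentTangentOperator
};

QuickStepPlan quickStepSketch(Caller caller, bool successful_return, bool final_step,
                              double dpm, double & pm, double & cumulative_pm)
{
  assert(dpm >= 0.0); // plastic multiplier increments are non-negative

  if (caller == Caller::ComputeQpStress)
    return {true, false}; // (1) leave all bookkeeping to the existing code

  // (2) called from returnMap: set pm and accumulate it ...
  pm = dpm;
  cumulative_pm += dpm;

  // ... and only build the tangent operator when it will actually be used.
  return {false, successful_return && final_step};
}
```

Passing `pm` and `cumulative_pm` by reference mirrors the suggestion that returnMapAll needs both as arguments; whether the real code would structure it this way is an open design choice.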

I bet there's a nice clean way you can implement this.

a

WilkAndy commented 8 years ago

Regarding timings, I get:

5.2s for SolidMechanics (LSH_mod.i)
8.1s for TensorMechanics (j2_hard1_mod.i).
7.1s for TensorMechanics if in computeQpStress, the rotations of _stress[_qp], _elastic_strain[_qp], and _plastic_strain[_qp] using _rotation_increment[_qp] are NOT performed.

Given this, i wouldn't be surprised if the difference in runtimes is now not due to return-mapping, but is just due to dealing with full tensors rather than SolidMechanics "vectorized" things.
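The arithmetic behind that suspicion, as a small sketch using the three timings above (the helper names are hypothetical):

```cpp
#include <cassert>

// Ratio of Tensor Mechanics to Solid Mechanics run time.
double slowdown(double tm_seconds, double sm_seconds)
{
  assert(tm_seconds > 0.0 && sm_seconds > 0.0);
  return tm_seconds / sm_seconds;
}

// Share of the TM-SM gap that disappears when one cost is switched off.
double shareOfGap(double tm_seconds, double tm_without_cost_seconds, double sm_seconds)
{
  assert(tm_seconds > sm_seconds);
  return (tm_seconds - tm_without_cost_seconds) / (tm_seconds - sm_seconds);
}
```

With 8.1s, 7.1s and 5.2s, TM takes roughly 1.56 times as long as SM, and skipping the rotations alone removes about a third of the 2.9s gap.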

a

waxmanr commented 8 years ago

Yes, I don't think TM will ever be quite on par with SM, as far as timing. I'll take a look today and see if I can implement what you suggested in your previous comment. I'll let you know what sort of progress I make.

friedmud commented 8 years ago

Note: we don't need them to be perfectly equal in timings... as long as the penalty is not egregious. Giving up a little bit of time for better and more maintainable code that is shared by everyone in the community that is doing solid mechanics should be fine.

From the timings Andy just posted it looks like there are still a few things that can be done to bring them even closer to parity... but even right now the penalty isn't that bad.

Derek

tonkmr commented 8 years ago

The long-term goal of TM is to create "smart" tensors that act like tensors but only store and compute what is necessary for their symmetry. @dschwen is that work still in the plans?

waxmanr commented 8 years ago

For reference, with the elastic cto and end_time = 0.5 (previous numbers were end_time = 0.1):

-- 34.1s for SM (LSH_mod.i)
-- 52.1s for TM (j2_hard1_mod.i)

So there's still a noticeable difference when the test runs a little longer (the above is 50 time steps).

(edited because I found what I was looking for)

dschwen commented 8 years ago

@tonkmr right now it looks like such smart tensors won't be necessary, as the actual tensor math does not take up a significant fraction of the CPU time. Once @waxmanr and @WilkAndy are done I'll take another look at the profiling results and will try to identify remaining issues.

waxmanr commented 8 years ago

Run times after recent changes (my computer is feeling quick today):

end_time   TM J2 (s)   SM LSH (s)
0.1        5.575       3.492
0.5        35.556      21.1825

(end_time = 0.1 is 10 steps, 0.5 is 50 steps)

Also added an exponential hardening test (j2_hard2_exp.i) to check von Mises stress vs equivalent plastic strain for an exponential relationship (looks as it should). Will re-gold when this is all done.

dschwen commented 8 years ago

@waxmanr what are the input files for "TM J2" and "SM LSH"? Still j2_hard1_mod.i and LSH_mod.i?

I'm getting conflicts when I'm doing git pull --rebase upstream devel on your branch.

waxmanr commented 8 years ago

@dschwen: Yes, re: input files.

I've been looking at run times for some of the other files (e.g., j2_hard2_exp.i), but it's not really fair to compare those to solid mechanics. Once a PR is created for this (eventually...), I'll clean up the directory & get rid of some of the old stuff we were checking.

cpritam commented 8 years ago

@waxmanr Are you comparing the non-hardening case for time? I think linear strain hardening currently doesn't exist in tensor mechanics.

waxmanr commented 8 years ago

@cpritam: In the solid mechanics file (LSH_mod.i), I have hardening_constant = 0. Is that what you mean?

@dschwen: I'm trying to fix that conflict now. When Andy pulled my changes and created a PR on my branch, he changed some things that maybe somehow didn't make it on to my branch when I pulled the merge? At least I'm assuming that's what's happened. I've never had a --rebase conflict before, and this is the first time I've merged someone's changes into my own branch, instead of the other way around. I see where the problem is, just trying to figure out how to resolve it.

waxmanr commented 8 years ago

Okay, @dschwen: this should be fixed now. I hope. I can git pull --rebase upstream devel without conflicts and the changes have been force pushed back. And the run times I posted above are still valid.

WilkAndy commented 8 years ago

What's happened with this issue? I've been working on something else, and i know @waxmanr is due to finish soon. Is someone waiting on me to do something?

a

idaholab / moose

Improve plasticity performance in Tensor Mechanics #5667