sslattery / Chimera


SPN Convergence Analysis for Reactor Problems #28

Open sslattery opened 11 years ago

sslattery commented 11 years ago

The convergence behavior of the SPN equations for reactor problems has been observed to be poor for the block Jacobi preconditioning strategy studied. Given real reactor problems, the goal of this task is to find a preconditioning strategy that yields fewer iterations and therefore shorter Markov chains.

sslattery commented 11 years ago

I forgot to add that, in addition to parallel parameters, we should also be looking at solver parameters here, including the weight cutoff and the number of histories per MCSA iteration.

sslattery commented 11 years ago

First thing to do here is define a good problem to test this out with. My first thoughts are to do an infinite medium problem of reasonable size. For cross sections, perhaps we should consider a reactor/assembly-type group structure with homogenization so we know what to expect for real problems.

Any suggestions Tom?

sslattery commented 11 years ago

So I've pulled Tom's 3x3 pin problem from Orthanc and will start there, as the character of the linear operator will be in line with that observed for other reactor problems. Reflecting and vacuum boundaries are used here on 3 sides to effectively construct 1/8 of the problem. It has UO2 with cladding and moderator and 7 energy groups with upscatter and fission. I can mesh it pretty fine to test scaling, which will work well as many Jacobi blocks must be inverted for the preconditioning. I'll run it on dionysus next to verify.

tmdelellis commented 11 years ago

Stuart,

We have some full core problems with real cross section we can run when you're ready.

Tom


sslattery commented 11 years ago

Ok. I'm going to need a day or two with these assembly problems to make sure all types of MSOD parallelism are functioning correctly through the Thyra interfaces. It's easy to do without Thyra but might require some more intrusive code in Denovo to get the communicator structure correct. Once I determine whether that's necessary I'll be ready to merge my branch.

sslattery commented 11 years ago

So I ran into this:

Denovo - pyspn Python Front-End
-------------------------------
Release      :     denovo-3_0_0
Release Date :       3-MAR-2011
Build Date   :         20130321

Traceback (most recent call last):
  File "fs_mcls.py", line 412, in <module>
    (r, z, mixids, cleanids, table) = build_mesh(10)
  File "fs_mcls.py", line 209, in build_mesh
    uo2_pin  = Pincell()
  File "/local.hd/cnergg/stuart/builds/Exnihilo/release/python/denovo_utils.py", line 1136, in __init__
    this = _denovo_utils.new_Pincell(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_Pincell'.
  Possible C/C++ prototypes are:
    rtk::PyPincell< rtk::PyCyl_Pin_Geom >::PyPincell(rtk::PyCyl_Pin_Geom const &,rtk::PyRTK_Tag const &,denovo::PyCompositions const &,double,double)
    rtk::PyPincell< rtk::PyCyl_Pin_Geom >::PyPincell(rtk::PyRTK_Tag const &,denovo::PyComposition const &,double,double)

Looks like the RTK code base has changed since Tom ran these problems. I'll have to figure out how to update the python input.

sslattery commented 11 years ago

It looks like the assembly inputs are out of date with respect to RTK definitions for pincells and material compositions.

Tom - Do you have a more up-to-date input for these spn assembly problems that are compatible with the current Exnihilo master branch?

tmdelellis commented 11 years ago

Stuart,

Let me get in touch with you this afternoon.

Tom


sslattery commented 11 years ago

So it looks like I can go through the pyneutronics interface in the Insilico package to run SPn with RTK geometry. The issue here is that Scale (Fulcrum) is a required dependency for this build, which makes this much more complicated than I would like at this point. I might just have to make my own problems for now so I can get something working with the software/machines available to me.

tmdelellis commented 11 years ago

Stuart,

Sorry, had to work on some emergencies yesterday. I will call to help you with this later today.

Tom


sslattery commented 11 years ago

That sounds good. Thanks Tom.

sslattery commented 11 years ago

So I decided to just bite the bullet on this one and get a Scale 6.1 license so I can build. I'm working with Mark Baird to get that set up. Once I have it I'll build the pyneutronics front-end and run through the AMA benchmark xml inputs and cross sections Steven Hamilton gave me. There's too much good stuff here not to do the Scale build, and it will help me in the long run. This probably means no more laptop builds, but those were getting pretty epic as it was.

sslattery commented 11 years ago

Everything is building now. Here's the group 19 neutron flux at the centerline of benchmark 3 - a 17x17 pin PWR assembly criticality calculation.

[image: g19p3 - group 19 neutron flux at the centerline of benchmark 3 (https://f.cloud.github.com/assets/755901/309983/71d0aa64-9710-11e2-9c19-fd341289e90d.png)]

I solved this problem with Aztec GMRES with ILUT preconditioning. I'm having some issues with MCSA convergence and I think it might be the adjoint solver giving a bad correction. I'm looking at it now.
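
For reference, here is a minimal sketch of the kind of Aztec GMRES + ILUT solve referred to above, using the standard AztecOO options. This is an illustrative setup, not the actual Denovo/Insilico driver, and the fill and drop values are placeholders.

#include "AztecOO.h"
#include "Epetra_CrsMatrix.h"
#include "Epetra_LinearProblem.h"
#include "Epetra_Vector.h"

// Illustrative solve of A*x = b with GMRES and an ILUT subdomain preconditioner.
void solveWithGmresIlut( Epetra_CrsMatrix& A, Epetra_Vector& x, Epetra_Vector& b )
{
    Epetra_LinearProblem problem( &A, &x, &b );
    AztecOO solver( problem );

    solver.SetAztecOption( AZ_solver, AZ_gmres );          // GMRES
    solver.SetAztecOption( AZ_precond, AZ_dom_decomp );    // domain-decomposition preconditioner
    solver.SetAztecOption( AZ_subdomain_solve, AZ_ilut );  // ILUT on each subdomain
    solver.SetAztecParam( AZ_ilut_fill, 2.0 );             // fill factor (placeholder value)
    solver.SetAztecParam( AZ_drop, 1.0e-4 );               // drop tolerance (placeholder value)

    solver.Iterate( 1000, 1.0e-8 );                        // max iterations, residual tolerance
}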

tmdelellis commented 11 years ago

Is it just failing to converge?

I'm glad you got Scale installed; it will make life much easier to set these up.

Tom Evans


sslattery commented 11 years ago

Yeah the residual is blowing up on the first eigenvalue iteration:

MCSA Iteration 1: Residual = 2.285915e+05
MCSA Iteration 2: Residual = 1.573952e+06
MCSA Iteration 3: Residual = 3.018761e+08
MCSA Iteration 4: Residual = 1.225985e+10
MCSA Iteration 5: Residual = 1.829143e+11
MCSA Iteration 6: Residual = 1.979053e+12
MCSA Iteration 7: Residual = 6.646306e+13
MCSA Iteration 8: Residual = 1.085218e+15
MCSA Iteration 9: Residual = 1.195397e+16
MCSA Iteration 10: Residual = 7.100922e+17

The fact that the adjoint method is running probably means that the preconditioner is working. If the spectral radius of the iteration matrix is greater than 1, the history weights go infinite. I just realized, though, that this build had MCLS DBC turned off. I'm going to turn it back on and see what I get out. There may be some numerical issues somewhere, and DBC will determine if the histories are misbehaving.
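
For context, this is just the standard convergence condition for the underlying stationary method: the Monte Carlo correction estimates the Neumann series

dx = (I - H)^(-1) r = r + H r + H^2 r + ...

where H is the (preconditioned) iteration matrix. The series, and hence the sampled history weights, only stays bounded when rho(H) < 1, which is why a spectral radius above 1 sends the weights to infinity.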

sslattery commented 11 years ago

DBC didn't pick up anything. I might try just an adjoint solve next.

sslattery commented 11 years ago

So it may just be that this problem is really ill-conditioned. Check out the residuals from a block-Jacobi preconditioned Richardson iteration:

MCSA Iteration 1: Residual = 9.394369e-01
MCSA Iteration 2: Residual = 8.848969e-01
MCSA Iteration 3: Residual = 8.770492e-01
MCSA Iteration 4: Residual = 8.400525e-01
MCSA Iteration 5: Residual = 8.333852e-01
MCSA Iteration 6: Residual = 8.052318e-01
MCSA Iteration 7: Residual = 7.991999e-01
MCSA Iteration 8: Residual = 7.781818e-01
MCSA Iteration 9: Residual = 7.725512e-01
MCSA Iteration 10: Residual = 7.563541e-01
MCSA Iteration 11: Residual = 7.510019e-01
MCSA Iteration 12: Residual = 7.381150e-01
MCSA Iteration 13: Residual = 7.329699e-01
MCSA Iteration 14: Residual = 7.224718e-01
MCSA Iteration 15: Residual = 7.174891e-01
MCSA Iteration 16: Residual = 7.087863e-01
MCSA Iteration 17: Residual = 7.039364e-01
MCSA Iteration 18: Residual = 6.966177e-01
MCSA Iteration 19: Residual = 6.918797e-01
MCSA Iteration 20: Residual = 6.856443e-01
MCSA Iteration 21: Residual = 6.810032e-01
MCSA Iteration 22: Residual = 6.756243e-01
MCSA Iteration 23: Residual = 6.710688e-01
MCSA Iteration 24: Residual = 6.663732e-01
MCSA Iteration 25: Residual = 6.620045e-01

The spectral radius here is probably about 0.99 based on these residuals. Typically, when I see the MCSA iteration behavior I posted above, it means not enough histories were used at each iteration to get a good correction. In my past work, a good number of histories was usually around the number of DOFs in the problem. I've more than doubled that number in these MCSA runs for this problem with no luck. I wonder what would happen if I applied point Jacobi preconditioning after the block preconditioning?
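
As a rough check on that estimate, the residual norm ratio of a stationary iteration approaches the spectral radius of the iteration matrix asymptotically, and the last two residuals above give

rho ~ 6.620045e-01 / 6.663732e-01 ~ 0.993

which is consistent with the ~0.99 figure.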

sslattery commented 11 years ago

To note, it says MCSA Iteration above because I just quickly modified that solver to do a Richardson iteration.

sslattery commented 11 years ago

Tom - any idea why cross section generation would hang when SPN block matrices are enabled for problem 3? In general I've been seeing weird behavior with the linear operator coming out of SPN (so far only the CrsMatrix type is getting to the eigenvalue iterations). I'm still debugging, so nothing conclusive yet.

sslattery commented 11 years ago

So I take that last one back - it looks like it runs just fine with SPN block matrices and an Aztec solve. I think there are some issues here with my implementation in MCLS. I'm kind of surprised by this because of all the unit testing, but the poor convergence, this block matrix issue, and the noted parallel issues on 2 cores mean I've got some things to work out.

tmdelellis commented 11 years ago

Stuart,

The block matrix and crs matrix implementations should give the same results. Have you tried doing a 1-group collapse and solve? That should work out the issues between block and point matrices.

Tom


sslattery commented 11 years ago

So I've decided that the VbrMatrix implementation isn't hanging - my block preconditioner is REALLY inefficient in its construction routine because I am using a row matrix interface. I'm going to refactor this first.
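
To illustrate the cost, here is the generic access pattern the row matrix interface forces (an assumed sketch, not the actual preconditioner code): every row is copied into user-provided buffers, even when the underlying Epetra_VbrMatrix already stores the blocks contiguously.

#include <vector>
#include "Epetra_RowMatrix.h"

// Illustrative row traversal through the generic Epetra_RowMatrix interface.
void visitRows( const Epetra_RowMatrix& A )
{
    const int max_entries = A.MaxNumEntries();
    std::vector<double> values( max_entries );
    std::vector<int>    indices( max_entries );

    for ( int local_row = 0; local_row < A.NumMyRows(); ++local_row )
    {
        int num_entries = 0;
        // Copies the row values and local column indices out of the matrix.
        A.ExtractMyRowCopy( local_row, max_entries, num_entries,
                            values.data(), indices.data() );
        // ... assemble the Jacobi block containing this row from the copies ...
    }
}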

tmdelellis commented 11 years ago

This is why we implemented the crs matrix. ML uses the row matrix interface as well, which means tons of copying.


sslattery commented 11 years ago

Yeah, even with my refactor, which significantly reduced the number of row matrix interface calls (by more than an order of magnitude), this is still taking forever. I ran the simple test problem I used for spectral analysis with the block matrices turned off and observed convergent behavior, so I don't think the matrix form is an issue here.

sslattery commented 11 years ago

I'm moving on to analyzing just the behavior of the adjoint solver. I'm getting this from my DBC in a parallel problem:

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
C++ Standard Library Assertion: MCLS Assertion: DT::isLocalState(*d_domain,starting_state), failed in /filespace/people/s/stuart/local_hd/software/Exnihilo/MCLS/src/MCLS_UniformAdjointSource_impl.hpp, line 343.

Basically what that means is that I'm generating a history from the source on proc 0 in a state that exists on a different proc in the linear operator. That implies that the right-hand side of the linear system has a different Epetra_Map than the linear operator. I'll have to take a look at the SPN source code in Exnihilo to see how the parallel decomposition is formed.

sslattery commented 11 years ago

So I've verified that the linear operator, left hand side, and right hand side have different decompositions/orderings. I'm going to make everything consistent with the operator through import/export operations. This type of thing would clearly mess up the calculation, as you could transport through the domain with the wrong source and tally into the wrong vector - everything is simply wrong. This is not a bug in Denovo - simply a super general use case I did not anticipate or write unit tests for. I've fixed the right hand side consistency. The left hand side is next and a little trickier. I'll report when done.
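
Here is a minimal sketch of the kind of remapping involved, assuming Epetra objects (the function and variable names are illustrative, not the MCLS code): build an Epetra_Import from the vector's map to the operator's row map and pull the data across.

#include "Epetra_CrsMatrix.h"
#include "Epetra_Import.h"
#include "Epetra_Map.h"
#include "Epetra_Vector.h"
#include "Teuchos_RCP.hpp"

// Illustrative remap of a vector onto the decomposition of the linear operator.
Teuchos::RCP<Epetra_Vector>
remapToOperator( const Epetra_CrsMatrix& A, const Epetra_Vector& b )
{
    // Target decomposition: the operator's row map.
    const Epetra_Map& row_map = A.RowMap();

    // Importer from the vector's current map to the operator's row map.
    Epetra_Import importer( row_map, b.Map() );

    // New vector laid out like the operator; pull b's data across.
    Teuchos::RCP<Epetra_Vector> b_consistent =
        Teuchos::rcp( new Epetra_Vector(row_map) );
    b_consistent->Import( b, importer, Insert );

    return b_consistent;
}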

sslattery commented 11 years ago

I've fixed the left hand side consistency, but the convergence issues are persisting. I'll have to do some more debugging.

sslattery commented 11 years ago

Since I was solving a 1-group SP1 form of problem 3, I was actually solving the diffusion equation. I went back to my simple 2D diffusion implementation in Chimera for an easier place to study the convergence issues. I've made some interesting observations:

  1. Large spectral radii (above 0.95) not only produce slow stochastic histories that transition many times with low weight contributions before reaching the cutoff, but also require more histories per MCSA iteration to get a good correction for convergence. This hits us twice: once for long histories due to the larger spectral radius and again for requiring more histories to get a good correction.
  2. The larger the spectral radius, the more histories are needed to ensure convergence. This number grows rapidly, with 5 times the number of degrees of freedom needed to converge a problem with a spectral radius of 0.9898.
  3. For larger spectral radius problems that come from real discretizations (i.e. problem 3), the method is effectively unusable even with the appropriate preconditioning, such as point Jacobi for the diffusion equation and block Jacobi for the higher order SPN equations.

As an example, here is the iteration output from a 150x150 grid 2D diffusion problem using 100,000 histories per MCSA iteration to get convergence. Note I stopped the iterations due to time but the residual is monotonically decreasing.

Diffusion Length 3.65148
Mesh Size 0.666667
L/dx 5.47723
SPECTRAL RADIUS: 0.989844
MCSA Iteration 1: Residual = 13.1622
MCSA Iteration 2: Residual = 10.8821
MCSA Iteration 3: Residual = 8.42188
MCSA Iteration 4: Residual = 6.98505
MCSA Iteration 5: Residual = 6.88249
MCSA Iteration 6: Residual = 4.394
MCSA Iteration 7: Residual = 3.61359
...
MCSA Iteration 24: Residual = 0.0836128
MCSA Iteration 25: Residual = 0.0695528
MCSA Iteration 26: Residual = 0.0627831
MCSA Iteration 27: Residual = 0.0457332
MCSA Iteration 28: Residual = 0.0403989
MCSA Iteration 29: Residual = 0.0265606
MCSA Iteration 30: Residual = 0.023427
MCSA Iteration 31: Residual = 0.0200454
MCSA Iteration 32: Residual = 0.0190552
MCSA Iteration 33: Residual = 0.0126856
MCSA Iteration 34: Residual = 0.0108319
^C

real    7m13.640s
user    7m13.115s
sys 0m0.152s

Now let's relax the number of histories to ~1 per DOF for a total of 25,000 and observe the loss of convergence:

Diffusion Length 4.08248
Mesh Size 0.666667
L/dx 6.12372
SPECTRAL RADIUS: 0.991808
MCSA Iteration 1: Residual = 28.3203
MCSA Iteration 2: Residual = 51.5849
MCSA Iteration 3: Residual = 92.1227
MCSA Iteration 4: Residual = 166.191
MCSA Iteration 5: Residual = 296.01
MCSA Iteration 6: Residual = 613.562
MCSA Iteration 7: Residual = 989.295
MCSA Iteration 8: Residual = 1756.76
MCSA Iteration 9: Residual = 3530.85
MCSA Iteration 10: Residual = 5084.65
MCSA Iteration 11: Residual = 9013.4
MCSA Iteration 12: Residual = 17575.1
MCSA Iteration 13: Residual = 29198
MCSA Iteration 14: Residual = 63189.8
MCSA Iteration 15: Residual = 91088.1
MCSA Iteration 16: Residual = 180757
^C

real    1m2.939s
user    1m2.852s
sys 0m0.028s

Here is the same grid with 25,000 histories per iteration and the cross sections tweaked to get a slightly smaller spectral radius:

Diffusion Length 1.74078
Mesh Size 0.666667
L/dx 2.61116
SPECTRAL RADIUS: 0.957608
MCSA Iteration 1: Residual = 12.813
MCSA Iteration 2: Residual = 12.7237
MCSA Iteration 3: Residual = 10.3598
MCSA Iteration 4: Residual = 6.73363
MCSA Iteration 5: Residual = 5.34828
MCSA Iteration 6: Residual = 4.79887
MCSA Iteration 7: Residual = 3.46504
MCSA Iteration 8: Residual = 2.68539
MCSA Iteration 9: Residual = 2.54452
MCSA Iteration 10: Residual = 1.65355
MCSA Iteration 11: Residual = 1.51612
MCSA Iteration 12: Residual = 1.11646
MCSA Iteration 13: Residual = 0.93003
MCSA Iteration 14: Residual = 0.779433
MCSA Iteration 15: Residual = 0.560534
MCSA Iteration 16: Residual = 0.450209
MCSA Iteration 17: Residual = 0.362937
MCSA Iteration 18: Residual = 0.265083
MCSA Iteration 19: Residual = 0.271156
...

MCSA Iteration 86: Residual = 5.25686e-08
MCSA Iteration 87: Residual = 3.68724e-08
MCSA Iteration 88: Residual = 3.19974e-08
MCSA Iteration 89: Residual = 2.4964e-08
MCSA Iteration 90: Residual = 1.91875e-08
MCSA Iteration 91: Residual = 1.62984e-08
MCSA Iteration 92: Residual = 1.24602e-08
MCSA Iteration 93: Residual = 1.04528e-08
MCSA Iteration 94: Residual = 8.81198e-09

real    1m22.929s
user    1m22.801s
sys 0m0.036s

The convergence is improved, but the method is still very slow relative to what a Krylov solver would do. Even with these SPD matrices this is a problem. Also of note, the behavior is not tied explicitly to the cross sections of the problem but rather to the spectral radius, which depends more strongly on L/dx.
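
That L/dx dependence is consistent with a simple Fourier estimate (homogeneous medium, uniform mesh, boundary effects ignored) for point Jacobi on the 5-point diffusion stencil, where the diagonal is 4D/dx^2 + sigma_a and L = sqrt(D/sigma_a):

rho ~ (4D/dx^2) / (4D/dx^2 + sigma_a) = 1 / (1 + (1/4)(dx/L)^2)

For L/dx = 5.48, 6.12, and 2.61 this gives roughly 0.992, 0.993, and 0.965, close to the measured 0.9898, 0.9918, and 0.9576 (the measured values are slightly smaller because the finite domain damps the lowest mode).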

sslattery commented 11 years ago

I've done some more matlab analysis and found that ILU and ILUT preconditioning should improve on the Jacobi preconditioning. For example, zero-fill ILU (ILU(0)) reduced the spectral radius of an SPN problem from 0.9867 with Jacobi preconditioning to 0.9293. ILUT did even better. I think I can wrap the right IFPACK components into a Thyra preconditioner inside of MCLS to get a left/right preconditioning strategy based on the LU factorization.

sslattery commented 11 years ago

Here are some more matlab results that further motivate using ILUT. I set up an SPN problem with SPN order 1 and 1 group to give a similar system to that found in problem 3. I then played with the cross sections until I got a point Jacobi preconditioned iteration matrix with a spectral radius of 0.9952, large enough to give me lots of problems. In addition, I implemented a Richardson iteration with relaxation such that we have:

x^(k+1) = x^(k) + w*r^(k)

where r^(k) is the appropriately preconditioned residual (i.e. Jacobi or ILUT) and w is the relaxation parameter. All of the following situations do this iteration with the associated preconditioning.
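
As a concrete reference point, here is a minimal sketch of that relaxed, preconditioned Richardson iteration in Epetra terms (illustrative only - it assumes the preconditioner action is available through Epetra_Operator::ApplyInverse and is not the MCLS implementation):

#include "Epetra_CrsMatrix.h"
#include "Epetra_Operator.h"
#include "Epetra_Vector.h"

// Illustrative relaxed Richardson iteration: x^(k+1) = x^(k) + w * M^{-1} r^(k).
void relaxedRichardson( const Epetra_CrsMatrix& A,
                        const Epetra_Vector& b,
                        Epetra_Vector& x,
                        const Epetra_Operator& M_inv,  // preconditioner action M^{-1}
                        double w,
                        double tol,
                        int max_iters )
{
    Epetra_Vector r( b.Map() );
    Epetra_Vector z( b.Map() );
    double b_norm = 0.0;
    b.Norm2( &b_norm );

    for ( int k = 0; k < max_iters; ++k )
    {
        // r = b - A*x
        A.Multiply( false, x, r );
        r.Update( 1.0, b, -1.0 );

        double r_norm = 0.0;
        r.Norm2( &r_norm );
        if ( r_norm <= tol * b_norm ) break;

        // z = M^{-1} r, then the relaxed update x = x + w*z.
        M_inv.ApplyInverse( r, z );
        x.Update( w, z, 1.0 );
    }
}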

Here are the number of iterations required to converge:

  1. Point Jacobi with w = 1.0: 2476
  2. Richardson iteration with w = 1.0: does not converge
  3. Point Jacobi with w != 1.0: does not converge
  4. Richardson iteration with w = 0.001: 6132
  5. ILUT with a drop tolerance of 1.0e-2 and w = 1.0: 209
  6. ILUT with a drop tolerance of 1.0e-2 and w = 1.69: 120

So not only does ILUT do a lot better, but it actually responds to the relaxation parameter in the Richardson iteration. I also think my right preconditioning implementation is not correct, so I'm going back to check that out.

sslattery commented 11 years ago

I'm renaming this issue and changing the definition as the scope has changed.

sslattery commented 11 years ago

The Ifpack ILUT preconditioner is done and incorporated into the Thyra/Stratimikos infrastructure. I've got a new Richardson solver in MCLS to test out these preconditioners and the results look good for a 1-group SP1 problem 3 discretization (number of DOFs ~ 20,000).

  1. A point Jacobi preconditioned iteration was taking ~8400 iterations per eigenvalue iteration.
  2. An ILUT preconditioned Richardson iteration with 1.0e-2 drop tolerance, 1.5 level of fill, and Richardson relaxation parameter of 1.0 was down to ~190 iterations per eigenvalue iteration.
  3. For those same ILUT parameters, using a Richardson relaxation parameter of 1.3 reduced the iterations to ~140 per eigenvalue iteration.

As the MCSA method and underlying adjoint solver are fundamentally based on the Richardson iteration, these preconditioning strategies are clearly going to reduce the length of the Markov chains and speed up the calculation. The next step is to verify my MCSA preconditioning strategy and add the relaxation parameter inside of the adjoint solver (this is a simple scaling operation). Once that's done, I'll be able to see if these reactor problems are more manageable.

Keep in mind here that these are serial runs. The Ifpack ILUT preconditioner is not a true global LU decomposition; it computes a local LU decomposition on each process to eliminate all parallel communication in the construction of the preconditioner and in its subsequent action on a vector. This may not be satisfactory for parallel runs, or perhaps the drop tolerance and thresholds will have to be relaxed.
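
For reference, here is a minimal sketch of building an Ifpack ILUT preconditioner with the parameters quoted above (standard Ifpack factory usage; the Thyra/Stratimikos wrapping used in MCLS is omitted):

#include "Epetra_RowMatrix.h"
#include "Ifpack.h"
#include "Ifpack_Preconditioner.h"
#include "Teuchos_ParameterList.hpp"
#include "Teuchos_RCP.hpp"

// Illustrative construction of an Ifpack ILUT preconditioner.
Teuchos::RCP<Ifpack_Preconditioner>
buildIlut( const Teuchos::RCP<Epetra_RowMatrix>& A )
{
    Ifpack factory;
    // Zero overlap gives a purely local factorization (no parallel communication).
    Teuchos::RCP<Ifpack_Preconditioner> prec =
        Teuchos::rcp( factory.Create("ILUT", A.getRawPtr(), 0) );

    Teuchos::ParameterList params;
    params.set( "fact: drop tolerance", 1.0e-2 );
    params.set( "fact: ilut level-of-fill", 1.5 );
    prec->SetParameters( params );

    prec->Initialize();
    prec->Compute();
    return prec;
}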

sslattery commented 11 years ago

This keeps getting worse...

I was getting some segfaults out of Epetra when I was building the Monte Carlo domain for sampling. I realized that I must be running out of memory on the machine during the import/export operations required to generate the overlap and domain boundaries. I then remembered I was fully inverting the LU matrices from the preconditioning to build the resulting operator, which led me back to matlab for more analysis. Here's a sparsity plot of the iteration matrix that comes from an ILUT preconditioned Richardson iteration:

[image: ilut_iteration_matrix - sparsity pattern of the ILUT preconditioned Richardson iteration matrix]

No wonder I ran out of memory... Say goodbye to any kind of domain decomposed parallelism as well. ILUT with the Richardson iteration destroys sparsity due to the matrix-matrix multiplications and direct inversions I have to perform because of the need to build probabilities.
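
To spell out where the density comes from: with left preconditioning, the iteration matrix the Monte Carlo solver needs explicitly is roughly

H = I - U^(-1) L^(-1) A

where L and U are the ILUT factors, and the transition probabilities are built element by element from H (e.g. p_ij proportional to |h_ij| within a row). So the generally dense inverse triangular factors have to be formed and multiplied through, whereas a Krylov solver only ever applies two sparse triangular solves per iteration and never forms H.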

This is really getting to be a serious problem. I'm going to have to explore some other preconditioners, as I have the infrastructure in place to quickly add them, but the resulting composite operators need to be sparse or the above happens. I'm getting close to running out of ideas on preconditioning. If little progress is made I'll have to move in another direction.

sslattery commented 11 years ago

I came up with an idea after rock climbing today: use the ILUT drop tolerance when I construct the inverses of those matrices. Typically that tolerance is applied after the LU construction, but Ifpack applies it to the matrix before doing the factorization. I figured I could do the same thing when I explicitly form the inverse by ignoring matrix elements that are below that tolerance, as their contribution to the matrix-vector product is presumed negligible. This reduced the number of elements in each row of the matrix by an order of magnitude, allowing the computation to proceed, although with a loss of monotonicity as observed below (iterations 11-12).

MCSA Iteration 1: Residual = 5.877643e+00
MCSA Iteration 2: Residual = 8.325247e-01
MCSA Iteration 3: Residual = 2.155778e-01
MCSA Iteration 4: Residual = 7.105584e-02
MCSA Iteration 5: Residual = 2.424318e-02
MCSA Iteration 6: Residual = 7.056292e-03
MCSA Iteration 7: Residual = 1.038629e-03
MCSA Iteration 8: Residual = 1.676313e-04
MCSA Iteration 9: Residual = 8.154283e-05
MCSA Iteration 10: Residual = 5.095648e-05
MCSA Iteration 11: Residual = 7.099415e-06
MCSA Iteration 12: Residual = 9.775427e-06
MCSA Iteration 13: Residual = 3.004781e-06
MCSA Iteration 14: Residual = 1.566122e-06
MCSA Iteration 15: Residual = 9.958534e-06
MCSA Iteration 16: Residual = 3.038978e-07
MCSA Iteration 17: Residual = 2.442821e-07
MCSA Iteration 18: Residual = 8.292234e-08
MCSA Iteration 19: Residual = 6.960518e-09
MCLS solver "MCLS::MCSASolverManager<Epetra_Vector, Epetra_RowMatrix>" returned a solve status of "SOLVE_STATUS_CONVERGED" for 1RHSs using 19 cumulative iterations for an average of 19 iterations/RHS

The result of doing this is a convergent method for a serial run, with around 18 MCSA iterations needed for the problems I was running above with ~1 history per DOF. I decoupled the relaxation parameters in the Richardson extrapolation and inside of the adjoint solver so I could test their effects. While the outer Richardson extrapolation was unresponsive (1.0 being the best), using 0.9 for the Monte Carlo solver relaxation when building the probabilities and the source term gave about 6 fewer iterations than the 1.0 base case.

The next step is to test this with more energy groups. I may need a parallel run to do that, so after I check it out I'll move on to parallel runs. I'm concerned that the boundary and overlap generation will be impaired by the still significant number of nonzero entries in the iteration matrix (~2000 for ILUT parameters that give convergence). Furthermore, future timing studies may have to separate the preconditioning step from the actual solve performed by the Monte Carlo kernel, as preconditioning was around an order of magnitude slower for these problems.
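
Here is a minimal sketch of the element filtering described at the top of this comment (illustrative names, not the MCLS implementation): drop entries below the tolerance as each row of the explicitly formed operator is assembled.

#include <cmath>
#include <cstddef>
#include <vector>
#include "Epetra_CrsMatrix.h"

// Illustrative filtered row insertion; assumes 'composite' has not yet been FillComplete'd.
void insertFilteredRow( Epetra_CrsMatrix& composite,
                        int global_row,
                        const std::vector<double>& values,
                        const std::vector<int>& global_cols,
                        double drop_tol )
{
    std::vector<double> kept_vals;
    std::vector<int>    kept_cols;
    for ( std::size_t j = 0; j < values.size(); ++j )
    {
        // Keep only entries whose contribution to a matrix-vector product
        // is presumed non-negligible.
        if ( std::abs(values[j]) > drop_tol )
        {
            kept_vals.push_back( values[j] );
            kept_cols.push_back( global_cols[j] );
        }
    }

    if ( !kept_vals.empty() )
    {
        composite.InsertGlobalValues( global_row,
                                      static_cast<int>( kept_vals.size() ),
                                      kept_vals.data(),
                                      kept_cols.data() );
    }
}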

sslattery commented 11 years ago

As a means of seeing if I could filter even more entries out of the LU inverse matrices to reduce the size of the composite operator, I tested various values of drop tolerance and level of fill with problem 3. I also enabled the Ifpack drop tolerance to be decoupled from my drop tolerance for the inverse matrix elements to see if that had any effect. As a metric for preconditioning quality, I multiplied the number of MCSA iterations to converge by the maximum number of entries in the composite operator rows. Here are the results for serial runs:

[image: ilut_drop_tol - results for the ILUT drop tolerance and level of fill study described above]

There are a few things to note:

  1. A drop tolerance higher than 1.0e-3 prevented convergence, meaning we can't loosen it any further for this problem, and therefore we need lots of composite matrix entries.
  2. Decoupling the Ifpack drop tolerance from my drop tolerance (inverse tol in the chart) didn't really do anything, behaving in a nearly identical fashion to the coupled case. I was hoping I could give a small drop tolerance and high fill level to get a really good factorization and then filter out the small numbers in the inverse matrices with my tolerance to reduce the number of row entries in the composite operator. If this worked, I could get a decent preconditioner that was more sparse - unfortunately this didn't happen.
  3. This preconditioning does reduce the number of iterations to a point where the number of MCSA iterations needed to converge is approximately the same as for the Krylov method when 1 stochastic history is used per DOF in the problem.
  4. The fill level has a profound effect on convergence, stronger for these problems than the drop tolerance. A higher fill level with a lower drop tolerance will likely perform better than a lower fill level with a higher drop tolerance.
  5. Based on my multiplication metric, the higher fill level is more effective. This seems to be a good metric because I want to minimize both iterations and composite row entries, and the relative magnitude gives some measure of their importance. An unpreconditioned problem has only 7 entries per row - we are getting 1000s here. That has a significant impact on Monte Carlo parallelism and performance. I'd rather do transport for 30 MCSA iterations on a well conditioned matrix with 100 elements per row than 10 MCSA iterations on a well conditioned matrix with 2000 elements per row.

I'm going to move forward running parallel problems here and I'll determine if this preconditioning is feasible for cluster-sized problems.

sslattery commented 11 years ago

Based on the task description, this is done. ILUT reduces the number of Markov chains required for convergence to well below 1 per DOF, and the resulting chains are much shorter. Furthermore, I have done a series of ILUT preconditioned runs to study the behavior relative to the preconditioning parameters that can be tuned. The dense matrices generated by this preconditioning are a byproduct of this achievement, but studying that problem should go in another issue.

In addition, the work in this issue introduced the concept of relaxation parameters for MCSA and studied the sensitivity of ill-conditioned problems to the number of histories per iteration.

I'm giving this to Paul to review and we can discuss the results of the work in this issue.