sslattery opened this issue 11 years ago
In addition to parallel parameters, I forgot to add that we should also be looking at solver parameters here, including the weight cutoff and the number of histories per MCSA iteration.
First thing to do here is define a good problem to test this out with. My first thoughts are to do an infinite medium problem of reasonable size. For cross sections, perhaps we should consider a reactor/assembly-type group structure with homogenization so we know what to expect for real problems.
Any suggestions Tom?
So I've pulled Tom's 3x3 pin problem from Orthanc and will start there, as the character of the linear operator will be in line with that observed for other reactor problems. Reflecting and vacuum boundaries are used here on 3 sides to effectively construct 1/8 of the problem. It has UO2 with cladding and moderator and 7 energy groups with upscatter and fission. I can mesh it pretty fine to test scaling, which will work well as many Jacobi blocks must be inverted for the preconditioning. I'll run it on dionysus next to verify.
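For reference, here is a minimal SciPy sketch of the kind of block-Jacobi preconditioning being referred to (illustrative only, not the Denovo/MCLS implementation): invert each dense diagonal block of the operator and assemble the block-diagonal approximate inverse. The block partitioning in the comment (one block per cell coupling the energy groups) is an assumption for illustration.

```python
import numpy as np
import scipy.sparse as sp

def block_jacobi_inverse(A, block_size):
    """Build an explicit block-Jacobi approximate inverse M^{-1} by inverting
    the dense diagonal blocks of a CSR matrix A. Illustrative only."""
    n = A.shape[0]
    assert n % block_size == 0
    blocks = []
    for start in range(0, n, block_size):
        end = start + block_size
        D = A[start:end, start:end].toarray()   # dense diagonal block
        blocks.append(np.linalg.inv(D))         # invert each block
    return sp.block_diag(blocks, format="csr")  # M^{-1} as a block diagonal

# usage sketch: precondition a Richardson residual r -> M^{-1} r
# Minv = block_jacobi_inverse(A.tocsr(), block_size=7)  # e.g. one block per cell
# z = Minv @ r                                          # coupling the groups
```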
Stuart,
We have some full core problems with real cross section we can run when you're ready.
Tom
Ok. I'm going to need a day or two with these assembly problems to make sure all types of MSOD parallelism are functioning correctly through the Thyra interfaces. It's easy to do without Thyra but might require some more intrusive code in Denovo to get the communicator structure correct. Once I determine if that's necessary, I'll be ready to merge my branch.
So I ran into this:
Denovo - pyspn Python Front-End
-------------------------------
Release : denovo-3_0_0
Release Date : 3-MAR-2011
Build Date : 20130321
Traceback (most recent call last):
File "fs_mcls.py", line 412, in <module>
(r, z, mixids, cleanids, table) = build_mesh(10)
File "fs_mcls.py", line 209, in build_mesh
uo2_pin = Pincell()
File "/local.hd/cnergg/stuart/builds/Exnihilo/release/python/denovo_utils.py", line 1136, in __init__
this = _denovo_utils.new_Pincell(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_Pincell'.
Possible C/C++ prototypes are:
rtk::PyPincell< rtk::PyCyl_Pin_Geom >::PyPincell(rtk::PyCyl_Pin_Geom const &,rtk::PyRTK_Tag const &,denovo::PyCompositions const &,double,double)
rtk::PyPincell< rtk::PyCyl_Pin_Geom >::PyPincell(rtk::PyRTK_Tag const &,denovo::PyComposition const &,double,double)
Looks like the RTK code base has changed since Tom ran these problems. I'll have to figure out how to update the python input.
It looks like the assembly inputs are out of date with respect to RTK definitions for pincells and material compositions.
Tom - Do you have a more up-to-date input for these spn assembly problems that are compatible with the current Exnihilo master branch?
Stuart,
Let me get in touch with you this afternoon.
Tom
So it looks like I can go through the pyneutronics interface in the Insilico package to run SPn with RTK geometry. The issue here is that Scale (Fulcrum) is a required dependency for this build, which makes this much more complicated than I would like at this point. I might just have to make my own problems for now so I can get something working with the software/machines available to me.
Stuart,
Sorry, had to work on some emergencies yesterday. I will call to help you with this later today.
Tom
That sounds good. Thanks Tom.
So I decided to just bite the bullet on this one and get a Scale 6.1 license so I can build. I'm working with Mark Baird to get that set up. Once I have that, I'll build the pyneutronics front-end and run through the AMA benchmark xml inputs and cross sections Steven Hamilton gave me. There's too much good stuff here not to do the Scale build, and it will help me in the long run. This probably means no more laptop builds, but those were getting pretty epic as it was.
Everything is building now. Here's the group 19 neutron flux at the centerline of benchmark 3 - a 17x17 pin PWR assembly criticality calculation:
[image: g19p3] https://f.cloud.github.com/assets/755901/309983/71d0aa64-9710-11e2-9c19-fd341289e90d.png
I solved this problem with Aztec GMRES with ILUT preconditioning. I'm having some issues with MCSA convergence and I think it might be the adjoint solver giving a bad correction. I'm looking at it now.
Is it just failing to converge?
I'm glad you got Scale installed; it will make life much easier to set these up.
Tom
Yeah the residual is blowing up on the first eigenvalue iteration:
MCSA Iteration 1: Residual = 2.285915e+05
MCSA Iteration 2: Residual = 1.573952e+06
MCSA Iteration 3: Residual = 3.018761e+08
MCSA Iteration 4: Residual = 1.225985e+10
MCSA Iteration 5: Residual = 1.829143e+11
MCSA Iteration 6: Residual = 1.979053e+12
MCSA Iteration 7: Residual = 6.646306e+13
MCSA Iteration 8: Residual = 1.085218e+15
MCSA Iteration 9: Residual = 1.195397e+16
MCSA Iteration 10: Residual = 7.100922e+17
The fact that the adjoint method is running probably means that the preconditioner is working. If the spectral radius of the iteration matrix is greater than 1, then the history weights go infinite. I just realized, though, that this build had MCLS DBC turned off. I'm going to turn it back on and see what I get out. There may be some numerical issues somewhere, and DBC will determine if the histories are misbehaving.
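For what it's worth, a quick way to test the spectral-radius hypothesis outside the solver is a crude power iteration on the iteration matrix H = I - M^-1 A. A minimal NumPy sketch (not MCLS code; apply_H is a stand-in for whatever applies the preconditioned iteration matrix):

```python
import numpy as np

def spectral_radius_power_iteration(apply_H, n, iters=200, seed=0):
    """Crude power-iteration estimate of rho(H) for H = I - M^{-1} A.
    apply_H(v) should return H @ v."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    rho = 0.0
    for _ in range(iters):
        w = apply_H(v)
        rho = np.linalg.norm(w) / np.linalg.norm(v)
        v = w / np.linalg.norm(w)
    return rho

# If this comes back >= 1, the Neumann series the adjoint Monte Carlo solver
# samples does not converge and the history weights grow without bound.
```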
DBC didn't pick up anything. I might try just an adjoint solve next.
So it may just be that this problem is really ill-conditioned. Check out the residuals from a block-Jacobi preconditioned Richardson iteration:
MCSA Iteration 1: Residual = 9.394369e-01
MCSA Iteration 2: Residual = 8.848969e-01
MCSA Iteration 3: Residual = 8.770492e-01
MCSA Iteration 4: Residual = 8.400525e-01
MCSA Iteration 5: Residual = 8.333852e-01
MCSA Iteration 6: Residual = 8.052318e-01
MCSA Iteration 7: Residual = 7.991999e-01
MCSA Iteration 8: Residual = 7.781818e-01
MCSA Iteration 9: Residual = 7.725512e-01
MCSA Iteration 10: Residual = 7.563541e-01
MCSA Iteration 11: Residual = 7.510019e-01
MCSA Iteration 12: Residual = 7.381150e-01
MCSA Iteration 13: Residual = 7.329699e-01
MCSA Iteration 14: Residual = 7.224718e-01
MCSA Iteration 15: Residual = 7.174891e-01
MCSA Iteration 16: Residual = 7.087863e-01
MCSA Iteration 17: Residual = 7.039364e-01
MCSA Iteration 18: Residual = 6.966177e-01
MCSA Iteration 19: Residual = 6.918797e-01
MCSA Iteration 20: Residual = 6.856443e-01
MCSA Iteration 21: Residual = 6.810032e-01
MCSA Iteration 22: Residual = 6.756243e-01
MCSA Iteration 23: Residual = 6.710688e-01
MCSA Iteration 24: Residual = 6.663732e-01
MCSA Iteration 25: Residual = 6.620045e-01
The spectral radius here is probably about 0.99 based on these residuals. Typically, when I see the MCSA iteration behavior I posted above, it means not enough histories were used at each iteration to get a good correction. In my past work, a good number of histories was usually around the number of DOFs in the problem. I've more than doubled that number in these MCSA runs for this problem with no luck. I wonder what would happen if I applied point-Jacobi preconditioning after the block preconditioning?
Note that it says MCSA Iteration above only because I quickly modified that solver to do a Richardson iteration.
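As a sanity check on the ~0.99 estimate: for a stationary iteration, the ratio of successive residual norms approaches the spectral radius, so a rough value can be read straight off the history above. A small sketch using the tail of the residuals posted above (iterations 20-25):

```python
import numpy as np

# Residual norms from iterations 20-25 of the block-Jacobi Richardson run above.
residuals = np.array([6.856443e-01, 6.810032e-01, 6.756243e-01,
                      6.710688e-01, 6.663732e-01, 6.620045e-01])

ratios = residuals[1:] / residuals[:-1]
print("successive ratios:", ratios)            # each roughly 0.993
print("spectral radius estimate:",
      (residuals[-1] / residuals[0]) ** (1.0 / (len(residuals) - 1)))
```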
Tom - any idea why cross section generation would hang when SPN block matrices are enabled for problem 3? In general I've been seeing weird behavior with the linear operator coming out of SPN (so far only the CrsMatrix type is getting to the eigenvalue iterations). I'm still debugging, so nothing conclusive yet.
So I take that last one back - it looks like it runs just fine with SPN block matrices and an Aztec solve. I think there are some issues here with my implementation in MCLS. I'm kind of surprised by this because of all the unit testing, but the poor convergence, this block matrix issue, and the noted parallel issues on 2 cores mean I've got some things to work out.
Stuart,
The block matrix and crs matrix implementations should give the same results. Have you tried doing a 1-group collapse and solve? That should work out the issues between block and point matrices.
Tom
So I've decided that the VbrMatrix implementation isn't hanging - my block preconditioner is REALLY inefficient in its construction routine because I am using a row matrix interface. I'm going to refactor this first.
This is why we implemented the crs matrix. ML uses the row matrix interface as well, which means tons of copying.
Yeah, even with my refactor, which reduced the number of row matrix interface calls by more than an order of magnitude, this is still taking forever. I ran the simple test problem I used for spectral analysis with the block matrices turned off and observed convergent behavior, so I don't think the matrix form is the issue here.
I'm moving on to analyzing just the behavior of the adjoint solver. I'm getting this from my DBC in a parallel problem:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
C++ Standard Library Assertion: MCLS Assertion: DT::isLocalState(*d_domain,starting_state), failed in /filespace/people/s/stuart/local_hd/software/Exnihilo/MCLS/src/MCLS_UniformAdjointSource_impl.hpp, line 343.
Basically what that means is that I'm generating a history from the source on proc 0 in a state that exists on a different proc in the linear operator. That implies that the right-hand side of the linear system has a different Epetra_Map than the linear operator. I'll have to take a look at the SPN source code in Exnihilo to see how the parallel decomposition is formed.
So I've verified that the linear operator, left-hand side, and right-hand side have different decompositions/orderings. I'm going to make everything consistent with the operator through import/export operations. This type of thing would clearly mess up the calculation, as you could transport through the domain with the wrong source and tally into the wrong vector - everything is simply wrong. This is not a bug in Denovo - simply a very general use case I did not anticipate or write unit tests for. I've fixed the right-hand side consistency. The left-hand side is next and a little trickier. I'll report when done.
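For reference, the actual fix goes through Epetra import/export objects, but the underlying operation is just "put the RHS entries on the same ordering/distribution as the operator rows." A serial stand-in sketch; the global-ID lists operator_gids and rhs_gids are hypothetical names, not Denovo/MCLS data:

```python
import numpy as np

def redistribute_to_operator_ordering(rhs_values, rhs_gids, operator_gids):
    """Reorder the RHS (serial stand-in for an Epetra Import) so that entry i
    corresponds to the same global ID as row i of the linear operator."""
    index_of_gid = {gid: i for i, gid in enumerate(rhs_gids)}
    return np.array([rhs_values[index_of_gid[g]] for g in operator_gids])

# b_consistent = redistribute_to_operator_ordering(b, rhs_gids, operator_gids)
# Without this, a history sampled from source entry i may correspond to a
# state that lives in a different row (or on a different processor) of A.
```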
I've fixed the left-hand side consistency, but the convergence issues are persisting. I'll have to do some more debugging.
Since I was solving a 1-group SP1 form of problem 3, I was actually solving the diffusion equation, so I went back to my simple 2D diffusion implementation in Chimera for an easier place to study the convergence issues. I've made some interesting observations:
As an example, here is the iteration output from a 150x150 grid 2D diffusion problem using 100,000 histories per MCSA iteration to get convergence. Note that I stopped the iterations due to time, but the residual is monotonically decreasing.
Diffusion Length 3.65148
Mesh Size 0.666667
L/dx 5.47723
SPECTRAL RADIUS: 0.989844
MCSA Iteration 1: Residual = 13.1622
MCSA Iteration 2: Residual = 10.8821
MCSA Iteration 3: Residual = 8.42188
MCSA Iteration 4: Residual = 6.98505
MCSA Iteration 5: Residual = 6.88249
MCSA Iteration 6: Residual = 4.394
MCSA Iteration 7: Residual = 3.61359
...
MCSA Iteration 24: Residual = 0.0836128
MCSA Iteration 25: Residual = 0.0695528
MCSA Iteration 26: Residual = 0.0627831
MCSA Iteration 27: Residual = 0.0457332
MCSA Iteration 28: Residual = 0.0403989
MCSA Iteration 29: Residual = 0.0265606
MCSA Iteration 30: Residual = 0.023427
MCSA Iteration 31: Residual = 0.0200454
MCSA Iteration 32: Residual = 0.0190552
MCSA Iteration 33: Residual = 0.0126856
MCSA Iteration 34: Residual = 0.0108319
^C
real 7m13.640s
user 7m13.115s
sys 0m0.152s
Now let's relax the number of histories to ~1 per DOF for a total of 25,000 and observe the loss of convergence:
Diffusion Length 4.08248
Mesh Size 0.666667
L/dx 6.12372
SPECTRAL RADIUS: 0.991808
MCSA Iteration 1: Residual = 28.3203
MCSA Iteration 2: Residual = 51.5849
MCSA Iteration 3: Residual = 92.1227
MCSA Iteration 4: Residual = 166.191
MCSA Iteration 5: Residual = 296.01
MCSA Iteration 6: Residual = 613.562
MCSA Iteration 7: Residual = 989.295
MCSA Iteration 8: Residual = 1756.76
MCSA Iteration 9: Residual = 3530.85
MCSA Iteration 10: Residual = 5084.65
MCSA Iteration 11: Residual = 9013.4
MCSA Iteration 12: Residual = 17575.1
MCSA Iteration 13: Residual = 29198
MCSA Iteration 14: Residual = 63189.8
MCSA Iteration 15: Residual = 91088.1
MCSA Iteration 16: Residual = 180757
^C
real 1m2.939s
user 1m2.852s
sys 0m0.028s
Here is the same grid with 25,000 histories per iteration and the cross sections tweaked to get a slightly smaller spectral radius:
Diffusion Length 1.74078
Mesh Size 0.666667
L/dx 2.61116
SPECTRAL RADIUS: 0.957608
MCSA Iteration 1: Residual = 12.813
MCSA Iteration 2: Residual = 12.7237
MCSA Iteration 3: Residual = 10.3598
MCSA Iteration 4: Residual = 6.73363
MCSA Iteration 5: Residual = 5.34828
MCSA Iteration 6: Residual = 4.79887
MCSA Iteration 7: Residual = 3.46504
MCSA Iteration 8: Residual = 2.68539
MCSA Iteration 9: Residual = 2.54452
MCSA Iteration 10: Residual = 1.65355
MCSA Iteration 11: Residual = 1.51612
MCSA Iteration 12: Residual = 1.11646
MCSA Iteration 13: Residual = 0.93003
MCSA Iteration 14: Residual = 0.779433
MCSA Iteration 15: Residual = 0.560534
MCSA Iteration 16: Residual = 0.450209
MCSA Iteration 17: Residual = 0.362937
MCSA Iteration 18: Residual = 0.265083
MCSA Iteration 19: Residual = 0.271156
...
MCSA Iteration 86: Residual = 5.25686e-08
MCSA Iteration 87: Residual = 3.68724e-08
MCSA Iteration 88: Residual = 3.19974e-08
MCSA Iteration 89: Residual = 2.4964e-08
MCSA Iteration 90: Residual = 1.91875e-08
MCSA Iteration 91: Residual = 1.62984e-08
MCSA Iteration 92: Residual = 1.24602e-08
MCSA Iteration 93: Residual = 1.04528e-08
MCSA Iteration 94: Residual = 8.81198e-09
real 1m22.929s
user 1m22.801s
sys 0m0.036s
The convergence is improved, but the method is still very slow relative to what a Krylov solver would do. Even with these SPD matrices this is a problem. Also, note that this is not explicitly related to the cross sections of the problem but rather tightly linked to the spectral radius, which depends more on L/dx.
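Since the three runs above are all about how the histories per iteration interact with the spectral radius, here is a self-contained toy sketch of the MCSA iteration (dense NumPy, Jacobi preconditioning, adjoint Monte Carlo correction). This is illustrative only, not the MCLS implementation; the function names and the small test matrix in the usage comment are my own. It shows exactly where the "histories per MCSA iteration" and "weight cutoff" parameters enter.

```python
import numpy as np

def adjoint_mc_solve(H, b, num_histories, weight_cutoff=1e-4, rng=None):
    """Estimate x = sum_k H^k b with an adjoint (Ulam-von Neumann) random walk.
    The series only converges if the spectral radius of H is below 1."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(b)
    col_abs = np.abs(H)                       # |H_ji|, used column by column
    col_norm = col_abs.sum(axis=0)            # C_i = sum_j |H_ji|
    b_l1 = np.abs(b).sum()
    x = np.zeros(n)
    if b_l1 == 0.0:
        return x
    src_pdf = np.abs(b) / b_l1                # sample starting states from |b|
    for _ in range(num_histories):
        state = rng.choice(n, p=src_pdf)
        weight = np.sign(b[state]) * b_l1
        w_start = abs(weight)
        while True:
            x[state] += weight                # collision tally at current state
            if col_norm[state] == 0.0:
                break                         # absorbing state, walk ends
            trans_pdf = col_abs[:, state] / col_norm[state]
            new_state = rng.choice(n, p=trans_pdf)
            weight *= np.sign(H[new_state, state]) * col_norm[state]
            state = new_state
            if abs(weight) < weight_cutoff * w_start:
                break                         # weight cutoff termination
    return x / num_histories

def mcsa(A, b, x0, num_histories, tol=1e-8, max_iters=50):
    """Toy MCSA: Jacobi-preconditioned Richardson step plus an adjoint Monte
    Carlo correction of the residual (dense, serial, illustrative)."""
    D = np.diag(A)
    Ap = A / D[:, None]                       # Jacobi-preconditioned operator
    bp = b / D
    H = np.eye(len(b)) - Ap                   # iteration matrix
    x = np.array(x0, dtype=float)
    for it in range(1, max_iters + 1):
        x = x + (bp - Ap @ x)                 # Richardson half-step
        r = bp - Ap @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + adjoint_mc_solve(H, r, num_histories)   # MC correction
        print(f"MCSA Iteration {it}: Residual = {np.linalg.norm(bp - Ap @ x):e}")
    return x

# usage sketch on a small diffusion-like tridiagonal system:
# n = 100; A = 2.1*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
# x = mcsa(A, np.ones(n), np.zeros(n), num_histories=n)  # ~1 history per DOF
```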
I've done some more MATLAB analysis and found that ILU and ILUT preconditioning should improve on the Jacobi preconditioning. For example, zero-fill ILU (ILU(0)) reduced the spectral radius of an SPN problem from 0.9867 with Jacobi preconditioning to 0.9293. ILUT did even better. I think I can wrap the right IFPACK components into a Thyra preconditioner inside of MCLS to get a left/right preconditioning strategy based on the LU factorization.
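Although the analysis above was done in MATLAB, the same comparison is easy to sketch with SciPy as a stand-in (spilu is SuperLU's threshold ILU, not Ifpack, so the numbers will differ; the usage comments assume a sparse operator A built elsewhere):

```python
import scipy.sparse.linalg as spla

def preconditioned_spectral_radius(A, apply_Minv):
    """Estimate rho(I - M^{-1} A) with ARPACK; apply_Minv maps r -> M^{-1} r."""
    n = A.shape[0]
    H = spla.LinearOperator((n, n), matvec=lambda v: v - apply_Minv(A @ v))
    lam = spla.eigs(H, k=1, which="LM", return_eigenvectors=False)
    return abs(lam[0])

# usage sketch (A is a sparse SPN-like operator, assumed built elsewhere):
# jacobi = 1.0 / A.diagonal()
# rho_jacobi = preconditioned_spectral_radius(A, lambda r: jacobi * r)
# ilu = spla.spilu(A.tocsc(), drop_tol=1e-4, fill_factor=10)
# rho_ilut = preconditioned_spectral_radius(A, ilu.solve)
```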
Here are some more MATLAB results that further motivate ILUT. I set up an SPN problem with SPN order 1 and 1 group to give a system similar to that found in problem 3. I then played with the cross sections until I got a point-Jacobi preconditioned iteration matrix with a spectral radius of 0.9952, large enough to give me lots of problems. In addition, I implemented a Richardson iteration with relaxation such that we have:
x^(k+1) = x^(k) + w * r^(k)

where r^(k) is the appropriately preconditioned residual (i.e., Jacobi or ILUT) and w is the relaxation parameter. All of the following cases perform this iteration with the associated preconditioning.
Here are the numbers of iterations required to converge:

w = 1.0: 2476
w = 1.0: does not converge
w != 1.0: does not converge
w = 0.001: 6132
w = 1.0: 209
w = 1.69: 120

So not only does ILUT do a lot better, but it actually responds to the relaxation parameter in the Richardson iteration. I also think my right preconditioning implementation is not correct, so I'm going back to check that out.
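For clarity, here is a generic sketch of the relaxed, preconditioned Richardson iteration used for the counts above (not the MCLS solver; apply_A and apply_Minv are stand-ins for the operator and the chosen preconditioner):

```python
import numpy as np

def relaxed_richardson(apply_A, apply_Minv, b, w=1.0, tol=1e-8, max_iters=10000):
    """x^(k+1) = x^(k) + w * M^{-1} (b - A x^(k)); returns (x, iterations)."""
    x = np.zeros_like(b, dtype=float)
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iters + 1):
        r = b - apply_A(x)
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k - 1
        x = x + w * apply_Minv(r)             # relaxed preconditioned update
    return x, max_iters

# e.g. apply_Minv = ilu.solve for an ILUT factorization, or
# apply_Minv = lambda r: r / A.diagonal() for point Jacobi.
```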
I'm renaming this issue and changing the definition as the scope has changed.
The Ifpack ILUT preconditioner is done and incorporated into the Thyra/Stratimikos infrastructure. I've got a new Richardson solver in MCLS to test out these preconditioners and the results look good for a 1-group SP1 problem 3 discretization (number of DOFs ~ 20,000).
As the MCSA method and underlying adjoint solver are fundamentally based on the Richardson iteration, these preconditioning strategies are clearly going to reduce the length of the Markov chains and speed up the calculation. The next step is to verify my MCSA preconditioning strategy and add the relaxation parameter inside of the adjoint solver (this is a simple scaling operation). Once that's done, I'll be able to see if these reactor problems are more manageable.
Keep in mind these are serial runs. The Ifpack ILUT preconditioner is not a true global LU decomposition; it computes a local LU decomposition to eliminate all parallel communication in the construction of the preconditioner and in its subsequent action on a vector. This may not be satisfactory for parallel runs, or perhaps the drop tolerance and thresholds will have to be relaxed.
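For reference, the "local" aspect can be sketched in serial as a zero-overlap additive-Schwarz/block-Jacobi preconditioner with an ILUT factorization per block, where each block stands in for a processor's locally owned rows (SciPy's spilu as a stand-in for Ifpack ILUT; not the actual Ifpack implementation):

```python
import numpy as np
import scipy.sparse.linalg as spla

def local_ilut_preconditioner(A, ranges, drop_tol=1e-4, fill_factor=10.0):
    """Serial stand-in for a processor-local ILUT: factor only the diagonal
    block each 'process' owns, so applying M^{-1} needs no communication.
    `ranges` is a list of (start, end) row ranges, one per process."""
    factors = [spla.spilu(A[s:e, s:e].tocsc(),
                          drop_tol=drop_tol, fill_factor=fill_factor)
               for (s, e) in ranges]
    def apply_Minv(r):
        z = np.empty_like(r)
        for (s, e), ilu in zip(ranges, factors):
            z[s:e] = ilu.solve(r[s:e])        # purely local triangular solves
        return z
    return apply_Minv
```

Ignoring the off-block coupling is exactly what degrades the preconditioner quality in parallel, which is why the drop tolerance and thresholds may need to be relaxed.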
This keeps getting worse...
I was getting some segfaults out of Epetra when I was building the Monte Carlo domain for sampling. I realized that I must be running out of memory on the machine during the import/export operations required to generate overlap and domain boundaries. I then remembered I was fully inverting the LU matrices from the preconditioning to build the resulting operator, which led me back to MATLAB for more analysis. Here's a sparsity plot of the iteration matrix that comes from an ILUT-preconditioned Richardson iteration:
No wonder I ran out of memory... Say goodbye to any kind of domain decomposed parallelism as well. ILUT with the Richardson iteration destroys sparsity due to the matrix-matrix multiplications and direct inversions I have to perform because of the need to build probabilities.
This is really getting to be a serious problem. I'm going to have to explore some other preconditioners, as I have the infrastructure in place to quickly add them, but the resulting composite operators need to be sparse or the above happens. I'm getting close to running out of ideas on preconditioning. If little progress is made, I'll have to move in another direction.
I came up with an idea after rock climbing today: use the ILUT drop tolerance when I construct the inverses of those matrices. Typically, that tolerance is applied after the LU construction, but Ifpack applies it to the matrix before doing the factorization. I figured I could do the same thing when I explicitly form the inverse by ignoring matrix elements that are below that tolerance, as their contribution to the matrix-vector product is presumed negligible. This reduced the number of elements in each row of the matrix by an order of magnitude, allowing the computation to proceed, although with a loss of monotonicity, as observed below (iterations 11-12).
MCSA Iteration 1: Residual = 5.877643e+00
MCSA Iteration 2: Residual = 8.325247e-01
MCSA Iteration 3: Residual = 2.155778e-01
MCSA Iteration 4: Residual = 7.105584e-02
MCSA Iteration 5: Residual = 2.424318e-02
MCSA Iteration 6: Residual = 7.056292e-03
MCSA Iteration 7: Residual = 1.038629e-03
MCSA Iteration 8: Residual = 1.676313e-04
MCSA Iteration 9: Residual = 8.154283e-05
MCSA Iteration 10: Residual = 5.095648e-05
MCSA Iteration 11: Residual = 7.099415e-06
MCSA Iteration 12: Residual = 9.775427e-06
MCSA Iteration 13: Residual = 3.004781e-06
MCSA Iteration 14: Residual = 1.566122e-06
MCSA Iteration 15: Residual = 9.958534e-06
MCSA Iteration 16: Residual = 3.038978e-07
MCSA Iteration 17: Residual = 2.442821e-07
MCSA Iteration 18: Residual = 8.292234e-08
MCSA Iteration 19: Residual = 6.960518e-09
MCLS solver "MCLS::MCSASolverManager<Epetra_Vector, Epetra_RowMatrix>" returned a solve status of "SOLVE_STATUS_CONVERGED" for 1RHSs using 19 cumulative iterations for an average of 19 iterations/RHS
The result of doing this is a convergent method for a serial run, with around 18 MCSA iterations needed for the problems I was running above with ~1 history per DOF. I decoupled the relaxation parameters in the Richardson extrapolation and inside the adjoint solver so I could test their effects. While the outer Richardson extrapolation was unresponsive (1.0 was the best), using 0.9 for the Monte Carlo solver relaxation when building probabilities and the source term gave about 6 fewer iterations than the 1.0 base case.
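To be concrete about the "simple scaling operation": applying the relaxation inside the adjoint solver amounts to building the probabilities from H_w = I - w*A (for the already-preconditioned operator) and scaling the Monte Carlo source by w. A dense-matrix sketch with hypothetical names, not the MCLS code:

```python
import numpy as np

def relaxed_mc_iteration_data(A_preconditioned, r, w_mc):
    """Return the iteration matrix and source the adjoint Monte Carlo solver
    would sample when a relaxation parameter w_mc is folded in: the walk uses
    H_w = I - w_mc * A and the source becomes w_mc * r. Since
    sum_k H_w^k (w_mc r) = (w_mc A)^{-1} (w_mc r) = A^{-1} r, the estimated
    correction is unchanged; only the chain statistics change."""
    n = A_preconditioned.shape[0]
    H_w = np.eye(n) - w_mc * A_preconditioned
    source_w = w_mc * r
    return H_w, source_w

# With w_mc = 1.0 this reduces to the standard correction; the runs described
# above suggest w_mc = 0.9 gave about 6 fewer iterations for this problem.
```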
The next step is to test this with more energy groups. I may need a parallel run to do that, so after I check it out I'll move on to parallel runs. I'm concerned the boundary and overlap generation will be impaired by the still significant number of nonzero entries in the iteration matrix (~2000 for ILUT parameters that give convergence). Furthermore, future timing studies may have to separate the preconditioning step from the actual solve performed by the Monte Carlo kernel, as the preconditioning was around an order of magnitude slower for these problems.
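A minimal SciPy sketch of the element filtering described a few comments up (illustrative only; the real code works on Epetra objects): drop entries of the explicitly formed LU inverses below the drop tolerance before building the composite probability operator.

```python
import numpy as np
import scipy.sparse as sp

def drop_small_entries(M, drop_tol):
    """Zero out entries with |m_ij| < drop_tol and return a compressed CSR
    matrix. Stand-in for filtering the explicitly formed LU inverses before
    building the composite (probability) operator."""
    M = M.tocoo()
    keep = np.abs(M.data) >= drop_tol
    filtered = sp.coo_matrix((M.data[keep], (M.row[keep], M.col[keep])),
                             shape=M.shape)
    return filtered.tocsr()

# e.g. Uinv_sparse = drop_small_entries(sp.csr_matrix(Uinv_dense), 1e-4)
# Without the filter, inv(L) and inv(U) fill in almost completely and the
# composite iteration matrix loses the sparsity the MC domain needs.
```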
As a means of seeing whether I could filter even more entries out of the LU inverse matrices to reduce the size of the composite operator, I tested various values of drop tolerance and level of fill with problem 3. In addition, I allowed the Ifpack drop tolerance to be decoupled from my drop tolerance for the inverse matrix elements to see if that had any effect. As a metric for preconditioning quality, I multiplied the number of MCSA iterations to converge by the maximum number of entries in the composite operator rows. Here are the results for serial runs:
There are a few things to note:
I'm going to move forward running parallel problems here and I'll determine if this preconditioning is feasible for cluster-sized problems.
Based on the task description, this is done. ILUT reduces the number of Markov chains required for convergence well below 1 per DOF, and the resulting chains are much shorter. Furthermore, I have done a series of ILUT-preconditioned runs to study the behavior relative to the preconditioning parameters that can be tuned. The dense matrices generated by this preconditioning are a byproduct of this achievement, but studying that problem should be handled in another issue.
In addition, work in this issue introduced the concept of relaxation parameters for MCSA and studied the sensitivity of ill-conditioned problems to the number of histories per iteration.
I'm giving this to Paul to review and we can discuss the results of the work in this issue.
The convergence behavior of the SPN equations for reactor problems has been observed to be poor for the block Jacobi preconditioning strategy studied. Given real reactor problems, the goal of this task is to find a preconditioning strategy that yields fewer iterations and therefore shorter Markov chains.