Closed meverson closed 10 years ago
Also, I had to modify the _source cutoff used when calculating the residual. When I switched to the 361-group problems I've been working on, many of the sources fell below this cutoff even though they contributed to the problem. I therefore divided the initial cutoff by the number of groups so that it scales with the group count. This seems to have fixed the issue for me.
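A minimal sketch of the cutoff scaling described above, not the actual solver code. The helper names `scaledSourceCutoff` and `sourceResidual`, the RMS form of the residual, and the base cutoff value used in the usage note are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: divide a base cutoff by the number of energy groups so
// that the threshold shrinks as the per-group sources get smaller.
double scaledSourceCutoff(double base_cutoff, int num_groups) {
  assert(num_groups > 0);
  return base_cutoff / num_groups;
}

// Hypothetical residual: RMS of the relative change in every source whose
// magnitude exceeds the cutoff. Sources at or below the cutoff are skipped.
double sourceResidual(const std::vector<double>& new_src,
                      const std::vector<double>& old_src,
                      double cutoff) {
  assert(new_src.size() == old_src.size());
  double sum_sq = 0.0;
  std::size_t counted = 0;
  for (std::size_t i = 0; i < new_src.size(); ++i) {
    if (std::fabs(new_src[i]) > cutoff) {
      const double rel = (new_src[i] - old_src[i]) / new_src[i];
      sum_sq += rel * rel;
      ++counted;
    }
  }
  return counted > 0 ? std::sqrt(sum_sq / counted) : 0.0;
}
```

With an assumed base cutoff of 1e-5 and 361 groups, sources of magnitude 1e-6 are skipped by the unscaled cutoff but counted once the cutoff is divided by the group count.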
I incorporated some of these changes into the latest commit for CPUSolver and VectorizedSolver, which are the parents of ThreadPrivateSolver and VectorizedPrivateSolver. The GPUSolver class implements a slightly different version of computeFSRSources, which leverages the Thrust library's reduction routine at the expense of a larger memory footprint.
I reduced the size of "_scatter_sources" to "_num_threads * _num_groups" from "_num_FSRs * _num_groups". In addition, I reduced the size of "_source_residuals" to "_num_FSRs" from "_num_FSRs * _num_groups".
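To illustrate the first size reduction, here is a rough sketch of a per-thread scratch buffer. The `ScatterScratch` type is hypothetical, and `float` stands in for FP_PRECISION as an assumption. For a made-up case of 100,000 FSRs, 361 groups and 8 threads, the buffer shrinks from 36,100,000 entries to 2,888.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread scratch buffer: one row of num_groups values per
// thread instead of one row per flat source region (FSR).
struct ScatterScratch {
  int num_groups;
  std::vector<float> data;  // float stands in for FP_PRECISION (assumption)

  ScatterScratch(int num_threads, int groups)
      : num_groups(groups),
        data(static_cast<std::size_t>(num_threads) * groups, 0.0f) {
    assert(num_threads > 0 && groups > 0);
  }

  // Pointer to the row owned by thread `tid`; each thread writes only here.
  float* row(int tid) {
    return data.data() + static_cast<std::size_t>(tid) * num_groups;
  }

  // Total storage in bytes.
  std::size_t bytes() const { return data.size() * sizeof(float); }
};
```

Each thread reuses its own row for every FSR it processes, so the storage no longer grows with the number of FSRs.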
As for the index arithmetic operations, I'm counting on the compiler to recognize and optimize them so I don't have to clutter the code with pre-computed indices.
commit (master): 95a6efebae202f452d0e54df06cbd576901acc1e
Here's my modified version of computeFSRSources. It was originally intended to access only the nonzeros in each material's scattering matrix, but this proved to be too time-consuming, oddly enough. That was the reason for using index, n_mat and nnzs. It is now simplified to calculate scattering only below a cutoff representing the highest incoming group that is scattered. This cutoff is stored in nnzs as the code loops through all outgoing groups.
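One possible reading of the cutoff idea, sketched with hypothetical helpers `scatterCutoffs` and `scatterSource`. The row-major layout `sigma_s[G * num_groups + g]` (outgoing group G, incoming group g) is an assumption and may not match the real material data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each outgoing group G, return one past the highest incoming group g
// with a nonzero entry sigma_s[G * num_groups + g] (assumed layout).
std::vector<int> scatterCutoffs(const std::vector<double>& sigma_s,
                                int num_groups) {
  assert(num_groups > 0);
  assert(sigma_s.size() ==
         static_cast<std::size_t>(num_groups) * num_groups);
  std::vector<int> cutoffs(num_groups, 0);
  for (int G = 0; G < num_groups; ++G) {
    const std::size_t row = static_cast<std::size_t>(G) * num_groups;
    for (int g = num_groups - 1; g >= 0; --g) {
      if (sigma_s[row + g] != 0.0) {
        cutoffs[G] = g + 1;
        break;
      }
    }
  }
  return cutoffs;
}

// Scattering source into group G, summing incoming groups below the cutoff.
double scatterSource(const std::vector<double>& sigma_s,
                     const std::vector<double>& flux,
                     int num_groups, int G, int cutoff) {
  assert(G >= 0 && G < num_groups);
  assert(cutoff >= 0 && cutoff <= num_groups);
  assert(flux.size() == static_cast<std::size_t>(num_groups));
  const std::size_t row = static_cast<std::size_t>(G) * num_groups;
  double src = 0.0;
  for (int g = 0; g < cutoff; ++g)
    src += sigma_s[row + g] * flux[g];
  return src;
}
```

Entries below the cutoff that happen to be zero are still multiplied, which trades a few wasted operations for a simple contiguous loop with no index lookups.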
Also, four different index variables are introduced to reduce the total number of multiplications needed within each loop to look up the various quantities.
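The general pattern can be sketched as follows. This is an illustration only: the function `reactionRates`, the offsets `fsr_base` and `mat_base`, and the flat array layouts are hypothetical and are not the four index variables from the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums flux * sigma over groups for each region. The base offsets are
// computed once per region, so the inner loop needs only additions to index.
std::vector<double> reactionRates(const std::vector<double>& flux,
                                  const std::vector<double>& sigma,
                                  const std::vector<int>& material_of,
                                  int num_groups) {
  assert(num_groups > 0);
  const std::size_t num_regions = material_of.size();
  assert(flux.size() == num_regions * num_groups);
  std::vector<double> rates(num_regions, 0.0);
  for (std::size_t r = 0; r < num_regions; ++r) {
    // Hoisted offsets: one multiplication each per region.
    const std::size_t fsr_base = r * num_groups;
    const std::size_t mat_base =
        static_cast<std::size_t>(material_of[r]) * num_groups;
    assert(mat_base + num_groups <= sigma.size());
    double sum = 0.0;
    for (int g = 0; g < num_groups; ++g)
      sum += flux[fsr_base + g] * sigma[mat_base + g];
    rates[r] = sum;
  }
  return rates;
}
```

An optimizing compiler will often perform this hoisting on its own, so whether the manual version helps is something to measure rather than assume.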
Let me know if you have any questions, or see any possible problems with this code. Thanks!
FP_PRECISION CPUSolver::computeFSRSourcesNew() {
}