Open paulromano opened 10 years ago
As @nhorelik points out, there is no natural way to tally with lattice structures. This refactor should also address #215.
It's also not possible to use multiple mesh filters. A refactor should have some generalization for meshes (particularly of different types).
I would also like to be able to tally responses with microscopic cross sections that weren't loaded for the materials - e.g. tallying a U235 reaction rate in instrument tubes that contain only air for a material. This might require some special cross section loading routines, where for instance in many cases we only need a certain MT number and not the whole set.
@bhermanmit and I were discussing that since it looks like there are a lot of different things that various people want to see come out of this refactor - and more than one person seem to be ready to work on it currently - it would probably be helpful to have a more unified understanding of what the overarching picture will look like. By this I mean some high-level and intermediate-level flow charts demonstrating the desired flow, what each subroutine should be trying to accomplish, etc. Bryan just shared a sharelatex document to us for this purpose.
Good idea guys!
That would be good, can I get in on the ShareLatex doc? I have an account under my umich email address.
Hey, everyone. I'm working on adding perturbations, and that means a lot of interaction with the tally module. Can you guys add me to the ShareLatex doc? I don't want to duplicate work or diverge from your efforts.
@smharper You should touch base with @bhermanmit as he has been doing all the development on the tally refactor.
@smharper - the tally refactor effort has kind of slowed and needs help from other developers. Besides refactoring the tallies, the branch also refactors where saved variables are located to avoid dependency issues. The framework that I have developed seems to be promising and I have implemented a few scores and filters including tracklength, analog and collision estimators. The following tasks still need to be finished:
I want to make the three following changes to tallies. Let me know if any of you have thoughts on these.
First, I want to add functional expansion elements. These will handle our PN and YN expansions now, they are might also appropriate for the @nelsonag's NDPP tallies, and I hope that they will handle @mellis13's spatial expansions in the future. This means that rather than a user specifying a scatter-P3
score, they would specify a P3
functional expansion element and import it into a tally with a scatter
score.
<!-- Old: -->
<tally id="1" scores="scatter-Y3 flux-Y3" />
<!-- New: -->
<functional_expansion id="1" type="spherical_harmonics" order="3" />
<tally id="1" scores="scatter flux" expansion="1" />
These are the advantages that I think functional expansion elements will give over just more scores:
functional_expansion
types rather than score_general
).score-FN
style is pretty bulky.Second, I would also like to separate filters from tally elements.
<!-- Old: -->
<tally id="1" scores="flux total" >
<filter type="energy" bins="0.0 6.25E-7 20.0" />
</tally>
<!-- New: -->
<filter id="1" type="energy" bins="0.0 6.25E-7 20.0" />
<tally id="1" scores="flux total" filters="1" />
Finally, I want to move to tally elements that only have one score. I imagine that most of the reason we have many scores per tally now is so users can reuse filter elements. But that won't be necessary if I make the above change of separating filters from tally elements. Furthermore, if I also implement the functional expansion element it will reduce the number of scores you can lump together anyway (for example, can no longer specify nu-fission scatter-p3
).
<!-- Old: -->
<tally id="1" scores="flux total" />
<!-- New: -->
<tally id="1" score="flux" />
<tally id="2" score="total" />
Put it all together and we have something that looks like
<filter id="1" type="energy" bins="0.0 6.25E-7 20.0" />
<filter id="2" type="energyout" bins="0.0 6.25E-7 20.0" />
<filter id="3" type="distribcell" bins="11" />
<functional_expansion type="spherical_harmonics" order="3" />
<tally id="1" score="flux" filters="1 3" />
<tally id="2" score="total" filters="1 3" />
<tally id="3" score="scatter" filters="1 2 3" expansion="1" />
Thoughts?
You propose some pretty cool ideas @smharper! The functional expansions are not something that I personally make much use of right now, but I can certainly see the value of separating out the functional order into its own entity in the input. The idea of separate and "shared" filter elements is one that I have long thought would simplify the inputs greatly, esp for some of us who use lots of tallies.
I think you make some good points about the scores, but I'm less sold on this account. My intuition tells me that by aggregating the data that is scored to - i.e., with includes and scores as the innermost dimensions for tally results - the cache efficiency will be better than splitting the arrays up for separate tallies. In addition, the issue I tried to address with mergeable tallies in the Python API in #383 would be exacerbated by splitting scores across tallies, unless you have plans for a clever data structure to enable much better than O(N) lookups for tallies at runtime. On Aug 2, 2015 6:56 PM, "Sterling Harper" notifications@github.com wrote:
I want to make the three following changes to tallies. Let me know if any of you have thoughts on these. Functional expansion elements
First, I want to add functional expansion elements. These will handle our PN and YN expansions now, they are might also appropriate for the @nelsonag https://github.com/nelsonag's NDPP tallies, and I hope that they will handle @mellis13 https://github.com/mellis13's spatial expansions in the future. This means that rather than a user specifying a scatter-P3 score, they would specify a P3 functional expansion element and import it into a tally with a scatter score.
These are the advantages that I think functional expansion elements will give over just more scores:
- We'll be able to shorten the current list of scores, and we'll keep it from growing huge as we add more expansion types. (Imagine what it would look like if you had to add scores for all permutations of x-order, y-order, z-order for a spatial Legendre expansion, for example)
- The same expansion can be used for multiple tallies. That means a user will only have to change one element, the expansion, in order to adjust the order for all of their tallies.
- It will allow us to simplify the tally.F90 logic (expansion logic would be placed inside type-bound methods for functional_expansion types rather than score_general).
- It will simplify input_xml.F90 because the current logic for reading the score-FN style is pretty bulky.
Filter elements
Second, I would also like to separate filters from tally elements.
- The same filter can be used for multiple tallies. Again this makes it much faster to adjust many tallies simultaneously. - This will really shorten tally.xml files when a user wants to apply fine energy structure filters to many tallies. One score per tally Finally, I want to move to tally elements that only have one score. I imagine that most of the reason we have many scores per tally now is so users can reuse filter elements. But that won't be necessary if I make the above change of separating filters from tally elements. Furthermore, if I also implement the functional expansion element it will reduce the number of scores you can lump together anyway (for example, can no longer specify nu-fission scatter-p3). - It prevents users from running into errors with tracklength and analog-only tallies. - I think it's also a little easier for new users to understand. - It will simplify our tally.F90 loops and data structures a bit since we can use 2D rather than 3D arrays. The 2D arrays will also probably make data processing in the PythonAPI a little easier. The tallies.xml of the future Put it all together and we have something that looks like Thoughts? — Reply to this email directly or view it on GitHub https://github.com/mit-crpg/openmc/issues/214#issuecomment-127097842.
@smharper this is looking real nice. I agree with @wbinventor's suggestion about not requiring one score per tally, even though your idea does result in much cleaner code.
Two more thoughts:
Thanks!
@wbinventor, I was operating under the assumption that the mergeable tally speedup you saw was due to more efficiency in the filter indexing (as in, you removed redundant filter indexings, which I also hope to do with the shared filters, btw), but I see now that there might be a cache issue on score dimension of the arrays. After I implement the filter thing, I would like to run some tests to see if the cache efficiency is really present and significant.
@nelsonag, I like the idea of builtin filter structures. We could make them a separate element type like
<easy_filter id=1 type="energy" name="helios47" />
or maybe just overload the bins element
<filter id=1 type="energy" bins="helios47" />
I'm not sure what you me an by "response function". Can you give me an example of something that is a response function but not an expansion?
I'm not sure what you me an by "response function". Can you give me an example of something that is a response function but not an expansion?
Sure, by response function I mean some general function which takes particle attributes as input and produces some output. The functional expansions you mention fall under this category. However, NDPP tallies are technically not functional expansions, at least of the data known by the Monte Carlo simulation at tally time, but are response functions.
@smharper the cache efficiency is related to your idea of single-score nuclides. The impetus for mergeable tallies is that the at each event, the tally system currently searches each and every tally to see if it is relevant to the event (e.g., contains a filter for the same cell as the particle current location). Hence, the time it takes to tally each event scales with the total number of tallies in the simulation. By aggregating or merging the scores into fewer tallies, the overall tally time can be made many times faster for a simulation with lots of tallied quantities, since the total search space is greatly reduced.
On a separate note, I like @nelsonag's idea of "simplified tallies." But I think that the focus of a tally refactor should be the development of efficient object-oriented and polymorphic data structures to avoid the explosive use of "if-else" and/or switch case blocks to accommodate new scores and filter types. IMHO the goal of the tally refactor should not be the implementation of new, cool filter types but to create the platform necessary for developers to easily "plug-n-play" new filters/scores into the tally system. On Aug 3, 2015 4:24 PM, "Adam Nelson" notifications@github.com wrote:
I'm not sure what you me an by "response function". Can you give me an example of something that is a response function but not an expansion?
Sure, by response function I mean some general function which takes particle attributes as input and produces some output. The functional expansions you mention fall under this category. However, NDPP tallies are technically not functional expansions, at least of the data known by the Monte Carlo simulation at tally time, but are response functions.
— Reply to this email directly or view it on GitHub https://github.com/mit-crpg/openmc/issues/214#issuecomment-127430353.
@smharper I thought a little bit about what you alluded to in your last comment about the possibility of searching each filter for once, rather than searching each filter in each tally. I think this would (largely if not entirely) mitigate the mergeable tally issue. The cache efficiency with scores is likely less of a big deal, though it might become increasingly important on many-core machines. @paulromano and @jtramm would know more given their multi-core performance analysis of OpenMC. It would also be worthwhile to hear @nhorelik's thoughts on your proposed ideas and how they would interact with the different approaches he has taken to tallying massive (TB+) datasets with domain decomposition - i.e., tally "scrambling" and on-the-fly tally allocation.
For starters, I like the functional expansion concept, and agree it would greatly simplify things. Having unique filters which are associated with tallies also seems like a good way to go. It will certainly require a bit of restructuring of the code, but that's what we're talking about anyway :) As far as functional expansion versus response, I have a little trouble seeing how a response in general is different from a score. @nelsonag maybe you can elaborate on this topic some more.
I am very hesitant about the idea of a single score for each tally however. Having multiple scores in a single tally is definitely beneficial for cache re-use since accesses are linear in memory. Such a memory access pattern is only possible because multiple scores are being scored to in the same array. As far as multi-core performance and how this would affect it, I'd say it largely depends on the problem, what tallies are being used, and how many cores you're using, but in general splitting scores over multiple tallies means that you end up with more dirty cache lines, and consequently more misses due to cache line invalidation. One of the main target problems I always have in mind is a hypothetical depletion simulation where you would want to know ~6 scores in ~300 nuclides in millions of fuel regions. @smharper you ought to mention how you see <nuclide>
fitting in since you didn't mention it in your original proposal.
@paulromano, I was planning to initially leave the <nuclide>
input syntax and the basic location of the nuclide loop/data where it is now. But in the long term, I am considering putting that functionality into an expansion (maybe it's an abuse of the term "functional expansion"... but delta functions are still functions).
Since we are talking about things like cache efficiency, I suppose it will be beneficial to share the scoring algorithm that I had in mind. Let me iron out one kind of confusing piece now. I imagine that we will (eventually) want to allow multiple expansions per tally. This means we'll have to score all of the combinations of all of the coefficients in all of the expansions. Here's one algorithm that could do that (although in practice I would use something with more vector operations): [REDACTED see comment below]
Let's stuff that snippet into a type bound method like tally % get_expansion_mask(p)
. Using that, here's a rough-out of how I would write the scoring loop:
loop over all filters
filter % matching_bin = filter % get_scoring_bin(p)
end filter loop
loop over tallies
flux = tally % flux_estimator % get_flux()
if flux == 0
cycle tally loop
end if
filter_index = 1
loop over tally % filters
if filter % matching_bin == NO_BIN_FOUND
cycle tally loop
end if
filter_index += filter % stride * filter % matching_bin
end tally % filters loop
mask = tally % get_expansion_mask
loop over tally % nuclides
score = tally % get_score(p, nuclide)
t % results % value += score * mask
end tally % nuclides loop
Some things to note:
flux == 0
. I am expecting that flux_estimator % get_flux
will return zero for an analog absorption tally that is called after a particle is scattered, for example.I suppose when I envisioned this, I had really nasty expansions in mind (combinations of 3D spatial and angular expansions, maybe with NDPP tacked on), and I was worried about their memory efficiency over reaction type scores.
Since there are legitimate concerns about the reaction type scoring cache-efficiency, I think I'll add the slight modification to the above picture of making tally % get_score
return an array with one entry per reaction type. Then score * mask
can be a col_vec * row_vec. That will also mean getting rid of the flux == 0
idea and replacing it with something inside that nuclide loop.
@paulromano:
As far as functional expansion versus response, I have a little trouble seeing how a response in general is different from a score. @nelsonag maybe you can elaborate on this topic some more.
It isn't really, a score is a response function; however, a score is also a functional expansion. So my point (slightly self-serving due to the long term goal of including NDPP tallies, though I bring it up because I hope there are other examples, perhaps KDEs) is merely that 'functional expansions' might not be the correct terminology for how we may end up using this in practice. I differentiate between functional expansions and the NDPP tallies because technically, the NDPP data is not a functional expansion of what is happening in the MC simulation, i.e., it is not a functional expansion of behavior of the as-simulated neutrons, but it is instead a prediction of a functional expansion, if that makes ANY sense.
Either way, you are right, the "response function" terminology isn't very useful either since that is technically what a score is for.
It looks like I was a little hasty when I typed out that expansion algorithm. Here's one that will actually work:
mask(1 : n_combinations) = 1
loop over expansions in tally
stride = expansion % stride
max_order = expansion % max_order
for i = 1, n_combinations
order = (i modulo stride) / max_order ! floor division
mask(i) = mask(i) * expansion % get_coeff(p, order)
end for
end expansion loop
I was thinking about the functional expansion vs response function too
generally. In this case
It looks like I was a little hasty when I typed out that expansion algorithm. Here's one that will actually work:
mask(1 : n_combinations) = 1 loop over expansions in tally stride = expansion % stride max_order = expansion % max_order for i = 1, n_combinations order = (i modulo stride) / max_order ! floor division mask(i) = mask(i) * expansion % get_coeff(p, order) end for end expansion loop
— Reply to this email directly or view it on GitHub https://github.com/mit-crpg/openmc/issues/214#issuecomment-127816535.
Upon further thought, I would like to retract the idea of functional_expansion
s and replace them with scoring_functions
which can also handle reaction scores.
<scoring_function id="1" type="reaction" reactions="scatter nu-scatter" />
<scoring_function id="2" type="reaction" reactions="nu-fission" />
<scoring_function id="3" type="spherical_harmonics" order=3 />
<scoring_function id="4" type="scatter_legendre" order=3 />
<scoring_function id="5" type="ndpp_energyout" bins="0.0 6.25E-7 20.0" />
<tally id="1" scoring_functions="3 4 5 1" />
<tally id="2" scoring_functions="5 2" nuclides="U-235 Pu-238" />
The cool thing about this is that you can pick which order you want your data to be stored in and looped over without worrying about if it is a reaction type, angular expansion, set of delayed groups, or whatever.
@smharper That's an interesting idea. Can you sketch out pseudocode for how the scoring would work similar to what you showed before? And presumably there will be some restrictions on what combinations of scoring functions are acceptable?
Here's an abstract base class for ScoringFunction
:
type, abstract :: ScoringFunction
integer :: n_bins
real(8), allocatable :: score(:)
contains
procedure(compute_score_sf_), deferred :: compute_score
end type ScoringFunction
subroutine compute_score_sf_(this, p, i_nuclide)
import ScoringFunction
class(ScoringFunction), intent(inout) :: this
class(Particle), intent(in) :: p
integer, intent(in) :: i_nuclide
end subroutine compute_score_sf_
So now that you have an idea how it would work, here are two examples of possible subclasses:
type, extends(ScoringFunction) :: ReactionSF
integer :: reactions(:)
contains
procedure :: compute_score => compute_score_rxn
end type
subroutine compute_score_rxn(this, p, i_nuclide)
type(ReactionSF), intent(inout) :: this
class(Paritcle), intent(in) :: p
integer, intent(in) :: i_nuclide
logical :: material_contains_nuclide
real(8) :: atom_density
integer :: i_nuclide, i_bin
score = ZERO
material_contains_nuclide = ! Insert logic to see if i_nuclide is present
if (.not. material_contains_nuclide) return
atom_density = ! Atom density calculation goes here
do i_bin = 1, this % n_bins
select case this % reactions(i_bin)
case (RXN_FLUX)
score(i_bin) = ONE
case (RXN_TOTAL)
if (i_nuclide > 0) then
score(i_bin) = micro_xs(i_nuclide) % total * atom_density
else
score(i_bin) = material_xs % total
case (RXN_SCATTER)
...
! This is where most of our big select case(score_bin) block goes
end select
end do
end subroutine compute_score_rxn
type, extends(ScoringFunction) :: SphericalHarmonicsSF
contains
procedure :: compute_score => compute_score_pn
end type
subroutine compute_score_pn(this, p, i_nuclide)
type(SphericalHarmonicsSF), intent(inout) :: this
class(Particle), intent(in) :: p
integer, intent(in) :: i_nuclide
integer :: order
do order = 0, this % n_bins - 1
score(order + 1) = calc_pn(order, p % mu)
end do
end subroutine compute_score_pn
Each tally will have an array of ScoringFunction
s like those. Notice that every ScoringFunction
's score
is an array (in general). To build the entire score for one tally event, we need to get all the combinations of bin indexes for these different scores. For example, if we have three scoring functions with 2, 3, and 2 bins, respectively, then the entire score array will have 12 bins like this:
score( 1) = scorefuns(1, 1, 1)
score( 2) = scorefuns(2, 1, 1)
score( 3) = scorefuns(1, 2, 1)
score( 4) = scorefuns(2, 2, 1)
score( 5) = scorefuns(1, 3, 1)
score( 6) = scorefuns(2, 3, 1)
score( 7) = scorefuns(1, 1, 2)
score( 8) = scorefuns(2, 1, 2)
score( 9) = scorefuns(1, 2, 2)
score(10) = scorefuns(2, 2, 2)
score(11) = scorefuns(1, 3, 2)
score(12) = scorefuns(2, 3, 2)
To make that pattern, the tally will assign a chunk_size
and stride
to each of it's scoring functions. chunk_size
refers to how many times each scoring function's bins are repeated consecutively. So in the above example, the scoring functions have a chunk size of 1, 2, and 6, respectively. stride
refers to the distance in the score array before a particular scoring function repeats it's sequence. In the example, the strides are 2, 6, and 12, respectively. Here's how a tally could use that information to build up a score:
subroutine build_score(this, p, i_nuclide)
type(Tally), intent(in) :: this
class(Particle), intent(in) :: p
integer, intent(in) :: i_nuclide
integer :: i_sf, i_bin, i_chunk ! Loop iterators
integer :: n_bins, chunk_size, stride ! score_fun characteristics
integer :: first
class(ScoringFunction), pointer :: current_sf
this % score = ONE ! Note that this % score is a vector.
do i_sf = 1, this % n_scoring_funs
! Bookkeeping.
current_sf => scoring_funs(i_sf)
n_bins = current_sf % n_bins
stride = this % scorefun_stride(i_sf)
chunk_size = this % scorefun_chunk(i_sf)
! Evaluate the current ScoringFunction.
call current_sf % compute_score(p, i_nuclide)
! Apply the current ScoringFunction's response to the score for this event.
do i_bin = 1, n_bins
do i_chunk = 1, chunk_size
first = (i_bin - 1) * chunk_size + i_chunk
this % score(first : : stride) *= current_sf % score(i_bin)
end do
end do
end do
end subroutine build_score
When a collision happens, we'll call score
on the relevant tallies. That subroutine will look something like this:
subroutine score(this, p)
type(Tally), intent(in) :: this
class(Particle), intent(in) :: p
real(8) :: flux
integer :: filter_index, i, i_nuclide
flux = this % flux_estimator % get_flux()
if flux == ZERO return
filter_index = ! More code goes here
do i = 1, this % n_nuclide_bins
i_nuclide = t % nuclide_bins(i)
call this % build_score(p, i_nuclide)
end do
this % results(:, filter_index) = this % score
end subroutine score
Notice that Tally % build_score
and ScoringFunction % compute_score
are subroutines, not functions. I would like to have made those two functions that returned arrays, but I think that will require us to regularly allocate arrays. So instead, they are subroutines that write into member arrays which only have to be allocated once at initialization.
It would be cool if there could be a NuclideSF
instead of enforcing the location of the nuclide loop in Tally % score
, but we would have to figure out how to couple it to ReactionSF
. I'm sure we could do that, but I'll leave that for later.
@paulromano, I can't think of any restrictions we would have to apply right now.
@smharper Thanks a lot for that sketch. Definitely helps to see a rough idea of where everything would lay, and I think I understand it for the most part. As you said, there wouldn't be any restrictions on combinations of scoring functions, so in theory a user could tally very exotic quantities if they so desired. The abstraction is beautiful, certainly better than I would have come up with on my own. I think we'll just have to be careful about performance, particularly with memory copies. I see you're already thinking about that aspect (subroutine vs function returning array) which is great. Let us know if you have a tentative idea of when you're planning on taking a crack at an actual implementation and if you'd like/need help from others.
@smharper from what I've seen, I would agree that this sketch looks really good. I need to look at it in more detail, esp. since the tally system is very important to my work. One question I do have is this - if we're going to all this work to do an OO refactorization, why the switch-case
for the score types when computing tally scores? Why not subclass wth finer granularity down to the fundamental score types themselves? I'd be curious if this would have any impact on performance for problems with lots of tallies and lots of scores (perhaps not and branch prediction is smarter than I realize). It just seems to me that the subclasses should be as slim as possible - just ripe for the taking to be used as examples for any developer to come along and create a new score subclass.
@paulromano, I don't know when I'll actually implement this. I hope to whittle away at it and have at least a partial implementation by the start of next summer. Right now, #438 and diff tallies (gotta do a thesis sometime, ya know) are a little higher priority. As for getting help with this, I would appreciate help but I'm not sure yet how to break this change down into small tasks so that different people could work together. I'll think about it some and let you guys know if I come up with a plan for divide and conquer.
@wbinventor, I think that replacing that switch-case
might be a case of getting too trigger happy with object-oriented design. OO has it's downsides:
It usually requires many more lines of code (especially in Fortran), which hurts readability. For example, you only have to make one definition of real(8) :: atom_density
to use it in many case
statements, but if you use OO procedures, you end up having to repeat that definition over, and over again.
It takes a block of code and scatters it out across several different parts of several different files (especially in Fortran), which hurts readability. With select-case
, all the code is in one big monolithic block. If you replace that with reaction % get_score
and a developer wants to know what that does on the inside, then they'll probably have to go to a different file like tally_header.F90 or reaction_header.F90; in that file, they'll find the definition of class :: Reaction
at the top which gives them the deferred get_score
function. The definition of that get_score
function is probably near the Reaction
class, but separate in its own abstract interface. Then the developer has to find the type, extends(Reaction) :: FissionReaction
which will point to a procedure that is again, in a different part of the F90 file.
It really does hurt performance. Check out this simple comparison I threw together. When I call run_tests.sh
, I see that the OO version takes about 3.5 s and the procedural version takes 3.2 s. That's small, but it gets worse if you go to real conditions where the objects will be in a different file. Move the funct_header
module into it's own file, and turn on optimization in run_tests.sh
(change -g
to -O3
, don't forget to add funct_header.F90 to the argument list) and I see the more dramatic difference of 1.7s vs 1.0s. (I assume this isn't really a fault of OO itself, but the lack of link-time optimization when you move objects to different files. For a more extreme example of this difference in optimization, remove the write statement and run the test. The single-file version will be whittled down to nothing in the dead code removal and it will run instantly.). Also, look at the size of the binaries. The OO version is a few kB bigger. In that tarball, I included the objdump
for each binary. You can look at that to understand why we get a little worse performance with OO. Here's a section of the OO objdump with some junk removed:
ptr => idf
40094a: mov QWORD PTR [rbp-0x18],0x400ba0
400952: lea rax,[rbp-0x4]
400956: mov QWORD PTR [rbp-0x20],rax
y = ptr % compute(x)
40095a: mov rax,QWORD PTR [rbp-0x18]
40095e: mov rax,QWORD PTR [rax+0x20]
400962: lea rcx,[rbp-0x28]
400966: lea rdx,[rbp-0x20]
40096a: mov rsi,rcx
40096d: mov rdi,rdx
400970: call rax
There we can see a hint of how polymorphism really works. Look at line 94a. We are actually calling that method at location 970 with call rax
. But what is rax
? From locations 95e, 95a, and 94a, we can see that rax
is really 0x400ba0 + 0x20
. If you look further down in that file, you can see that the other calls to compute
boil down to call 0x400c20 + 0x20
and call 0x400be0 + 0x20
. Here's what I think is going on: 0x400ba0
, 0x400c20
, and 0x400be0
contain pointers to our types, IdentityFunct
, SquareFunct
, and PolyFunct
. Since they are all Class(Funct)
, we know they contain at compute
method 0x20
after the beginning of their data. What's more, I doubt the actual implementation of compute
can be found at 0x20
; 0x20
is probably itself a pointer to the actual implementation of compute
(otherwise, we wouldn't now where to put the definitions for other methods of Class(Funct)
.
So you see, when we call ptr % compute(x)
, we have to travel down a long chain of pointers. It's ptr-to-variable-in-stack ==> ptr-to-compute-function => ptr-to-compute-implementation. And this isn't cache efficient either because the main loop, the object, and the procedure implementation are in three different places in the code (very similar to how the fortran source code looks, really). Contrast this with the procedural objdump:
select case(f_type)
40082d: mov rax,QWORD PTR [rbp-0x20]
400831: mov eax,DWORD PTR [rax]
400833: cmp eax,0x2
400836: je 400853 <compute_+0x36>
400838: cmp eax,0x3
40083b: je 400866 <compute_+0x49>
40083d: cmp eax,0x1
400840: jne 4008c7 <compute_+0xaa>
...
y = compute(x, 1)
400900: lea rax,[rbp-0x10]
400904: mov esi,0x400ac0
400909: mov rdi,rax
40090c: call 40081d <compute_>
Here we know the exact location of compute
at compile time, hence call 40081d
. Then, when we look at the select case
statement, we see a distinct lack of pointer nonsense. Before, we had to go find a pointer at 0x400ba0 + 0x20
and then follow that to the implementation. Now, we know at compile time that we simply have to jump to 40081d
which will quickly tell us to jump to 400840
.
Hmm, I had more to say on that topic than I thought I did. Anyway, OO gives us a lot of advantages when we need a deep tree of inheritance, or if you have a bunch of different functions that are really complex. But the reaction scores are not really complex. They are little snippets of code, and because of this I think we will get better readability and better performance by leaving them in select case
statements.
Addendum: It's also possible that the performance differences that I mentioned will actually be unnoticeable in a real OpenMC simulation. And compilers can also remove the differences under the right circumstances (my previous example runs at the same speed with -O3 if you keep the OO module with it's program, for example). So we should probably try abstracting away the select case
and look at the performance empirically, but I think that experiment should be a low-priority because of my previous arguments.
@smharper if you had spent all the time you used to so thoroughly debunk my bad OO idea, we would have a refactored tally module already!
You raise some great points. I definitely agree that breaking up the switch-case
would likely lead to a larger codebase, and one which might be more convoluted to follow. These reasons alone might be enough to keep the switch-case
around.
I'm less convinced by the performance arguments, re your addendum above. For large simulations where performance matters most, tallies will have more than one bin. The pointers to link to the subroutine implementation may be cache misses when scoring to the first bin, but will be readily accessible in L1 for each successive bin. This is just my intuition, I haven't taken the time to do any experiments as you have done, so this may be a poor analysis.
Finally, I certainly don't know if there would be any performance improvement by breaking away from the switch-case
. I do think that avoiding conditionals, we might open the door to a little SIMD vectorization when scoring tallies in the future. Speaking of which, regardless of how the refactor is implemented, we should consider how to structure the algorithm to score to all tally bins in a more SIMD-like fashion.
@wbinventor, yeah, maybe I got a bit carried away there. I wrote that sentence "It really does hurt performance," and then I realized that I was just repeating things I had heard previously without knowing why they are true. So next thing I know I'm working my way through an objdump
. Anyway, it was a fun exercise. 8/10, would waste time on it again.
Re: vectorization, Yes, definitely something to look into.
My gut instinct is that OO will hurt peformance at that fine-grained a level, but like @smharper said it's just heresay and I don't have any numbers to back it up. I'm sure all of us will be interested to see hard evidence one way or the other. It's probably also very compiler-dependent at this point.
Vectorization is a tougher nut to crack and one that would probably require wholesale restructuring of much of the code... so I wouldn't overly worry about it for now. No area in the code currently takes advantage of vectorization so we have a long way to go.
@smharper Quick question on the tally refactor. With the refactorization how hard do you think it would be to compute a covariance matrix for FET coefficients? Eventually FET coefficients are used to reconstruct a tallied quantity at a point, and therefore the only way to get non-approximate uncertainty on the quantity at a point is to have the covariance matrix. Probably less of an issue, but maybe it would also be useful for @wbinventor tally arithmetic if covariances would easily be calculated (feel free to say if it is not really needed). Thanks for doing all of this work, @smharper !
@mellis13, sadly my statistics knowledge is failing me here. I don't know how difficult it is. Let's talk about this some in person in a couple weeks.
Sounds good! Thanks @smharper
One thought I had as I was looking through the tally module the other day -- right now some filter logic uses attributes of the tally directly, e.g. if the tally uses an analog estimator, the filter does something different than if the estimator were tracklength or collision. If we really want to make filters separable from tallies, we'll need to remove that dependence somehow. Not sure if you've already thought about this @smharper.
Just wanted to give an update for anyone following this issue. We now have filters implemented as their own objects thanks to the excellent work of @smharper and @amandalund. This summer, @LMKerby and her student Brittany Grayson are looking into implementing functional expansions as a new filter type. After some discussion, we decided that this was a simpler path forward than introducing a separate abstraction for expansions. The existing 'expansion-type' scores (like scatter-Pn
) will be replaced by using, in this case, something like LegendreAngleFilter
along with just a regular scatter
score. The same can be done for spherical harmonics expansions. Beyond replacing those scores, we're also planning on adding filters for functional expansions in space (Legendre, Zernike) that can be used for coupling.
Thank you Paul.
Leslie M. Kerby, Ph.D., M.B.A. (208) 533-8118 Assistant Professor Nuclear Engineering and Health Physics Idaho State University
Research Scientist Nuclear Systems Design and Analysis Division Idaho National Laboratory
On Jun 29, 2017 2:50 PM, "Paul Romano" notifications@github.com wrote:
Just wanted to give an update for anyone following this issue. We now have filters implemented as their own objects thanks to the excellent work of @smharper https://github.com/smharper and @amandalund https://github.com/amandalund. This summer, @LMKerby https://github.com/lmkerby and her student Brittany Grayson are looking into implementing functional expansions as a new filter type. After some discussion, we decided that this was a simpler path forward than introducing a separate abstraction for expansions. The existing 'expansion-type' scores (like scatter-Pn) will be replaced by using, in this case, something like LegendreAngleFilter along with just a regular scatter score. The same can be done for spherical harmonics expansions. Beyond replacing those scores, we're also planning on adding filters for functional expansions in space (Legendre, Zernike) that can be used for coupling.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mit-crpg/openmc/issues/214#issuecomment-312103995, or mute the thread https://github.com/notifications/unsubscribe-auth/AQoh8e7CHTXw0QG3H5Xsx8HN4Z-bBYvOks5sJA4BgaJpZM4A_5T0 .
Now that a decent fraction of the code has been refactored or rewritten, I think it's time we focus on the tally module. There are numerous problems with the tally module:
select case
statements for scoring binsNow that the physics module has been split into
physics
andtracking
, we can call create fake particles when tallying to get post-collision particle attributes even when no scattering/fission event happens. This allows us to truly implement a collision estimator and also allows us to use certain filters/scores with a tracklength estimator that we couldn't before.