Closed Yurlungur closed 1 year ago
Still need to test on GPUs, and add documentation, but this is basically ready for review.
I'll try to get to reviewing this next week sometime. Let me know if you want a review by a certain date and that might help me prioritize it better.
No rush---Daniel and I are now trying to thread it through singularity-eos. But we can just depend on this branch while we do that.
I was initially confused by the hierarchical name (as @jonahm-LANL mentioned) as well. I had thought it was like block AMR, which it is not; it is dimension-by-dimension, and the spacing is arbitrary and not limited to 2:1, for instance. That makes it quite flexible, and it is in line with how some of the incoming data is organized anyway, which is nice.
I've got a few comments about the API:
```cpp
RegularGrid1D g1(xmin, 0.35 * (xmax - xmin), 3);
RegularGrid1D g2(0.35 * (xmax - xmin), xmax, 4);
HierarchicalGrid1D<NGRIDS> g = {{g1, g2}};
```
This seems like a reasonable way to define a 1D index space to me.
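To make the dimension-by-dimension idea concrete, here is a minimal, self-contained model of a piecewise 1D index space built from uniform pieces that meet at one shared point. This is an illustrative sketch only, not Spiner's actual implementation; the names `UniformPiece`, `findPiece`, and `globalIndex` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model: a piecewise 1D index space made of uniform pieces
// that are sorted, contiguous, and share exactly one point at each interface.
struct UniformPiece {
  double xmin, xmax; // physical extent of this piece
  int npoints;       // number of points on this piece
  double dx() const { return (xmax - xmin) / (npoints - 1); }
};

// Returns the index of the piece containing x. A linear scan is fine for
// the handful of pieces envisioned here.
template <int NGRIDS>
int findPiece(const UniformPiece (&pieces)[NGRIDS], double x) {
  for (int i = 0; i < NGRIDS - 1; ++i) {
    if (x < pieces[i].xmax) return i;
  }
  return NGRIDS - 1;
}

// Maps x to a global fractional index across all pieces, counting the one
// shared point at each interface only once.
template <int NGRIDS>
double globalIndex(const UniformPiece (&pieces)[NGRIDS], double x) {
  const int p = findPiece(pieces, x);
  double offset = 0;
  for (int i = 0; i < p; ++i) offset += pieces[i].npoints - 1;
  return offset + (x - pieces[p].xmin) / pieces[p].dx();
}
```

With two pieces of 3 and 4 points sharing one point, the global index runs from 0 to 5 (6 unique points), matching the `g1`/`g2` snippet above in spirit.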
```cpp
HierarchicalDB<NGRIDS> db(Spiner::AllocationTarget::Device, NCOARSE, NCOARSE, NCOARSE);
for (int i = 0; i < RANK; ++i) {
  db.setRange(i, g);
}
```
This seems somewhat clunky. If I already have an index space, is it not possible to just use that? If you wanted to make sure the allocation and the actual identification with the grid are separate, then I can understand somewhat, but is that the intention here? Likewise, `setRange` no longer names what is actually being done; perhaps `setIndexSpace` is clearer?
I am also concerned that this part lacks the necessary flexibility. For instance, if I wanted a `HierarchicalGrid1D` with 5 grids in one axis and only 3 in another (or even just a uniform index space in the other), is that possible? I think the code might indicate not. That might be a limitation I'm not happy with; open for debate there.
Perhaps more to the point, the fact that the number of 1d spans in an index space is a template parameter might be a nonstarter. I'm imagining how this might be used downstream. At compile time I don't know what number of regions would be chosen -- that might be read out of an .sp5 file. Likewise in generating a table, I wouldn't want to recompile each time I wanted to try a different number of regions in my index space. And then, even if interpolation has the same API as before, now the object itself probably calls for a kind of type erasure. It feels like that shouldn't be necessary, but I'm happy to discuss.
> This seems somewhat clunky. If I already have an index space, is it not possible to just use that? If you wanted to make sure the allocation and the actual identification with the grid are separate, then I can understand somewhat, but is that the intention here?
This kind of initialization is not really necessary. It's just setting up the grids "in-place" so-to-speak.
> Perhaps more to the point, the fact that the number of 1d spans in an index space is a template parameter might be a nonstarter.
It's not a hard limit. Just a first pass. The maximum number of grids is required to be a template parameter. But the actual number can be relaxed to be any value below the maximum.
Also thanks for digging into this @chadmeyer !
@chadmeyer I'd like to discuss your use cases (probably offline). They may be sufficiently different from what is offered by this capability. This is meant to be very lightweight when used with everything known at compile time; in fact I'd say that if one were envisioning more than 5-10 domains, this is probably not suited to your use case. I can state without reservation that the current implementation has O(1) search complexity with a small multiplicative constant near unity.
I could envision less performance focused tools that could be fully dynamic and used to inform what values are set in production to produce a given error metric, etc, but I do not want to go down the fully dynamic AMR path for this PR. The amount of plumbing required to enable dynamic memory alone ... I'm very against it but I could be convinced otherwise if I am missing something truly magical.
> I could envision less performance focused tools that could be fully dynamic and used to inform what values are set in production to produce a given error metric, etc, but I do not want to go down the fully dynamic AMR path for this PR. The amount of plumbing required to enable dynamic memory alone ... I'm very against it but I could be convinced otherwise if I am missing something truly magical.
I think we'll have to see the accuracy numbers. My thinking is that we need something that can resolve kinks and phase boundaries that are not square regions in density-temperature space in order to really make spiner a viable option for accurate sesame data lookups without a rational function interpolator. I could be wrong, but that's why I think it will all be in the numbers.
> I think we'll have to see the accuracy numbers. My thinking is that we need something that can resolve kinks and phase boundaries that are not square regions in density-temperature space in order to really make spiner a viable option for accurate sesame data lookups without a rational function interpolator. I could be wrong, but that's why I think it will all be in the numbers.
I think this is a problem that can be solved with resolution. The sense that I'm getting from your comment @jhp-lanl is that the spiner eos has never been expected by ASC-IC to be sufficiently accurate for their purposes. This is news to me. If this is the case, I'd much rather implement a rational polynomial interpolant (anything that improves arithmetic intensity is going to be good for our performance metrics anyway) than the amount of complexity added with arbitrary or adaptive grids.
On a broader note, we need a discussion of how this may be improved to meet the needs of interested parties, but I am going to be pretty firm on not allowing the scope of this PR to creep toward perfection. We can't let the perfect be the enemy of the good. We are currently over-resolving most of the EOS domain to a very high degree to get semi-reasonable results in the most complex (and relatively localized) regions; the memory overhead is large and can be alleviated in large part by this capability, and we should not lose sight of that. That said, I am not the physics expert here, so if I am missing something major we need to talk about it, which is why I suggest a meeting with interested parties.
> I think we'll have to see the accuracy numbers. My thinking is that we need something that can resolve kinks and phase boundaries that are not square regions in density-temperature space in order to really make spiner a viable option for accurate sesame data lookups without a rational function interpolator. I could be wrong, but that's why I think it will all be in the numbers.
@jhp-lanl correct me if I'm wrong, but I was under the impression that many sesame tables had exactly this "rectangular region" structure; the sesame grids are also a Cartesian product topology.
Although I agree that threading things through singularity-eos is really the demonstration of value, this should be an improvement over the current approach.
I don't currently have a way to go beyond Cartesian product topology, but I would resist building that until we know it's needed for our use cases.
Note that one could still play the games that eospac plays with Cartesian product topology, such as offsetting the cold curve in energy space, to produce grid lines that conform to the cold curve, even with structured grids like these.
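The cold-curve offset trick mentioned above can be sketched in a few lines. This is an illustrative stand-in, not eospac's or Spiner's actual code: the cold curve `eCold` here is a made-up function, and the point is only that tabulating the shifted variable `s = e - e_cold(rho)` makes the `s = 0` grid line conform to the cold curve while the grid stays a Cartesian product.

```cpp
#include <cassert>

// Hypothetical sketch of offsetting the cold curve in energy space.
// Instead of tabulating energy e directly, tabulate s = e - e_cold(rho).
// A rectangular grid in (rho, s) then has its s = 0 grid line lying
// exactly on the cold curve, even though the grid is still structured.
double eCold(double rho) {  // stand-in cold curve, illustrative only
  return 0.5 * rho * rho;
}
double toShifted(double rho, double e) { return e - eCold(rho); }
double fromShifted(double rho, double s) { return s + eCold(rho); }
```

The transform is trivially invertible, so lookups shift into (rho, s) space, interpolate on the rectangular table, and shift back.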
Let me try to make clear(er) my use case here; I think @dholladay00 is fearing something worse than I think we need. In fact, if @jonahm-LANL's suggestion that the template parameters could be maxima, that might of itself cover the problem, but I do think it needs more focused thinking. To be clear, I'm not actually suggesting nested grids in multi-D are necessary or useful, but that's what my mind went to with the name.
I'm going to focus on EOS tables for the moment. Noting that we are making a trade-off between memory and performance and accuracy, different materials (and, just as importantly, different users/use cases) will have different requirements for their material. There might be a material that has two important phase transition regions that need to be captured to get the behavior of the material correct. It might be the case that they are sufficiently separated that to grid the region between them at the finest accuracy would be more expensive in memory than preferred; thus the arrangement in one of the dimensions of gridding might be CFMFC (C=coarse, F=fine, M=medium). For another material, there might just be one region you want to highlight (CFC), or even just the temperature axis rather than the density axis. This exercise must be done per material (and different legitimate choices could be made).
My concern is that I would need to know at compile time what type of thing I'm going to generate (at least how many grids something has per dimension). Then, a `HierarchicalDB<3>` is not the same object as a `HierarchicalDB<5>` or a `HierarchicalDB<1>`, or a `DataBox` for that matter. If that were a "hard" compile-time parameter, the complexity of supporting it in an EOS downstream would seem to be quite high (again, I don't know for sure).
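The per-axis CFMFC layout described above can be made concrete with a small self-contained sketch. All numbers and names here are made up for illustration; the only point is that five contiguous uniform pieces with different spacings form one axis, with each interface point shared.

```cpp
#include <cassert>

// Illustrative CFMFC axis: five contiguous uniform pieces with different
// resolutions (C = coarse, F = fine, M = medium). Values are made up.
struct Piece {
  double xmin, xmax; // extent of this piece
  int npoints;       // points on this piece
};

const Piece cfmfc[5] = {
    {0.00, 0.40, 5},   // C: coarse background
    {0.40, 0.45, 11},  // F: fine, around first phase transition
    {0.45, 0.70, 6},   // M: medium between transitions
    {0.70, 0.75, 11},  // F: fine, around second phase transition
    {0.75, 1.00, 4},   // C: coarse tail
};

// Total unique points along the axis, counting each shared interface
// point only once.
int totalPoints(const Piece *p, int n) {
  int total = 1;
  for (int i = 0; i < n; ++i) total += p[i].npoints - 1;
  return total;
}
```

A CFC layout, or a different arrangement per axis, is just a different list of pieces; the per-material tuning chadmeyer describes amounts to choosing these lists.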
@dholladay00, I don't know if I understand your statement that it could be O(1) but not suited for even 5 regions (or more) of different resolution. Do you mean O(Ngrids)? I'm also not sure I understand what you mean about dynamic memory. Is it that much of a concern for the number of grids to be known only at runtime?
I'm not saying that it has to be perfect, but I'm struggling to see a use case outside of one-off testing.
Sorry I should clarify. My comment was not meant to be blocking. Let's get this in and test it and see where things need to go from there.
> I'm going to focus on EOS tables for the moment. Noting that we are making a trade-off between memory and performance and accuracy, different materials (and, just as importantly, different users/use cases) will have different requirements for their material. There might be a material that has two important phase transition regions that need to be captured to get the behavior of the material correct. It might be the case that they are sufficiently separated that to grid the region between them at the finest accuracy would be more expensive in memory than preferred; thus the arrangement in one of the dimensions of gridding might be CFMFC (C=coarse, F=fine, M=medium). For another material, there might just be one region you want to highlight (CFC), or even just the temperature axis rather than the density axis. This exercise must be done per material (and different legitimate choices could be made).
This is a fair concern. I have an additional comment below, but first I wanted to note that it's already possible for the axes to differ in the current design.
> My concern is that I would need to know what type of thing I'm going to generate at compile time (at least how many grids something has per dimension). Then, a `HierarchicalDB<3>` is not the same object as a `HierarchicalDB<5>` or a `HierarchicalDB<1>`, or a `DataBox` for that matter. If that were a "hard" compile-time parameter, the complexity of supporting it in an EOS downstream would seem to be quite high (again, I don't know for sure).
>
> @dholladay00, I don't know if I understand your statement that it could be O(1) but not suited for even 5 regions (or more) of different resolution. Do you mean O(Ngrids)? I'm also not sure I understand what you mean about dynamic memory. Is it that much of a concern for the number of grids to be known only at runtime?
So I just re-skimmed the diff---I think changing the number of grids to a runtime parameter is actually very simple---simple enough I could probably add it before this gets merged, if you feel sufficiently strongly. The reason @dholladay00 and I went with the current design: dynamic allocation on device would require `PORTABLE_MALLOC`, and static arrays are easier in this case when the total number of patches is assumed to be small.

> I'm not saying that it has to be perfect, but I'm struggling to see a use case outside of one-off testing.
Why do you say that? Modulo perhaps changing the compile-time parameter to a MAX number of grids as opposed to an exact number, I think this is a fairly complete first pass at doing what you're describing.
> @dholladay00, I don't know if I understand your statement that it could be O(1) but not suited for even 5 regions (or more) of different resolution. Do you mean O(Ngrids)? I'm also not sure I understand what you mean about dynamic memory. Is it that much of a concern for the number of grids to be known only at runtime?
@chadmeyer when `N` is compile-time known, then it's basically a multiplicative constant (in this case `log(N)`, which should be < 4), which still means O(1) since everything is known at compile time.
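The claim about the small constant can be sketched directly. This is a generic illustration (not Spiner's search routine): a binary search over the piece edges takes at most ceil(log2(number of intervals)) comparisons, so for the ~5-10 grids envisioned here that is at most 3-4 comparisons per lookup.

```cpp
#include <cassert>

// Sketch of why the grid lookup stays effectively O(1): binary search
// over n sorted edge values (bounding n-1 intervals), counting the
// comparisons actually performed.
int findInterval(const double *edges, int n, double x, int *ncompares) {
  int lo = 0, hi = n - 1;
  *ncompares = 0;
  while (hi - lo > 1) {
    int mid = (lo + hi) / 2;
    ++(*ncompares);
    if (x < edges[mid]) {
      hi = mid;
    } else {
      lo = mid;
    }
  }
  return lo; // index of the interval [edges[lo], edges[lo+1]) containing x
}
```

With the interval count fixed at compile time the loop can even be fully unrolled by the compiler, which is part of the appeal of the compile-time parameter.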
@Yurlungur do `DataBox`es have a get-on-device thing? If so, maybe runtime could be handled somewhat more easily than I was thinking. I always worry about internal heap allocations, but maybe it could be done in a reasonable way if all of that complex memory management infrastructure is baked in.
> Modulo perhaps changing the compile-time parameter to a MAX number of grids as opposed to an exact number, I think this is a fairly complete first pass at doing what you're describing.
Mostly that. If the number of grids per-dimension-per-table is fixed, then using it in, say, `spinerEOS` will be problematic for the reasons given above. If it is a maximum with a runtime-known size (with the only real difference being the memory overhead of the object via the size of the private members):
```cpp
RegularGrid1D<T> grids_[NGRIDS];
int pointTotals_[NGRIDS];
```
then that's probably an acceptable case. However, given that the allocation of the grids mentioned here will only happen at some problem set-up phase (and not per time-step, for instance), I question whether the template parameter approach is entirely useful. It may be that having the grids on heap rather than stack would make a difference; I just don't know.
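The "maximum as a template parameter" compromise being discussed can be sketched as follows. This is a hypothetical, self-contained illustration (the names `PiecewiseAxis`, `addGrid`, and the members echoing `pointTotals_` are not Spiner's actual API): storage has fixed capacity `NGRIDS`, so no device-side dynamic allocation is needed, while the number of grids actually in use is a runtime value.

```cpp
#include <cassert>

// Hypothetical sketch: fixed-capacity storage with a runtime count.
// The template parameter bounds memory; the actual grid count is set
// at problem setup and may be anything up to that bound.
template <int NGRIDS>
struct PiecewiseAxis {
  double mins_[NGRIDS];      // lower bound of each piece in use
  int pointTotals_[NGRIDS];  // point count of each piece in use
  int nGrids_ = 0;           // runtime count, 0 <= nGrids_ <= NGRIDS

  // Appends a piece; returns false if capacity would be exceeded.
  bool addGrid(double xmin, int npoints) {
    if (nGrids_ >= NGRIDS) return false;
    mins_[nGrids_] = xmin;
    pointTotals_[nGrids_] = npoints;
    ++nGrids_;
    return true;
  }
};
```

The cost of an unused slot is just the stack footprint of the object, which is the "memory overhead via the size of the private members" mentioned above.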
From this perspective, the ability to specify per-databox-per-dimension kinds/numbers of grids at runtime seems like an essential feature before this could be adopted downstream for most applications I can think of. Other details or niceties are less essential.
@chadmeyer @dholladay00 @jhp-lanl I have modified the PR based on your feedback and made the following changes:
The type is now named `PiecewiseGrid1D` instead of `Hierarchical`.

> I question whether the template parameter approach is entirely useful. It may be that having the grids on heap rather than stack would make a difference; I just don't know.
I left the maximum value as a template parameter---mainly because I do not want to have to manage dynamic memory on device for such a low-level feature. Using an array of some fixed maximum size is a much easier solution. If it turns out to be an impediment, we can move to dynamic memory later with no change in visible, downstream functionality.
I believe this PR is now ready for merge. Those of you interested, please re-review before I click the button.
I think this covers at least the minimum usage needed to try out the capability. I don't see any blockers at this point.
Thanks @chadmeyer !
@jhp-lanl do you want to look through it more or should I click the button?
> Thanks @chadmeyer !
> @jhp-lanl do you want to look through it more or should I click the button?
Yes, I'm responding. I think you have some errors in that above expression, but I need to make sure.
Thanks for your catch on the constructor conditional @jhp-lanl . Were there other comments you wanted to make or should I merge after tests pass?
Tests triggered on re-git
Tests pass on re-git
> Thanks for your catch on the constructor conditional @jhp-lanl . Were there other comments you wanted to make or should I merge after tests pass?
Merge away
PR Summary
PR co-written with @dholladay00 . This PR adds a sort of patch-based static mesh refinement to spiner grids. We now introduce a new type, `HierarchicalGrid1D`, which can be built from a set of `RegularGrid1D`s that each intersect at exactly one point. Interpolation then works exactly as before, assuming the `DataBox` is filled appropriately.
The name `HierarchicalGrid` is a bit of a misnomer. These grids have one level of hierarchy, as they're a piecewise collection of grids, so a better name might be `PiecewiseGrid`. Interested to hear if anyone has any opinions on the name of this new machinery.
PR Checklist