lanl / spiner

Performance portable routines for generic, tabulated, multi-dimensional data
https://lanl.github.io/spiner
BSD 3-Clause "New" or "Revised" License

hierarchical grids #70

Closed Yurlungur closed 1 year ago

Yurlungur commented 1 year ago

PR Summary

PR co-written with @dholladay00. This PR adds a sort of patch-based static mesh refinement to spiner grids. We introduce a new type---HierarchicalGrid1D---which can be built from a set of RegularGrid1Ds, each of which intersects its neighbor at exactly one point. Interpolation then works exactly as before, assuming the DataBox is filled appropriately. For example:

    // Build a hierarchical grid made up of two grids, with a coarse and a fine region
    constexpr int NGRIDS = 2;
    constexpr Real xmin = 0;
    constexpr Real xmax = 1;

    RegularGrid1D g1(xmin, 0.35 * (xmax - xmin), 3);
    RegularGrid1D g2(0.35 * (xmax - xmin), xmax, 4);
    HierarchicalGrid1D<NGRIDS> g = {{g1, g2}};

    const int NCOARSE = g.nPoints();

    // Allocate a rank-3 DataBox on device and associate each dimension with g
    constexpr int RANK = 3;
    HierarchicalDB<NGRIDS> db(Spiner::AllocationTarget::Device, NCOARSE, NCOARSE, NCOARSE);
    for (int i = 0; i < RANK; ++i) {
      db.setRange(i, g);
    }

    // Fill the DataBox with a known function evaluated at the grid points
    portableFor(
        "Fill 3D Databox", 0, NCOARSE, 0, NCOARSE, 0, NCOARSE,
        PORTABLE_LAMBDA(const int iz, const int iy, const int ix) {
          Real x = g.x(ix);
          Real y = g.x(iy);
          Real z = g.x(iz);
          db(iz, iy, ix) = linearFunction(z, y, x);
        });

    // Interpolate onto a finer uniform grid and compare against the exact values
    constexpr int NFINE = 21;
    portableFor(
        "Interpolate 3D databox", 0, NFINE, 0, NFINE, 0, NFINE,
        PORTABLE_LAMBDA(const int iz, const int iy, const int ix) {
          RegularGrid1D gfine(xmin, xmax, NFINE);
          Real x = gfine.x(ix);
          Real y = gfine.x(iy);
          Real z = gfine.x(iz);
          Real f_true = linearFunction(z, y, x);
          Real difference = db.interpToReal(z, y, x) - f_true;
        });

The name HierarchicalGrid is a bit of a misnomer. These grids have only one level of hierarchy, as they're a piecewise collection of grids, so a better name might be PiecewiseGrid. I'm interested to hear if anyone has opinions on the name of this new machinery.

PR Checklist

Yurlungur commented 1 year ago

Still need to test on GPUs, and add documentation, but this is basically ready for review.

jhp-lanl commented 1 year ago

I'll try to get to reviewing this sometime next week. Let me know if you want a review by a certain date; that might help me prioritize it better.

Yurlungur commented 1 year ago

No rush---Daniel and I are now trying to thread it through singularity-eos, but we can just depend on this branch while we do that.

chadmeyer commented 1 year ago

I was initially confused by the hierarchical name (as @jonahm-LANL mentioned) as well. I had thought it was like block AMR, which it is not; it is dimension-by-dimension, and the spacing is arbitrary and not limited to 2:1, for instance. That makes it quite flexible, and it is in line with how some of the incoming data is organized anyway, which is nice.

I've got a few comments about the API:

    RegularGrid1D g1(xmin, 0.35 * (xmax - xmin), 3);
    RegularGrid1D g2(0.35 * (xmax - xmin), xmax, 4);
    HierarchicalGrid1D<NGRIDS> g = {{g1, g2}};

This seems like a reasonable way to define a 1D index space to me.

   HierarchicalDB<NGRIDS> db(Spiner::AllocationTarget::Device, NCOARSE, NCOARSE, NCOARSE);
   for (int i = 0; i < RANK; ++i) {
     db.setRange(i, g);
   }

This seems somewhat clunky. If I already have an index space, is it not possible to just use that? If you wanted to make sure the allocation and the actual identification with the grid are separate, then I can understand somewhat, but is that the intention here? Likewise, setRange no longer describes what is being done; perhaps setIndexSpace would be clearer?
I am also concerned that this part lacks the necessary flexibility. For instance, if I wanted a HierarchicalGrid1D with 5 grids in one axis and only 3 in another (or even just a uniform index space in the other), is that possible? I think the code might indicate not. That might be a limitation I'm not happy with; open to debate there.

Perhaps more to the point, the fact that the number of 1d spans in an index space is a template parameter might be a nonstarter. I'm imagining how this might be used downstream. At compile time I don't know what number of regions would be chosen -- that might be read out of an .sp5 file. Likewise in generating a table, I wouldn't want to recompile each time I wanted to try a different number of regions in my index space. And then, even if interpolation has the same API as before, now the object itself probably calls for a kind of type erasure. It feels like that shouldn't be necessary, but I'm happy to discuss.

Yurlungur commented 1 year ago

This seems somewhat clunky. If I already have an index space, is it not possible to just use that? If you wanted to make sure the allocation and the actual identification with the grid are separate, then I can understand somewhat, but is that the intention here?

This kind of initialization is not really necessary. It's just setting up the grids "in-place," so to speak.

Perhaps more to the point, the fact that the number of 1d spans in an index space is a template parameter might be a nonstarter.

It's not a hard limit, just a first pass. The maximum number of grids is required to be a template parameter, but the actual number can be relaxed to be any value at or below the maximum.
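
Roughly, the relaxed version would look something like this sketch (illustrative only: the class and member names are not the actual implementation, and headers/namespaces are omitted as in the example above):

    // Sketch only: a compile-time maximum number of grids with a runtime count.
    // The names PiecewiseGrid1D, nGrids_, and grids_ are illustrative, and
    // RegularGrid1D is assumed to be default-constructible.
    template <int MAXGRIDS>
    class PiecewiseGrid1D {
     public:
      PiecewiseGrid1D(std::initializer_list<RegularGrid1D> grids)
          : nGrids_(static_cast<int>(grids.size())) {
        // Accept any number of grids up to the compile-time maximum.
        // (A real implementation would also check grids.size() <= MAXGRIDS.)
        int i = 0;
        for (const RegularGrid1D &grid : grids) grids_[i++] = grid;
      }
      int nGrids() const { return nGrids_; }

     private:
      int nGrids_;                     // actual number of grids, known at runtime
      RegularGrid1D grids_[MAXGRIDS];  // static storage sized by the maximum
    };

Usage would then look like PiecewiseGrid1D<5> g = {g1, g2, g3};, with only three of the five slots used.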

Yurlungur commented 1 year ago

Also thanks for digging into this @chadmeyer !

dholladay00 commented 1 year ago

@chadmeyer I'd like to discuss your use cases (probably offline). They may be sufficiently different from what is offered by this capability. This is meant to be very lightweight when everything is known at compile time; in fact, I'd say that if one were envisioning more than 5-10 domains, this is probably not suited to your use case. I can state without reservation that the current implementation has O(1) search complexity, with a small multiplicative constant near unity.

I could envision less performance focused tools that could be fully dynamic and used to inform what values are set in production to produce a given error metric, etc, but I do not want to go down the fully dynamic AMR path for this PR. The amount of plumbing required to enable dynamic memory alone ... I'm very against it but I could be convinced otherwise if I am missing something truly magical.

jhp-lanl commented 1 year ago

I could envision less performance focused tools that could be fully dynamic and used to inform what values are set in production to produce a given error metric, etc, but I do not want to go down the fully dynamic AMR path for this PR. The amount of plumbing required to enable dynamic memory alone ... I'm very against it but I could be convinced otherwise if I am missing something truly magical.

I think we'll have to see the accuracy numbers. My thinking is that we need something that can resolve kinks and phase boundaries that are not square regions in density-temperature space in order to really make spiner a viable option for accurate sesame data lookups without a rational function interpolator. I could be wrong, but that's why I think it will all be in the numbers.

dholladay00 commented 1 year ago

I think we'll have to see the accuracy numbers. My thinking is that we need something that can resolve kinks and phase boundaries that are not square regions in density-temperature space in order to really make spiner a viable option for accurate sesame data lookups without a rational function interpolator. I could be wrong, but that's why I think it will all be in the numbers.

I think this is a problem that can be solved with resolution. The sense that I'm getting from your comment @jhp-lanl is that the spiner eos has never been expected by ASC-IC to be sufficiently accurate for their purposes. This is news to me. If this is the case, I'd much rather implement a rational polynomial interpolant (anything that improves arithmetic intensity is going to be good for our performance metrics anyway) than the amount of complexity added with arbitrary or adaptive grids.

On a broader note, we need a discussion of how this may be improved to meet the needs of interested parties, but I am going to be pretty firm on not allowing the scope of this PR to creep toward perfection. We can't let the perfect be the enemy of the good. We are currently over-resolving most of the EOS domain to a very high degree to get semi-reasonable results for the most complex (and relatively localized) regions; the memory overhead is large and can be alleviated in large part with this capability, and we should not lose sight of that. That being said, I am not the physics expert here, so if I am missing something major we need to talk about it, which is why I suggest a meeting with interested parties.

Yurlungur commented 1 year ago

I think we'll have to see the accuracy numbers. My thinking is that we need something that can resolve kinks and phase boundaries that are not square regions in density-temperature space in order to really make spiner a viable option for accurate sesame data lookups without a rational function interpolator. I could be wrong, but that's why I think it will all be in the numbers.

@jhp-lanl correct me if I'm wrong, but I was under the impression that many sesame tables had exactly this "rectangular region" structure; the sesame grids also have Cartesian product topology.

Although I agree that threading things through singularity-eos is really the demonstration of value, this should be an improvement over the current approach.

I don't currently have a way to go beyond Cartesian product topology, but I would resist building that until we know it's needed for our use cases.

Note that one could still play the games that eospac plays with Cartesian product topology, such as offsetting the cold curve in energy space, to produce grid lines that conform to the cold curve, even with structured grids like these.
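
For concreteness, here is a hedged sketch of that kind of cold-curve offset (not part of this PR; the tables sie_cold and P below are hypothetical):

    // Sketch only: interpolate in a cold-curve-shifted energy variable so that
    // lines of constant index follow the cold curve, while the underlying grids
    // stay Cartesian-product. sie_cold is a hypothetical 1D table of cold-curve
    // energy vs. density; P is a hypothetical 2D table of pressure vs.
    // (shifted energy, density).
    template <typename Table1D, typename Table2D>
    Real pressureShifted(const Table2D &P, const Table1D &sie_cold,
                         const Real rho, const Real sie) {
      const Real sie_shifted = sie - sie_cold.interpToReal(rho);
      return P.interpToReal(sie_shifted, rho);
    }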

chadmeyer commented 1 year ago

Let me try to make clear(er) my use case here; I think @dholladay00 is fearing something more involved than I think we need. In fact, @jonahm-LANL's suggestion that the template parameters could be maxima might by itself cover the problem, but I do think it needs more focused thinking. To be clear, I'm not actually suggesting nested grids in multi-D are necessary or useful, but that's what my mind went to with the name.

I'm going to focus on EOS tables for the moment. Noting that we are making a trade-off between memory and performance and accuracy, different materials (and, just as importantly, different users/use cases) will have different requirements for their material. There might be a material that has two important phase transition regions that need to be captured to get the behavior of the material correct. It might be the case that they are sufficiently separated that to grid the region between them at the finest accuracy would be more expensive in memory than preferred; thus the arrangement in one of the dimensions of gridding might be CFMFC (C=coarse, F=fine, M=medium). For another material, there might just be one region you want to highlight (CFC), or even just the temperature axis rather than the density axis. This exercise must be done per material (and different legitimate choices could be made).
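
As a hedged sketch, that kind of CFMFC arrangement might look like the following with the interface from the PR description (breakpoints and point counts are invented for illustration):

    // Hypothetical CFMFC density axis: coarse / fine / medium / fine / coarse.
    // Breakpoints and point counts are illustrative only.
    constexpr int NG = 5;
    RegularGrid1D c1(0.0, 2.0, 5);    // coarse
    RegularGrid1D f1(2.0, 3.0, 50);   // fine, around the first phase transition
    RegularGrid1D m1(3.0, 6.0, 15);   // medium
    RegularGrid1D f2(6.0, 7.0, 50);   // fine, around the second phase transition
    RegularGrid1D c2(7.0, 10.0, 5);   // coarse
    HierarchicalGrid1D<NG> rhoAxis = {{c1, f1, m1, f2, c2}};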

My concern is that I would need to know what type of thing I'm going to generate at compile time (at least how many grids something has per dimension). Then, a HierarchicalDB<3> is not the same object as a HierarchicalDB<5> or a HierarchicalDB<1>, or a DataBox for that matter. If that were a "hard" compile-time parameter, the complexity of supporting it in an EOS downstream would seem to be quite high (again, I don't know for sure).

@dholladay00, I don't know if I understand your statement that it could be O(1) but not suited for even 5 regions (or more) of different resolution. Do you mean O(Ngrids)? I'm also not sure I understand what you mean about dynamic memory. Is it that much of a concern for the number of grids to be known only at runtime?

I'm not saying that it has to be perfect, but I'm struggling to see a use case outside of one-off testing.

jhp-lanl commented 1 year ago

Sorry I should clarify. My comment was not meant to be blocking. Let's get this in and test it and see where things need to go from there.

Yurlungur commented 1 year ago

I'm going to focus on EOS tables for the moment. Noting that we are making a trade-off between memory and performance and accuracy, different materials (and, just as importantly, different users/use cases) will have different requirements for their material. There might be a material that has two important phase transition regions that need to be captured to get the behavior of the material correct. It might be the case that they are sufficiently separated that to grid the region between them at the finest accuracy would be more expensive in memory than preferred; thus the arrangement in one of the dimensions of gridding might be CFMFC (C=coarse, F=fine, M=medium). For another material, there might just be one region you want to highlight (CFC), or even just the temperature axis rather than the density axis. This exercise must be done per material (and different legitimate choices could be made).

This is a fair concern. I have an additional comment below, but first I wanted to note that it's already possible for the axes to be different in the current design.
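
For example, something like this should already work with the interface in the PR description (grid breakpoints and point counts are made up):

    // Sketch only: different piecewise layouts on different axes of one DataBox,
    // sharing the same NGRIDS template parameter. Values are illustrative.
    constexpr int NGRIDS = 2;
    RegularGrid1D r1(0.0, 3.0, 4);
    RegularGrid1D r2(3.0, 10.0, 8);
    HierarchicalGrid1D<NGRIDS> gRho = {{r1, r2}};

    RegularGrid1D t1(0.0, 7.0, 10);
    RegularGrid1D t2(7.0, 10.0, 3);
    HierarchicalGrid1D<NGRIDS> gT = {{t1, t2}};

    // Rank-2 DataBox: slowest axis sized by gT, fastest axis by gRho,
    // following the ordering of the rank-3 example above.
    HierarchicalDB<NGRIDS> db(Spiner::AllocationTarget::Device,
                              gT.nPoints(), gRho.nPoints());
    db.setRange(0, gRho);  // axis 0 (fastest), e.g. density
    db.setRange(1, gT);    // axis 1, e.g. temperature

With the maximum-count relaxation discussed above, the two axes could presumably also use different numbers of grids under the same NGRIDS maximum.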

My concern is that I would need to know what type of thing I'm going to generate at compile time (at least how many grids something has per dimension). Then, a HierarchicalDB<3> is not the same object as a HierarchicalDB<5> or a HierarchicalDB<1>, or a DataBox for that matter. If that were a "hard" compile-time parameter, the complexity of supporting it in an EOS downstream would seem to be quite high (again, I don't know for sure).

@dholladay00, I don't know if I understand your statement that it could be O(1) but not suited for even 5 regions (or more) of different resolution. Do you mean O(Ngrids)? I'm also not sure I understand what you mean about dynamic memory. Is it that much of a concern for the number of grids to be known only at runtime?

So I just re-skimmed the diff---I think changing the number of grids to a runtime parameter is actually very simple---simple enough that I could probably add it before this gets merged, if you feel sufficiently strongly. @dholladay00 and I went with the current design for a few reasons:

  1. Simplicity for a first implementation
  2. The maximum number of "patches" in the grid must be known at compile time if we want to use normal C/C++ arrays instead of something like Kokkos views or PORTABLE_MALLOC, and static arrays are easier in this case when the total number of patches is assumed to be small.
  3. As a first pass we also went with a binary-search design for finding the relevant grid to use (sketched below). This is O(lg N), where N is the number of grids. If that number is small, it might as well be constant---lg(8) is 3, after all. If the number of grids becomes sufficiently large in practice, we have a backup plan: we can switch to a design where we store a 1D index map that maps position to grid index. @dholladay00 and I ought to avoid going down that path until we know it's necessary. In the current design we should still have very fast lookups so long as the ratio of grids to points per grid is small.
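
For reference, the lookup in point 3 is conceptually just the following (a sketch under the stated assumptions, not the code in the diff):

    // Sketch of the O(lg N) grid lookup described in point 3 above (not the
    // PR's actual code). Assumes the grids are ordered by position and that
    // adjacent grids share an endpoint, as in the PR description.
    int findGrid(const RegularGrid1D *grids, const int nGrids, const Real x) {
      int lo = 0;
      int hi = nGrids - 1;
      while (lo < hi) {
        const int mid = (lo + hi) / 2;
        // The upper bound of grid mid is its last point.
        const Real upper = grids[mid].x(grids[mid].nPoints() - 1);
        if (x <= upper) {
          hi = mid;  // x lies in grid mid or one to its left
        } else {
          lo = mid + 1;  // x lies strictly to the right of grid mid
        }
      }
      return lo;  // index of the grid whose interval contains x
    }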

I'm not saying that it has to be perfect, but I'm struggling to see a use case outside of one-off testing.

Why do you say that? Modulo perhaps changing the compile-time parameter to a MAX number of grids as opposed to an exact number, I think this is a fairly complete first pass at doing what you're describing.

dholladay00 commented 1 year ago

@dholladay00, I don't know if I understand your statement that it could be O(1) but not suited for even 5 regions (or more) of different resolution. Do you mean O(Ngrids)? I'm also not sure I understand what you mean about dynamic memory. Is it that much of a concern for the number of grids to be known only at runtime?

@chadmeyer when N is known at compile time, it's basically a multiplicative constant (in this case log(N), which should be < 4), which still means O(1) since everything is known at compile time.

@Yurlungur do DataBoxes have a get-on-device mechanism? If so, maybe runtime could be handled somewhat more easily than I was thinking. I always worry about internal heap allocations, but maybe it could be done in a reasonable way if all of that complex memory-management infrastructure is baked in.

chadmeyer commented 1 year ago

Modulo perhaps changing the compile-time parameter to a MAX number of grids as opposed to an exact number, I think this is a fairly complete first pass at doing what you're describing.

Mostly that. If the number of grids per dimension per table is fixed, then using it in, say, spinerEOS will be problematic for the reasons given above. If it is a maximum with runtime-known size (with the only real difference being the memory overhead of the object via the size of the private members):

  RegularGrid1D<T> grids_[NGRIDS];
  int pointTotals_[NGRIDS];

then that's probably an acceptable case. However, given that the allocation of the grids mentioned here will only happen at some problem set-up phase (and not per time-step, for instance), I question whether the template parameter approach is entirely useful. It may be that having the grids on heap rather than stack would make a difference; I just don't know.

From this perspective, the ability to specify per-databox-per-dimension kinds/numbers of grids at runtime seems like an essential feature before this could be adopted downstream for most applications I can think of. Other details or niceties are less essential.

Yurlungur commented 1 year ago

@chadmeyer @dholladay00 @jhp-lanl I have modified the PR based on your feedback and made the following changes:

I question whether the template parameter approach is entirely useful. It may be that having the grids on heap rather than stack would make a difference; I just don't know.

I left the maximum value as a template parameter---mainly because I do not want to have to manage dynamic memory on device for such a low-level feature. Using an array of some fixed maximum size is a much easier solution. If it turns out to be an impediment, we can move to dynamic memory later with no change in visible, downstream functionality.

I believe this PR is now ready for merge. Those of you interested, please re-review before I click the button.

chadmeyer commented 1 year ago

I think this covers at least a minimal usage so we can try out the capability. I don't see any blockers at this point.

Yurlungur commented 1 year ago

Thanks @chadmeyer !

@jhp-lanl do you want to look through it more or should I click the button?

jhp-lanl commented 1 year ago

Thanks @chadmeyer !

@jhp-lanl do you want to look through it more or should I click the button?

Yes. I'm responding. I think you have some errors in the expression above, but I need to make sure.

Yurlungur commented 1 year ago

Thanks for your catch on the constructor conditional @jhp-lanl . Were there other comments you wanted to make or should I merge after tests pass?

Yurlungur commented 1 year ago

Tests triggered on re-git

Yurlungur commented 1 year ago

Tests pass on re-git

jhp-lanl commented 1 year ago

Thanks for your catch on the constructor conditional @jhp-lanl . Were there other comments you wanted to make or should I merge after tests pass?

Merge away