BrendanKKrueger commented 2 months ago

PR Summary

Proposing changes to Spiner based on a wrapper that was written in Singe, in order to provide:

~extrapolations beyond the edge of the data table;~
transformations (e.g., log-log interpolation)

PR Checklist

[ ] Adapt example code from Singe to fit within Spiner
[ ] The Singe example hard-codes the transformation to be log-lin with both linear and logarithmic accessors for both axes; make this general and provide linear, logarithmic, NQT-log as example transformations.
[ ] Because Singe doesn't own Databox, this is a wrapper around Databox. Determine if these features should continue to be an extended wrapper around Databox or features of Databox itself.
[ ] Bring tests over from Singe or write new tests.
[ ] Code is formatted. (You can use the format_spiner make target.)

BrendanKKrueger commented 2 months ago

@dholladay00 and @Yurlungur, these are some features from Singe that Jonah and I discussed transferring over to Spiner to make them more widely accessible. My main question at the moment is whether you want these features to (a) be added directly to Databox, (b) be added to a wrapper around or extension to Databox, or (c) kept in Singe.

Yurlungur commented 2 months ago

Broadly I think I'm supportive of these features moving into spiner... but looking at the header file you added for this MR I wonder how much work it would be because I notice that your version pretty explicitly talks about, e.g., density and temperature, which I don't think we want to do with spiner. It should be more generic.

BrendanKKrueger commented 2 months ago

Oh, absolutely it should be (and will be) more generic. The first commit was just copying over the Singe file to show a starting point, but there's plenty in that file that's specific to Singe and that will need to be cleaned up to be more appropriate for Spiner.

BrendanKKrueger commented 1 month ago

A few implementation question that came up as I started adapting RegularGrid1D to have transformations:

Do we ever expect a transformation to have runtime state? The current implementation requires that transformation state (if there is any) be available at compile time, because it simplifies the interface.
- Allowing runtime state would also require that either (a) the transformation have a default constructor, or (b) users would be forced to update all their code to provide a transformation (even when there is a default constructor).
- ~~The current implementation assumes no runtime state, which means in the future there would be bigger changes if a transformation ever needed runtime state.~~
- Edit 2024-10-23: We decided that we don't want to allow transformations to have runtime state.
Do we consider x (the untransformed variable) or u (the transformed variable) to be the "ground truth"? Transformations may not always be perfectly symmetric, which creates the possibility of small gaps appearing. And because those gaps are at critical points (the endpoints of the domain), you could hit them more often than you might think.
- Edit 2024-10-23: See discussions below on handling this.
~~Should umin, umax, and du be accessible by the user?~~
- Edit 2024-10-23: I made (nearly?) everything about u hidden from the user in RegularGrid1D, so that it's treated as an internal detail that the user shouldn't be expected to know about.
The quantity dx is now poorly defined, because now du is fixed and dx is derived from the transformation, and it may vary across the domain. Do we need the dx function? The best we can do is probably to take an index or an x value and return the dx value for that specific interval. It probably makes more sense to remove it entirely, but that changes the interface that's available to the user.

BrendanKKrueger commented 1 month ago

Another discussion we need to decide on: The existence and behavior of operator() is a problem.

If you use operator() to assign to the dependent variable, you think you're assigning to the y (untransformed) value, because that was the previous behavior. But in reality, you're assigning to the v (transformed) value, which means that you're not storing the right value unless you know this quirk. But that means that we're no longer hiding the transformations, but exposing things that users won't necessarily know about. It's also surprising (I wrote the code and made this mistake already myself).
- We could try to be clever by writing a value_reference_t class that's returned by operator() and has customized operators to hide the transformations. This would be a non-trivial amount of work, and I can imagine a number of ways that this would break.
- Instead, I've opted for a "safer" option: write new methods get_data_value and set_data_value (I'm not attached to these names), which are classic accessors to the dependent variable values, which allow the DataBox to handle the transformations correctly.
The next question is how to handle existing code, which already uses operator().
- If we add the accessors but leave operator(), we still have the surprising behavior. Users will naturally tend to use the feature they're used to (operator()) and will run into trouble without understanding why.
- If we remove operator() to avoid causing problems, we break existing code. As much as possible, I'd like to hide changes so that existing code continues to work. It's not always possible, but it's preferable when possible.
- I've opted for a compromise solution:
  - Leave operator() in place, but add some SFINAE so that it only works if you use the default transformation (TransformLinear, where the transform is a no-op). This will preserve existing code.
  - Existing code also has the option to use the new accessors.
  - If you add a transformation, operator() is removed and you must use the new accessors. In this situation, you're writing new code using new features, so the "keep existing code working as-is" argument doesn't apply.
  - The trickiest situation is a user adapting existing code by adding a transformation. We'll have to clearly communicate the behavior here so that they understand what's going on (documentation, release notes, etc.).
  - In the future, operator() could be removed from a later release of Spiner, although this is not required. Personal opinion: it should (eventually) be removed. In the case of an interpolator, operator() would most naturally be assumed to be the method that does the interpolation (I ran into this problem when I first used Spiner), so I think operator() being a way to access the data points isn't the most intuitive way to name things.

BrendanKKrueger commented 1 month ago

If someone can check saveHDF and loadHDF, that would be helpful -- I made very simple, naive changes.

BrendanKKrueger commented 1 month ago

Related to the operator() / accessor discussion: Should be set_data_value (or whatever we name it) be const? I made it const because there's already operator() const that returns mutable reference. This implies that when a DataBox is const, the metadata of the DataBox is const, but the dependent variable data is not constant.

Yurlungur commented 1 month ago

Instead, I've opted for a "safer" option: write new methods get_data_value and set_data_value (I'm not attached to these names), which are classic accessors to the dependent variable values, which allow the DataBox to handle the transformations correctly.

This makes complete sense to me. I'm in favor.

Regarding `operator()`

I feel pretty strongly that operator needs to stay as databox sometimes plays the role of a multiD array, not an interpolator. And that's what some of the internal metadata. That said, I think I'm okay with your compromise solution, so long as we have a public accessor for the underlying pointer, which I think we do.

Yurlungur commented 1 month ago

Related to the operator() / accessor discussion: Should be set_data_value (or whatever we name it) be const? I made it const because there's already operator() const that returns mutable reference. This implies that when a DataBox is const, the metadata of the DataBox is const, but the dependent variable data is not constant.

I think that's right

BrendanKKrueger commented 1 month ago

The head of main isn't formatted correctly. I ran the clang-format, but now I'm rolling back changes to lines I didn't mess with. I'll leave it to y'all if you want to run clang-format completely or not.

Yurlungur commented 1 month ago

The head of main isn't formatted correctly. I ran the clang-format, but now I'm rolling back changes to lines I didn't mess with. I'll leave it to y'all if you want to run clang-format completely or not.

Let's just submit a separate MR to main that formats the code in one sweep.

BrendanKKrueger commented 4 weeks ago

should I add the NQT logs and friends now in this PR? Or save them for a later implementation?

Whatever makes you happy. I just didn't feel like getting into adding the NQT stuff as a dependency, because dependency management is often a tricky question and usually best left to a core team member. I'm fine if you add commits to this MR or if you add another MR that depends on this one.

Yurlungur commented 2 weeks ago

should I add the NQT logs and friends now in this PR? Or save them for a later implementation?

Whatever makes you happy. I just didn't feel like getting into adding the NQT stuff as a dependency, because dependency management is often a tricky question and usually best left to a core team member. I'm fine if you add commits to this MR or if you add another MR that depends on this one.

I'll add them in a later MR... Very busy at the moment so that will keep things moving along.

BrendanKKrueger commented 1 week ago

Earlier I stated

Do we consider x (the untransformed variable) or u (the transformed variable) to be the "ground truth"? Transformations may not always be perfectly symmetric, which creates the possibility of small gaps appearing. And because those gaps are at critical points (the endpoints of the domain), you could hit them more often than you might think.

Thinking about this more, I think I should clarify my thinking and get your thoughts. For additional clarity, let's define some notation:

x_in is the value of x that the user inputs into a given function
u_in is the value of x_in after transformation: u_in = Transform::forward(x_in)
x_tr is the value of x_in after a round-trip transformation: x_tr = Transform::reverse(u_in)
Analytically, x_in and x_tr are equal, but with finite-precision numerics they may not be equal.

Option 1: x is the "ground truth"

The bounds on u should be shifted slightly away from u_in so that when the user sets x_in to the lower bound or upper bound that the user passed into the constructor, we are guaranteed that those values will fall within the bounds of u_in regardless of how theoretically accurate u_in is and whether or not x_tr does something weird.
In this case, the Databox should save the x_in values for the lower and upper bounds, and the methods that query the bounds should return these saved values even if x_tr for the lower and upper bounds is different.

Option 2: u is the "ground truth"

The bounds on u are exactly the bounding values of u_in. This means that the bounds on x are the x_tr bounds rather than the x_in bounds, which may slightly shift the range of x. That means that the bounds on x that the user passed to the constructor as x_in values may or may not actually lie within the range of the Databox.
In this case we're already storing the u values, and calling the methods that query the bounds will just call Transform::reverse on the bounding u values.

The current implementation is option 2 (u is the "ground truth"), but thinking about it more, I think it should be option 1 (x is the "ground truth"). If you agree with that, then I'll have to put in some logic to shift the bounds in u to ensure that the bounds in x_in are within the bounds of x_tr. It's the end of the day, so I may try to address that tomorrow.

BrendanKKrueger commented 1 week ago

I thought about this more last night, and realized that my comment from the end of the day yesterday was incorrect. I now think the following is correct:

Note: f(x) = forward transform; r(u) = reverse transform.
Since ulo is derived from xlo, then f(xlo) = ulo and f(anything less than xlo) will be outside the u bounds. Similarly on the high end. So if x is the ground truth, we don't need to do any corrections to the transformations.
If u is the ground truth, then we do need to muck about with things.
- f(r(ulo)) is not guaranteed to be equal to ulo. If you call the method to query the lower bound, that will return r(ulo). If you then interpolate to that point, you would be interpolating to the point u = f(r(ulo)), which may or may not be within the u bounds.
- r(f(xlo)) is not guaranteed to be equal to xlo. If you call the constructor with xlo then ulo will be set as f(xlo). If you then call the method to query the lower bound, you get r(f(xlo)) back, which may or may not match xlo.
- There are two critical points (the lower and upper bounds), and two equations to resolve (f(r(u)) = u, r(f(x)) = x), so you would apply a linear offset to the forward and reverse transformations in order to get everything to work. But that could mess up any careful calibration of the transform itself (for example, the stuff I did with the logarithmic transform to ensure that r(f(0)) = 0).

I think that tells us:

We need to treat x as the "ground truth" representation, which makes sense since the user works in x-space and u-space should be treated like an internal detail not inherently visible to the user.
I need to check the code and ensure that the xlo and xhi values from the constructor are saved and used for the lower-bound and upper-bound query methods.

PS: The GitHub interface is driving me nuts, because I can't seem to make conversation threads, which means that every conversation gets interleaved with every other conversation.

BrendanKKrueger commented 1 week ago

@Yurlungur, I updated how the bounds are handled, and added some additional testing. I found that we end up with an issue in RegularGrid1D::x(const int i), because that's derived from u so you have to make some sort of decision. I added a "TODO" comment with a few possible ways to handle this and very brief comments on each. I'd be interested in your thoughts on this.

lanl / spiner

Port potential Databox enhancements into Spiner from Singe -- transformations #95

PR Summary

PR Checklist

Regarding `operator()`

lanl / spiner

Port potential Databox enhancements into Spiner from Singe -- transformations #95

PR Summary

PR Checklist

Regarding operator()

Regarding `operator()`