Open BrendanKKrueger opened 2 months ago
@dholladay00 and @Yurlungur, these are some features from Singe that Jonah and I discussed transferring over to Spiner to make them more widely accessible. My main question at the moment is whether you want these features to (a) be added directly to Databox, (b) be added to a wrapper around or extension to Databox, or (c) kept in Singe.
Broadly I think I'm supportive of these features moving into spiner... but looking at the header file you added for this MR I wonder how much work it would be because I notice that your version pretty explicitly talks about, e.g., density and temperature, which I don't think we want to do with spiner. It should be more generic.
Oh, absolutely it should be (and will be) more generic. The first commit was just copying over the Singe file to show a starting point, but there's plenty in that file that's specific to Singe and that will need to be cleaned up to be more appropriate for Spiner.
A few implementation question that came up as I started adapting RegularGrid1D
to have transformations:
u
hidden from the user in RegularGrid1D
, so that it's treated as an internal detail that the user shouldn't be expected to know about.Another discussion we need to decide on: The existence and behavior of operator()
is a problem.
operator()
to assign to the dependent variable, you think you're assigning to the y (untransformed) value, because that was the previous behavior. But in reality, you're assigning to the v (transformed) value, which means that you're not storing the right value unless you know this quirk. But that means that we're no longer hiding the transformations, but exposing things that users won't necessarily know about. It's also surprising (I wrote the code and made this mistake already myself).
value_reference_t
class that's returned by operator()
and has customized operators to hide the transformations. This would be a non-trivial amount of work, and I can imagine a number of ways that this would break.get_data_value
and set_data_value
(I'm not attached to these names), which are classic accessors to the dependent variable values, which allow the DataBox
to handle the transformations correctly.operator()
.
operator()
, we still have the surprising behavior. Users will naturally tend to use the feature they're used to (operator()
) and will run into trouble without understanding why.operator()
to avoid causing problems, we break existing code. As much as possible, I'd like to hide changes so that existing code continues to work. It's not always possible, but it's preferable when possible.operator()
in place, but add some SFINAE so that it only works if you use the default transformation (TransformLinear
, where the transform is a no-op). This will preserve existing code.operator()
is removed and you must use the new accessors. In this situation, you're writing new code using new features, so the "keep existing code working as-is" argument doesn't apply.operator()
could be removed from a later release of Spiner, although this is not required. Personal opinion: it should (eventually) be removed. In the case of an interpolator, operator()
would most naturally be assumed to be the method that does the interpolation (I ran into this problem when I first used Spiner), so I think operator()
being a way to access the data points isn't the most intuitive way to name things.If someone can check saveHDF
and loadHDF
, that would be helpful -- I made very simple, naive changes.
Related to the operator()
/ accessor discussion: Should be set_data_value
(or whatever we name it) be const
? I made it const
because there's already operator() const
that returns mutable reference. This implies that when a DataBox
is const
, the metadata of the DataBox
is const
, but the dependent variable data is not constant.
Instead, I've opted for a "safer" option: write new methods get_data_value and set_data_value (I'm not attached to these names), which are classic accessors to the dependent variable values, which allow the DataBox to handle the transformations correctly.
This makes complete sense to me. I'm in favor.
operator()
I feel pretty strongly that operator needs to stay as databox sometimes plays the role of a multiD array, not an interpolator. And that's what some of the internal metadata. That said, I think I'm okay with your compromise solution, so long as we have a public accessor for the underlying pointer, which I think we do.
Related to the
operator()
/ accessor discussion: Should beset_data_value
(or whatever we name it) beconst
? I made itconst
because there's alreadyoperator() const
that returns mutable reference. This implies that when aDataBox
isconst
, the metadata of theDataBox
isconst
, but the dependent variable data is not constant.
I think that's right
The head of main
isn't formatted correctly. I ran the clang-format
, but now I'm rolling back changes to lines I didn't mess with. I'll leave it to y'all if you want to run clang-format
completely or not.
The head of
main
isn't formatted correctly. I ran theclang-format
, but now I'm rolling back changes to lines I didn't mess with. I'll leave it to y'all if you want to runclang-format
completely or not.
Let's just submit a separate MR to main that formats the code in one sweep.
should I add the NQT logs and friends now in this PR? Or save them for a later implementation?
Whatever makes you happy. I just didn't feel like getting into adding the NQT stuff as a dependency, because dependency management is often a tricky question and usually best left to a core team member. I'm fine if you add commits to this MR or if you add another MR that depends on this one.
should I add the NQT logs and friends now in this PR? Or save them for a later implementation?
Whatever makes you happy. I just didn't feel like getting into adding the NQT stuff as a dependency, because dependency management is often a tricky question and usually best left to a core team member. I'm fine if you add commits to this MR or if you add another MR that depends on this one.
I'll add them in a later MR... Very busy at the moment so that will keep things moving along.
Earlier I stated
Do we consider x (the untransformed variable) or u (the transformed variable) to be the "ground truth"? Transformations may not always be perfectly symmetric, which creates the possibility of small gaps appearing. And because those gaps are at critical points (the endpoints of the domain), you could hit them more often than you might think.
Thinking about this more, I think I should clarify my thinking and get your thoughts. For additional clarity, let's define some notation:
x_in
is the value of x that the user inputs into a given functionu_in
is the value of x_in
after transformation: u_in = Transform::forward(x_in)
x_tr
is the value of x_in
after a round-trip transformation: x_tr = Transform::reverse(u_in)
x_in
and x_tr
are equal, but with finite-precision numerics they may not be equal.Option 1: x is the "ground truth"
u_in
so that when the user sets x_in
to the lower bound or upper bound that the user passed into the constructor, we are guaranteed that those values will fall within the bounds of u_in
regardless of how theoretically accurate u_in
is and whether or not x_tr
does something weird.Databox
should save the x_in
values for the lower and upper bounds, and the methods that query the bounds should return these saved values even if x_tr
for the lower and upper bounds is different.Option 2: u is the "ground truth"
u_in
. This means that the bounds on x are the x_tr
bounds rather than the x_in
bounds, which may slightly shift the range of x. That means that the bounds on x that the user passed to the constructor as x_in
values may or may not actually lie within the range of the Databox
.Transform::reverse
on the bounding u values.The current implementation is option 2 (u is the "ground truth"), but thinking about it more, I think it should be option 1 (x is the "ground truth"). If you agree with that, then I'll have to put in some logic to shift the bounds in u to ensure that the bounds in x_in
are within the bounds of x_tr
. It's the end of the day, so I may try to address that tomorrow.
I thought about this more last night, and realized that my comment from the end of the day yesterday was incorrect. I now think the following is correct:
ulo
is derived from xlo
, then f(xlo) = ulo and f(anything less than xlo) will be outside the u bounds. Similarly on the high end. So if x is the ground truth, we don't need to do any corrections to the transformations.I think that tells us:
PS: The GitHub interface is driving me nuts, because I can't seem to make conversation threads, which means that every conversation gets interleaved with every other conversation.
@Yurlungur, I updated how the bounds are handled, and added some additional testing. I found that we end up with an issue in RegularGrid1D::x(const int i)
, because that's derived from u
so you have to make some sort of decision. I added a "TODO" comment with a few possible ways to handle this and very brief comments on each. I'd be interested in your thoughts on this.
PR Summary
Proposing changes to Spiner based on a wrapper that was written in Singe, in order to provide:
PR Checklist