Closed vastunghia closed 1 year ago
`xi` and `x_edges` are different things and both must be provided:

- `xi` indicates the location of the source points to be interpolated. It is paired with `yi`, which are the values of such points.
- `x_edges` defines intervals around each `xi`.

The interpolation is constrained such that the average of the interpolated curve between such edges is always equal to the corresponding `yi` value (that is the meaning of mean-preserving). Normally, the `x_edges` will be symmetrical around the `xi` positions. However, this is not necessarily so. By default, the `MeanPreservingInterpolation` objects assume such symmetrical distribution.
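In symbols, the mean-preserving constraint described here can be written as follows. The notation is an assumption made for this note and is not taken from the package: $e_i$ stands for `x_edges[i]`, $y_i$ for `yi[i]`, and $f$ for the interpolated curve over $N$ intervals.

$$
\frac{1}{e_{i+1} - e_i} \int_{e_i}^{e_{i+1}} f(x)\,dx = y_i, \qquad i = 0, \dots, N-1
$$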
I hope this helps you to clarify the differences.
Thanks a lot for this. Unfortunately, it does not solve my doubts.
My high-level understanding is that the code will return a spline function f such that mean[f(x)] = some pre-defined values for x in some pre-defined intervals. As such, the problem looks fully specified to me as soon as the N intervals and the N target mean values are defined.
Now I understand that, if you pass an N-dimensional `xi` only, and not `x_edges`, then the code will assume that there are N intervals of the form `[(xi[n-1] + xi[n])/2, (xi[n] + xi[n+1])/2]`. This, together with the N-dimensional `yi`, makes the problem fully specified.
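The midpoint rule above can be sketched in a few lines. This is not the package's code: the function name `edges_from_centers` is hypothetical, and the treatment of the first and last edge (mirroring the nearest midpoint around the end point) is an assumption, since the rule as stated only covers interior edges.

```python
import numpy as np

def edges_from_centers(xi):
    """Build N+1 interval edges from N centers using the midpoint rule."""
    xi = np.asarray(xi, dtype=float)
    # Interior edges: midpoints between consecutive centers.
    mid = 0.5 * (xi[:-1] + xi[1:])
    # Outer edges (assumption): mirror the nearest midpoint around the end center.
    first = xi[0] - (mid[0] - xi[0])
    last = xi[-1] + (xi[-1] - mid[-1])
    return np.concatenate(([first], mid, [last]))

xi = np.array([0.5, 1.5, 3.0, 5.0])
x_edges = edges_from_centers(xi)
print(x_edges)
```

With unevenly spaced `xi`, the centers are generally not at the middle of the resulting intervals, which is one reason explicit edges can be the more natural input.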
What I don't understand is why one should not be able to pass `x_edges` only, as this variable defines the intervals explicitly (without having to build them starting from `xi`) and, once again together with the N-dimensional `yi`, makes the problem fully specified.
Actually, not only do I not understand why one should pass `xi` as well, I also do not understand what the code is doing with the `xi` argument once it already knows how the intervals are shaped thanks to `x_edges`.
The fact that passing a different `xi` argument, while keeping the same `x_edges`, yields the same spline function seems to corroborate my position, i.e. that `xi` is useless when `x_edges` is passed. But I must be missing something.
I see your point now. Normally, one knows both `xi` and `yi`, and furthermore, `xi` is at the center of the integration intervals. That is what I had in mind and that is why the entry to `mpsplines` is as it is, aiming at simplifying the user interaction.
It is true that, once `x_edges` is defined, `xi` is not relevant. However, in principle, it must be provided because it is at the core of the definition of the splines. Have a look at Eq (1) in the paper. If it is not provided (because only `x_edges` is passed), then it must be "reconstructed" internally. Perhaps the code could be tweaked such that it works only with `x_edges`, but this is not needed (although I didn't check it) if we do the reconstruction prior to calling the actual interpolator.
Nonetheless, I opened a pull request (https://github.com/jararias/mpsplines/pull/2) with the changes that you suggested. The argument `xi` is now optional, as is `x_edges`. I also simplified some datetime treatments and some bits of documentation.
I would like to know more about your intended application, to see if these changes solve the issue.
Yes, I see that Eq (1) assumes a form of the piecewise quadratic function that explicitly depends upon the locations passed in `xi`.
However, I believe that the constraints set out in Eq (2) actually override that explicit dependence.
What I mean is that I expect that, ceteris paribus, if you change the `xi` locations inside the `x_edges` intervals, the constraints will make the resulting ai, bi and ci coefficients adjust so that at the end of the day one will come up with exactly the same arc of parabola for each `x_edges` interval.
Of course this is just an educated guess at this point. If true, this could be easily proven using mathematical tools. And, if false, it could be disproven easily as well.
Also, if true (as I believe, since I already verified that in a couple of cases trying to change `xi` and observing the very same resulting spline), that would mean that there is no problem (no extra degree of freedom) in the spline building algorithm, so the only issue is the inconvenience of having to pass an argument (`xi`) that actually has no effect at all on the output result. And this is, of course, not a real problem.
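A quick numerical experiment can probe this guess. The sketch below is not the mpsplines implementation. It assumes that Eq (1) has the form `f_i(x) = a_i + b_i*(x - xi[i]) + c_i*(x - xi[i])**2`, imposes the interval means plus continuity of value and slope at the interior edges, and closes the system with zero slope at both ends, which is an arbitrary choice made here. All function names are hypothetical.

```python
import numpy as np

def fit_quadratics(x_edges, yi, xi):
    """Solve for (a_i, b_i, c_i) of f_i(x) = a_i + b_i*(x-xi_i) + c_i*(x-xi_i)**2."""
    x_edges, yi, xi = (np.asarray(v, dtype=float) for v in (x_edges, yi, xi))
    n = len(yi)
    A = np.zeros((3 * n, 3 * n))
    rhs = np.zeros(3 * n)

    def value(i, x):  # weights of (a_i, b_i, c_i) in f_i(x)
        d = x - xi[i]
        return np.array([1.0, d, d * d])

    def slope(i, x):  # weights of (a_i, b_i, c_i) in f_i'(x)
        d = x - xi[i]
        return np.array([0.0, 1.0, 2.0 * d])

    def mean(i):  # weights of (a_i, b_i, c_i) in the mean of f_i over interval i
        lo, hi = x_edges[i] - xi[i], x_edges[i + 1] - xi[i]
        w = hi - lo
        return np.array([1.0, (hi**2 - lo**2) / (2 * w), (hi**3 - lo**3) / (3 * w)])

    row = 0
    for i in range(n):  # one mean constraint per interval
        A[row, 3 * i:3 * i + 3] = mean(i)
        rhs[row] = yi[i]
        row += 1
    for i in range(n - 1):  # value and slope continuity at interior edges
        x = x_edges[i + 1]
        for weights in (value, slope):
            A[row, 3 * i:3 * i + 3] = weights(i, x)
            A[row, 3 * i + 3:3 * i + 6] = -weights(i + 1, x)
            row += 1
    # End conditions (assumption of this sketch): zero slope at both ends.
    A[row, 0:3] = slope(0, x_edges[0])
    A[row + 1, 3 * n - 3:3 * n] = slope(n - 1, x_edges[-1])
    return np.linalg.solve(A, rhs).reshape(n, 3)

def evaluate(x, x_edges, xi, coeffs):
    """Evaluate the piecewise quadratic at points x."""
    x, xi = np.asarray(x, dtype=float), np.asarray(xi, dtype=float)
    idx = np.clip(np.searchsorted(x_edges, x, side="right") - 1, 0, len(xi) - 1)
    a, b, c = coeffs[idx].T
    d = x - xi[idx]
    return a + b * d + c * d * d

x_edges = np.array([0.0, 1.0, 3.0, 4.0, 6.5])
yi = np.array([2.0, 5.0, 3.0, 4.0])
xi_mid = 0.5 * (x_edges[:-1] + x_edges[1:])        # centers at interval midpoints
xi_off = x_edges[:-1] + 0.2 * np.diff(x_edges)     # centers shifted off-center

x = np.linspace(x_edges[0], x_edges[-1], 201)
f_mid = evaluate(x, x_edges, xi_mid, fit_quadratics(x_edges, yi, xi_mid))
f_off = evaluate(x, x_edges, xi_off, fit_quadratics(x_edges, yi, xi_off))
max_diff = float(np.max(np.abs(f_mid - f_off)))

# Simpson's rule is exact for quadratics, so this recovers the interval means.
lo, hi = x_edges[:-1], x_edges[1:]
c_off = fit_quadratics(x_edges, yi, xi_off)
means = (evaluate(lo, x_edges, xi_off, c_off)
         + 4 * evaluate(0.5 * (lo + hi), x_edges, xi_off, c_off)
         + evaluate(hi, x_edges, xi_off, c_off)) / 6
print(max_diff, means)
```

Under these assumptions the two curves agree to round-off. The constraints involve only values, slopes and means of the curve itself, so moving `xi` re-expresses the same parabola in a shifted basis: the coefficients change, the curve does not.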
I believe that the formulation you gave in the paper for the problem of mean-preserving splines is not the most general one, and is somewhat biased towards a specific use of that algorithm that you had in mind. Which (again) is not a problem of course, as long as you use it for that specific task.
IMO, the most general formulation should start from a user-provided set of intervals `x_edges`. Of course then one could also add a more specific use case where points `xi` are passed (instead of `x_edges`), having the algorithm guess the `x_edges` starting from the `xi` according to some pre-defined rule. But that should be a more specific case.
Btw I'm using your Python code (thanks!) for the task of bootstrapping future Commodity spot prices starting from today's observed market Forward prices. Forward prices are indeed to be interpreted as market expectation of future spot price averages on their respective future delivery period. Future delivery periods can include future months, quarters or years.
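One way to map such delivery periods onto interval edges and target means is sketched below. The quotes are made-up illustrative numbers, not market data, and measuring time in days since the first delivery start is an assumption of this sketch rather than something the package requires.

```python
import numpy as np

# Illustrative (made-up) forward quotes: (delivery start, delivery end exclusive, price).
quotes = [
    ("2025-01-01", "2025-02-01", 50.0),  # month
    ("2025-02-01", "2025-03-01", 52.0),  # month
    ("2025-03-01", "2025-04-01", 51.0),  # month
    ("2025-04-01", "2025-07-01", 48.0),  # quarter
    ("2025-07-01", "2026-01-01", 47.0),  # half-year
]
starts = np.array([q[0] for q in quotes], dtype="datetime64[D]")
ends = np.array([q[1] for q in quotes], dtype="datetime64[D]")
yi = np.array([q[2] for q in quotes])

# The periods must tile the time axis without gaps or overlaps.
assert np.all(starts[1:] == ends[:-1])

# N+1 edges for N delivery periods, expressed as days since the first start.
edge_dates = np.concatenate((starts, ends[-1:]))
x_edges = (edge_dates - edge_dates[0]) / np.timedelta64(1, "D")
print(x_edges, yi)
```

Here the intervals have very different lengths and no natural "center" observation exists, so the edges are the primary input and the forward prices are the target means.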
Good point. I might review this topic someday. Thanks!
Why should one provide `xi` even in the case where `x_edges` is passed? Not getting this.

Tried to change `xi` with the same `x_edges` and the result does not seem to change, as one would expect, at least as long as `xi` is compatible with `x_edges` (that is, when `x_edges[n] <= xi[n] <= x_edges[n+1]`).

I think one and only one of the two arguments should be accepted.
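The compatibility condition mentioned above can be written as a small check. The function name `centers_are_compatible` is hypothetical and is not part of mpsplines.

```python
import numpy as np

def centers_are_compatible(xi, x_edges):
    """True if x_edges[n] <= xi[n] <= x_edges[n+1] holds for every n."""
    xi = np.asarray(xi, dtype=float)
    x_edges = np.asarray(x_edges, dtype=float)
    if len(x_edges) != len(xi) + 1:
        return False  # N centers need exactly N+1 edges
    return bool(np.all((x_edges[:-1] <= xi) & (xi <= x_edges[1:])))

ok = centers_are_compatible([0.5, 2.0, 3.5], [0.0, 1.0, 3.0, 4.0])
bad = centers_are_compatible([0.5, 3.2, 3.5], [0.0, 1.0, 3.0, 4.0])
print(ok, bad)
```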