Redundancy in arguments

vastunghia commented 1 year ago

Why should one provide xi even when x_edges is passed? I'm not getting this.

I tried changing xi while keeping the same x_edges, and the result does not seem to change. That is what one would expect, at least as long as xi is compatible with x_edges (that is, when x_edges[n] <= xi[n] <= x_edges[n+1]).

I think one and only one of the two arguments should be accepted.

jararias commented 1 year ago

xi and x_edges are different things and both must be provided:

xi indicates the location of the source points to be interpolated. It is paired with yi, which holds the values at those points.
x_edges defines an interval around each xi.

The interpolation is constrained so that the average of the interpolated curve between each pair of consecutive edges always equals the corresponding yi value (that is the meaning of mean-preserving). Normally, x_edges will be symmetrical around the xi positions, although this is not necessarily so. By default, MeanPreservingInterpolation objects assume such a symmetrical distribution.

I hope this helps you to clarify the differences.

vastunghia commented 1 year ago

Thanks a lot for this. Unfortunately, it does not solve my doubts.

My high-level understanding is that the code will return a spline function f such that mean[f(x)] = some pre-defined values for x in some pre-defined intervals. As such, the problem looks fully specified to me as soon as the N intervals and the N target mean values are defined.
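Written out, the constraint described above would look roughly like the following. The symbols e_n (standing for x_edges[n]) and y_n (standing for yi[n]) are notation introduced here for illustration; they are not taken from the paper.

```latex
\frac{1}{e_{n+1} - e_n} \int_{e_n}^{e_{n+1}} f(x)\,\mathrm{d}x = y_n,
\qquad n = 0, \dots, N-1
```

That is N conditions on N intervals, which need N+1 edges and N target values, and no xi appears in them.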

Now I understand that, if you pass only an N-dimensional xi, and not x_edges, then the code will assume that there are N intervals of the form [(xi[n-1] + xi[n])/2, (xi[n] + xi[n+1])/2]. This, together with the N-dimensional yi, makes the problem fully specified.
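As a rough sketch of that midpoint rule (not the actual mpsplines code), the edges could be derived from xi like this. The helper name default_edges is hypothetical, and the treatment of the first and last edge, which the formula above leaves undefined, is an assumption: each is mirrored around its end point.

```python
import numpy as np

def default_edges(xi):
    """Hypothetical helper: build N+1 midpoint edges from N locations."""
    xi = np.asarray(xi, dtype=float)
    # Interior edges sit halfway between consecutive locations.
    mid = 0.5 * (xi[:-1] + xi[1:])
    # Assumption: outer edges mirror the nearest interior edge
    # around the first and last location.
    first = xi[0] - (mid[0] - xi[0])
    last = xi[-1] + (xi[-1] - mid[-1])
    return np.concatenate(([first], mid, [last]))

xi = np.array([0.0, 1.0, 3.0, 4.0])
x_edges = default_edges(xi)
print(x_edges)
```

With unevenly spaced xi, as in this example, each xi still falls inside its own interval but is not necessarily at its center.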

What I don't understand is why one should not be able to pass x_edges only, as this variable defines the intervals explicitly (without having to build them starting from xi) and, once again together with the N-dimensional yi, makes the problem fully specified.

Actually, not only do I not understand why one should pass xi as well; I also do not understand what the code does with the xi argument once it already knows how the intervals are shaped thanks to x_edges.

The fact that passing a different xi argument, while keeping the same x_edges, yields the same spline function, seems to corroborate my position, i.e. that xi is useless when x_edges is passed. But I must be missing something.

jararias commented 1 year ago

I see your point now. Normally, one knows both xi and yi, and furthermore, xi is at the center of the integration intervals. That is what I had in mind and that is why the entry to mpsplines is as it is, aiming at simplifying the user interaction.

It is true that, once x_edges is defined, xi is not relevant. However, in principle, it must be provided because it is at the core of the definition of the splines. Have a look at Eq (1) in the paper. If it is not provided (because only x_edges is passed), then it must be "reconstructed" internally. Perhaps the code could be tweaked so that it works with x_edges only, but that is not needed (although I didn't check it) if we do the reconstruction prior to calling the actual interpolator.
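A minimal sketch of such a reconstruction, assuming xi is simply placed at the center of each interval. The helper name centers_from_edges is hypothetical and this is not necessarily what the pull request does.

```python
import numpy as np

def centers_from_edges(x_edges):
    """Hypothetical helper: one location per interval, at its midpoint."""
    x_edges = np.asarray(x_edges, dtype=float)
    if x_edges.ndim != 1 or x_edges.size < 2:
        raise ValueError("x_edges must be 1-D with at least two entries")
    if np.any(np.diff(x_edges) <= 0):
        raise ValueError("x_edges must be strictly increasing")
    # N+1 edges yield N interval centers.
    return 0.5 * (x_edges[:-1] + x_edges[1:])

x_edges = np.array([0.0, 1.0, 4.0, 12.0])
xi = centers_from_edges(x_edges)
print(xi)
```

The reconstructed xi could then be handed to the existing interpolator together with yi and x_edges, so the core routine itself would not need to change.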

Nonetheless, I opened a pull request (https://github.com/jararias/mpsplines/pull/2) with the changes that you suggested. The argument xi is made optional, as is x_edges. I also simplified some of the datetime handling and some bits of documentation.

I would like to know more about your intended application, to see if these changes solve the issue.

vastunghia commented 1 year ago

Yes, I see that Eq (1) assumes a form of the piecewise quadratic function that explicitly depends on the locations passed in xi.

However, I believe that the constraints set out in Eq (2) actually override that explicit dependence.

What I mean is that I expect that, ceteris paribus, if you change the xi locations inside the x_edges intervals, the constraints will make the resulting ai, bi and ci coefficients adjust so that, at the end of the day, one ends up with exactly the same arc of parabola for each x_edges interval.

Of course this is just an educated guess at this point. If true, this could be easily proven using mathematical tools. And, if false, it could be disproven easily as well.

Also, if true (as I believe, since I already verified that in a couple of cases trying to change xi and observing the very same resulting spline), that would mean that there is no problem (no extra degree of freedom) in the spline building algorithm, so the only issue is the inconvenience of having to pass an argument (xi) that actually has no effect at all on the output result. And this is, of course, not a real problem.
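Part of this can be checked with a few lines of Python. The sketch below assumes each piece has the form a + b*(x - xi) + c*(x - xi)**2, which is my reading of Eq (1) rather than a quote from the paper. It shows only that moving xi re-expresses the same parabola with different coefficients, so the curve and its interval mean do not change. Whether every constraint in Eq (2) depends on the curve alone is not verified here.

```python
import numpy as np

def recenter(a, b, c, x_old, x_new):
    """Coefficients of the same parabola expanded around x_new."""
    d = x_new - x_old
    return a + b * d + c * d * d, b + 2.0 * c * d, c

def parabola(coeffs, center, x):
    a, b, c = coeffs
    return a + b * (x - center) + c * (x - center) ** 2

def interval_mean(coeffs, center, lo, hi):
    """Exact mean of the parabola over [lo, hi]."""
    a, b, c = coeffs
    u0, u1 = lo - center, hi - center
    integral = a * (u1 - u0) + b * (u1**2 - u0**2) / 2 + c * (u1**3 - u0**3) / 3
    return integral / (hi - lo)

lo, hi = 2.0, 5.0                      # one x_edges interval
old_center, new_center = 3.5, 4.2      # two choices of xi inside it
old = (1.0, -0.5, 0.25)                # arbitrary illustrative coefficients
new = recenter(*old, old_center, new_center)

x = np.linspace(lo, hi, 7)
same_curve = np.allclose(parabola(old, old_center, x),
                         parabola(new, new_center, x))
same_mean = np.isclose(interval_mean(old, old_center, lo, hi),
                       interval_mean(new, new_center, lo, hi))
print(same_curve, same_mean)
```

If the constraints are all statements about the curve (its interval means and its behaviour at the edges), then any solution found with one choice of xi maps to a solution with another choice, which would be consistent with what I observed.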

I believe that the formulation you gave in the paper for the problem of mean-preserving splines is not the most general one, and is somewhat biased towards a specific use of that algorithm that you had in mind. Which (again) is not a problem of course, as long as you use it for that specific task.

IMO, the most general formulation should start from a user-provided set of intervals x_edges. Of course then one could also add a more specific use case where points xi are passed (instead of x_edges), having the algorithm guess the x_edges starting from the xi according to some pre-defined rule. But that should be a more specific case.

Btw I'm using your Python code (thanks!) for the task of bootstrapping future Commodity spot prices starting from today's observed market Forward prices. Forward prices are indeed to be interpreted as market expectation of future spot price averages on their respective future delivery period. Future delivery periods can include future months, quarters or years.

jararias commented 1 year ago

Good point. I might review this topic someday. Thanks!

jararias / mpsplines

Redundancy in arguments #1