Open nzqo opened 3 months ago
The definition for interpolate says:
Interpolate intermediate values. The interpolation method is linear.
This makes sense. How can you linearly interpolate when you only have a single point?
> The definition for interpolate says:
> Interpolate intermediate values. The interpolation method is linear.
> This makes sense. How can you linearly interpolate when you only have a single point?
Well, I am technically talking about extrapolation, since this is about extension past boundaries.
Having just a single data point available is an edge case, and imo it should then either do nothing or raise an error/warning. In the example I gave, though, it is definitely possible to linearly extrapolate.
If you have a range of nulls, `interpolate` will take the values on either side of that range and interpolate between them. When you have a range of nulls at the edge of your data, you have only a single point at one end, which is why you cannot interpolate.
Are you saying that polars should use the last two available non-null points to define the line that will be used in the extrapolation? This feels like we're in specific-scenario land at this point and a custom function of your own making would be best suited.
> Are you saying that polars should use the last two available non-null points to define the line that will be used in the extrapolation?
That would be one way, yes. I'd honestly be surprised if that was such an outlandish scenario. If you consider any time series that has missing values at the end, you'd run into this issue of not being able to fill those nulls without leaving the native API. However, you are right in that it probably shouldn't be part of "linear interpolation".
In pandas, I would just use a spline interpolation, which actually extends past the edges of data points, or an extrapolation. The former isn't available yet, while for the latter I am not sure how/whether I would be able to implement it with the current API. Thoughts on these two options?
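The "last two available non-null points" idea from this exchange can be sketched in plain Python. This is only an illustration under stated assumptions: `fill_linear_with_edges` is a hypothetical helper, not part of the polars API, it treats the list index as the x-axis (evenly spaced samples), and it leaves the data untouched when fewer than two points are known.

```python
from typing import List, Optional


def fill_linear_with_edges(values: List[Optional[float]]) -> List[Optional[float]]:
    """Hypothetical helper: linear interpolation inside the data,
    linear extrapolation past the first/last known point."""
    known = [i for i, v in enumerate(values) if v is not None]
    if len(known) < 2:
        # A single point (or none) does not define a line; change nothing.
        return list(values)

    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        if i < known[0]:
            # Leading nulls: extend the line through the first two known points.
            a, b = known[0], known[1]
        elif i > known[-1]:
            # Trailing nulls: extend the line through the last two known points.
            a, b = known[-2], known[-1]
        else:
            # Interior nulls: use the nearest known neighbour on each side.
            a = max(k for k in known if k < i)
            b = min(k for k in known if k > i)
        slope = (values[b] - values[a]) / (b - a)
        out[i] = values[a] + slope * (i - a)
    return out


filled = fill_linear_with_edges([None, 1.0, None, 3.0, None, None])
print(filled)
```

With the made-up input above, the interior gap is interpolated as usual, while the leading and trailing nulls continue the same line instead of staying null.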
Description
Currently, interpolation does not extend to missing values at the "edge" of the DataFrame: nulls before the first or after the last non-null value are left as null.
While I can use `fill_null` to fill the value at the edge, there are many scenarios in which I find myself seeking the ability to linearly interpolate there as well. The best example is probably regular time series data, where the timestamp should not just be repeated at the end, but rather extended. I believe this could come either in the form of a strategy in `fill_null` or as an option on the interpolation expressions.