Open zinji opened 5 years ago
@zinji it currently does not but that's an interesting idea, whats the use case you had in mind for this functionality?
@svenklemm this functionality is highly useful when computing stats on the timeseries data such as mean daily or monthly temperature; dality rainfall totals etc.
usecase: From hourly rainfall totals, i want to compute daily rainfall totals to 9:00 am using gap filling to fill any gaps less that 4hrs and any gaps greater than 4hrs are left null.
I concur, we have a very similar use case at hand that would benefit very much from this. We have sensor readings that usually come in every 15 minutes. So we are ok with interpolating 1-2 missing values, but don't want to go further than that. Anything more would be treated as a gap in the data, i.e. NULL
I think I managed to achieve this through the following query:
`CREATE TABLE timestamps (
ts timestamp,
value numeric
);
INSERT INTO timestamps VALUES ('2015-05-01 12:00:00.0',1), ('2015-05-01 12:15:00.0',2), ('2015-05-01 12:30:00.0',3), ('2015-05-01 12:45:00.0',4), ('2015-05-01 13:00:00.0',5), ('2015-05-01 13:15:00.0',6), ('2015-05-01 13:30:00.0',7), ('2015-05-01 13:45:00.0',8), ('2015-05-01 14:00:00.0',9), ('2015-05-01 14:15:00.0',10), ('2015-05-01 15:00:00.0',7), ('2015-05-01 15:15:00.0',8), ('2015-05-01 15:30:00.0',9), ('2015-05-01 15:45:00.0',10);
WITH interpolated AS ( WITH deltas AS ( SELECT ts, value, extract(epoch from ts::timestamp) - lag(extract(epoch from ts), -1) OVER (ORDER BY ts) delta FROM raw_measurements ORDER BY ts ) SELECT time_bucket_gapfill( '5 minutes', ts, start => '2015-05-01 11:53:00', finish => '2015-05-01 17:00:00') AS minutes_5, interpolate(avg(value)) AS avg_val, locf(last(delta,ts)) AS locf_delta, last(delta,ts) AS deltaActual FROM deltas GROUP BY minutes_5 ) SELECT * from interpolated WHERE interpolated.locf_delta >= -%s OR NOT deltaActual IS NULL;` Not the most elegant way and I'm sure theres lots of room for improvement, but the general idea is to use interpolate over the whole dataset and generate a new column with the time to the next measurement which can then be used to drop all interpolations that were over a gap more than x seconds (900 in this example). This will not fill the gap with NULL values, but you could probably achieve that by running another time_bucket_gapfill over the current result.
@zinji ok so you would want gapfill to still produce the row with a timestamp but the locf/interpolate column would have NULL values
@svenklemm Yes, the query output should output the applicable timesteps with null values depending on the time bucket used.
Any update on this? This would be a really useful feature to have.
I'm looking for this as well 🥺
Does the gap filling functions have option to limit the size of gaps being filled by setting maximum length of gap in number of time steps?