Open shaunsales opened 4 years ago
Hi @shaunsales It sounds like you want the "prev" parameter in locf to "look back" out of the range:
https://docs.timescale.com/latest/api#locf
But from your second argument, it sounds like you are doing that? Do you see any different issues if you add the explicit start
and finish
argument in the gapfill
function, and eliminate them from your WHERE clause?
Thanks @mfreed that actually got us most of the way there - oddly the TSDB docs recommend against this and suggest to only use the WHERE
clause, and omit the gapfill params.
One other thing - when we use the method you suggested, we have an additional entry at the beginning of the data set that is the last entry prior to the specified range.
For example, if we ask for hourly samples from 2020-01-02
to 2020-01-03
we get 25 entries rather than 24. The first entry is the latest entry outside the range - i.e. 2020-01-01 01:00:00
and the remaining 24 entries are correct. Any idea what might be causing this?
We don't typically recommend this because it's actually less efficient, as it can't do chunk exclusion at planning time...because you don't known a priori how "far back" you need to search to find the value preceding to the LOCF. So you can't exclude chunks "before" the WHERE clause during planning...because you might need to "look back" into these chunks.
So if you don't need to use the "look back", it's not recommended. But you are right that we should clarify the docs here.
Regarding your new issue, can you repost the specific query & output you are getting?
Hi, I have the same issue, the following query return nothing.
SELECT id_time_series,
Time_bucket_gapfill('1 hour', MO.occurred_on) AS time_bucket,
Locf(First(MO.value, MO.occurred_on), (SELECT value
FROM measurement_observation
WHERE id_time_series = 5911 AND
occurred_on < '2020-10-29'
ORDER BY occurred_on DESC
LIMIT 1)) AS time_bucket_first
FROM measurement_observation MO
WHERE occurred_on >= '2020-10-29' AND
occurred_on <= '2020-10-30' AND
id_time_series = 5911
GROUP BY time_bucket, id_time_series
ORDER BY time_bucket
But if i remove id_time_series from the SELECT and from the GROUP BY, the query return rows normally with the expected behaviour from locf second parameter
This query work as expected :
SELECT Time_bucket_gapfill('1 hour', MO.occurred_on) AS time_bucket,
Locf(First(MO.value, MO.occurred_on), (SELECT value
FROM measurement_observation
WHERE id_time_series = 5911 AND
occurred_on < '2020-10-29'
ORDER BY occurred_on DESC
LIMIT 1)) AS time_bucket_first
FROM measurement_observation MO
WHERE occurred_on >= '2020-10-29' AND
occurred_on <= '2020-10-30' AND
id_time_series = 5911
GROUP BY time_bucket
ORDER BY time_bucket
Any help would be much appreciated!
Hi, I have the same issue, the following query return nothing.
SELECT id_time_series, Time_bucket_gapfill('1 hour', MO.occurred_on) AS time_bucket, Locf(First(MO.value, MO.occurred_on), (SELECT value FROM measurement_observation WHERE id_time_series = 5911 AND occurred_on < '2020-10-29' ORDER BY occurred_on DESC LIMIT 1)) AS time_bucket_first FROM measurement_observation MO WHERE occurred_on >= '2020-10-29' AND occurred_on <= '2020-10-30' AND id_time_series = 5911 GROUP BY time_bucket, id_time_series ORDER BY time_bucket
But if i remove id_time_series from the SELECT and from the GROUP BY, the query return rows normally with the expected behaviour from locf second parameter
This query work as expected :
SELECT Time_bucket_gapfill('1 hour', MO.occurred_on) AS time_bucket, Locf(First(MO.value, MO.occurred_on), (SELECT value FROM measurement_observation WHERE id_time_series = 5911 AND occurred_on < '2020-10-29' ORDER BY occurred_on DESC LIMIT 1)) AS time_bucket_first FROM measurement_observation MO WHERE occurred_on >= '2020-10-29' AND occurred_on <= '2020-10-30' AND id_time_series = 5911 GROUP BY time_bucket ORDER BY time_bucket
Any help would be much appreciated!
Is there any traction on this issue? It's about a year now since this comment was posted. I'm experience the same issue with version 2.4.2 of TimescaleDB.
Hi don't know if there's been more updates on this - I will just add that it would be worthwhile for time_bucket_gapfill to do its generation of time buckets without ANY data passed to it (other than a desired time range?), otherwise (as far as I can tell?) you have to use Postgres generate_series logic instead, which doesn't have the same "rounding" logic as time_bucket(_gapfill) has. For example, if I want 5-minute buckets, time_bucket (and _gapfill) will round to the hour, but generate_series does not do that automatically (at least without extra work).
Hi,
We are big Timescale DB fans and have just migrated away from Redis Time Series to TSDB. However for some of our financial time series data, we're facing some issues displaying stocks that have zero daily trade volume.
For example, we want to pull today's prices for a stock with hourly time buckets. However that stock hasn't had any trades today, so we want to get the last price using
locf
andtime_bucket_gapfill
all buckets from start to end timestamp - basically just giving us a flat line of identical prices based on the previous close.The problem we are having, is when there are no trades,
time_bucket_gapfill
andlocf
don't seem to work, and no time series data is returned. However if there is just one trade during the range, it works fine. Maybe we are missing something, but this has us scratching our heads. We're hoping for an elegant TSDB native solution rather than having to hack it in our code.Below is the query we're using:
Any help would be much appreciated!